Commit graph

3 commits

Author SHA1 Message Date
Eric Liang
163620ba94
[data] Make block splitting feature flagged off by default (#20660)
Puts block splitting behind a feature flag and makes it off by default. This makes it easier to debug problems potentially related to this feature. Criteria for enabling by default:
- We're confident all nightly tests pass (currently, there may be an issue with large-scale groupby with block splitting).
- We're confident lineage-based reconstruction can work with block splitting.
2021-11-23 19:46:18 -08:00
Eric Liang
65a8698e82
Raise the dataset block size limit to 2GiB (#20551)
The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. This creates roughly 1000 blocks, which means roughly 1 million intermediate shuffle objects until we implement #20500.
2021-11-18 19:36:10 -08:00
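The block-count arithmetic in the #20551 message can be checked with a short sketch. It assumes an all-to-all shuffle in which every input block emits one intermediate object per output block, and that the number of output blocks equals the number of input blocks. It also reads "500GB" as decimal gigabytes. These are illustrative assumptions, and `shuffle_object_count` is a hypothetical helper, not part of Ray.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
GB = 10 ** 9  # assumption: "500GB" in the commit message is decimal


def shuffle_object_count(dataset_bytes: int, block_bytes: int) -> tuple:
    """Return (num_blocks, num_intermediate_objects) for a naive shuffle.

    Hypothetical helper: assumes each of the N input blocks produces one
    intermediate object for each of N output blocks, i.e. N * N objects.
    """
    # Ceiling division: a partially filled final block still counts.
    num_blocks = -(-dataset_bytes // block_bytes)
    return num_blocks, num_blocks * num_blocks


blocks_500mib, objects_500mib = shuffle_object_count(500 * GB, 500 * MiB)
blocks_2gib, objects_2gib = shuffle_object_count(500 * GB, 2 * GiB)

print(blocks_500mib, objects_500mib)  # 954 910116
print(blocks_2gib, objects_2gib)      # 233 54289
```

Under these assumptions, the 500MiB limit gives just under 1000 blocks and just under 1 million intermediate objects, matching the rounded figures in the message, while a 2GiB limit cuts the object count by a factor of about 16.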
Eric Liang
460cf86858
Split blocks automatically into 500MB chunks on file read and transformation (#20235)
This PR adds support for automatic block splitting on read and map transforms, to keep block size bounded to ~500MiB. This avoids potential OOM situations where a map task may consume too much intermediate Python heap memory, or too much object store shared memory for one block.
2021-11-15 22:25:11 -08:00
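The idea behind #20235, keeping each block under a size bound, can be illustrated with a minimal sketch. This is not Ray's implementation: `split_into_blocks`, its parameters, and the greedy strategy are all assumptions made for illustration, and the per-row size estimate is pluggable because `sys.getsizeof` only approximates Python object size.

```python
import sys
from typing import Any, Callable, Iterable, Iterator, List


def split_into_blocks(
    rows: Iterable[Any],
    target_max_bytes: int,
    size_of: Callable[[Any], int] = sys.getsizeof,
) -> Iterator[List[Any]]:
    """Greedily group rows into blocks of at most target_max_bytes.

    Hypothetical helper. A single row larger than the target still gets
    a block of its own, so the bound is approximate, not strict.
    """
    block: List[Any] = []
    block_bytes = 0
    for row in rows:
        row_bytes = size_of(row)
        # Close the current block before it would exceed the target.
        if block and block_bytes + row_bytes > target_max_bytes:
            yield block
            block, block_bytes = [], 0
        block.append(row)
        block_bytes += row_bytes
    if block:
        yield block


# Ten 40-byte rows with a 100-byte target: two rows fit per block.
rows = [b"x" * 40 for _ in range(10)]
blocks = list(split_into_blocks(rows, target_max_bytes=100, size_of=len))
print(len(blocks), [len(b) for b in blocks])  # 5 [2, 2, 2, 2, 2]
```

Yielding blocks as they fill, rather than building the full list first, keeps at most one block's worth of rows in memory at a time, which is the kind of bounded heap usage the commit message describes.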