This PR gates block splitting behind a feature flag that is off by default, which makes it easier to debug problems potentially related to this feature (a usage sketch follows the list below). Criteria for enabling it by default:
- We're confident all nightly tests pass (currently, there may be an issue with large-scale groupby with block splitting).
- We're confident lineage-based reconstruction can work with block splitting.
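A minimal sketch of what opting in might look like from user code. The flag name and its location on the Datasets context object are assumptions for illustration, not taken from this description:

```python
import ray
from ray.data.context import DatasetContext

# Assumption: the flag is exposed on the Datasets context and named
# `block_splitting_enabled`; the real name/location may differ.
ctx = DatasetContext.get_current()
ctx.block_splitting_enabled = True  # opt in; off by default per this PR

# Subsequent reads/transforms pick up the context setting.
ds = ray.data.range(1000)
print(ds.take(5))
```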
The default block size of 500MiB seems too small for some common workloads, e.g. shuffling 500GB of data. That produces ~1000 blocks, and since an all-to-all shuffle creates one intermediate object per (map block, reduce partition) pair, roughly 1 million intermediate shuffle objects until we implement #20500.
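Spelling out that arithmetic (a back-of-the-envelope sketch; it assumes a simple all-to-all shuffle with as many reduce partitions as input blocks):

```python
dataset_size_mib = 500 * 1024   # ~500 GB of data, expressed in MiB
target_block_mib = 500          # default target block size

num_blocks = dataset_size_mib // target_block_mib   # ~1000 blocks
# One intermediate object per (map block, reduce partition) pair.
intermediate_objects = num_blocks ** 2               # ~1,000,000 objects
print(num_blocks, intermediate_objects)
```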
This PR adds support for automatic block splitting on read and map transforms, keeping block sizes bounded at ~500MiB. This avoids potential OOM situations where a map task consumes too much intermediate Python heap memory, or too much object store shared memory, for a single block.
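As an illustration only (not Ray's actual implementation), dynamic splitting inside a read or map task can be sketched as a generator that starts a new output block whenever the current one would exceed the target size; the row-based interface and `size_of` helper below are simplifying assumptions:

```python
from typing import Any, Callable, Iterable, Iterator, List

TARGET_MAX_BLOCK_SIZE = 500 * 1024 * 1024  # ~500 MiB

def map_with_block_splitting(
    rows: Iterable[Any],
    fn: Callable[[Any], Any],
    size_of: Callable[[Any], int],
) -> Iterator[List[Any]]:
    """Apply `fn` row by row, yielding a new output block whenever the
    accumulated size would exceed the target. Illustrative only."""
    block: List[Any] = []
    block_bytes = 0
    for row in rows:
        out = fn(row)
        out_bytes = size_of(out)
        if block and block_bytes + out_bytes > TARGET_MAX_BLOCK_SIZE:
            yield block            # hand off a bounded-size block
            block, block_bytes = [], 0
        block.append(out)
        block_bytes += out_bytes
    if block:
        yield block
```

Yielding output blocks as they fill up means a task only ever holds roughly one target-sized block at a time, which is the bound this PR is after.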