block splitting and makes it off by default. This makes it easier to debug problems potentially related to this feature. Criteria for enabling by default:
- We're confident all nightly tests pass (currently, there may be an issue with large-scale groupby with block splitting).
- We're confident lineage-based reconstruction can work with block splitting.
The default block size of 500MiB seems too low for some common workloads, e.g. shuffling 500GB. This creates 1000 blocks which means 1 million intermediate shuffle objects until we implement #20500.
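The block and object counts above can be checked with a quick back-of-the-envelope calculation. This is a minimal sketch, not Ray code: it assumes an all-to-all shuffle in which every map task emits one intermediate object per reduce task and the number of reduce tasks equals the number of blocks, and it treats "500GB" as 500 GiB. The helper name `shuffle_object_count` is hypothetical.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3


def shuffle_object_count(dataset_bytes: int, block_bytes: int) -> tuple[int, int]:
    """Return (number of blocks, number of intermediate shuffle objects).

    Assumption: each of the N map tasks writes one object for each of the
    N reduce tasks, so the intermediate object count is N * N.
    """
    num_blocks = math.ceil(dataset_bytes / block_bytes)
    return num_blocks, num_blocks * num_blocks


blocks, objects = shuffle_object_count(500 * GiB, 500 * MiB)
print(blocks, objects)
```

Under these assumptions the result is roughly 1000 blocks and roughly one million intermediate objects, which matches the estimate in the description. Doubling the block size would halve the block count and cut the object count by about four times.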
This PR adds support for automatic block splitting on read and map transforms, to keep block size bounded to ~500MiB. This avoids potential OOM situations where a map task may consume too much intermediate Python heap memory, or too much object store shared memory for one block.
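The general idea of keeping output blocks under a size bound can be sketched as follows. This is an illustrative toy, not the implementation in this PR: the function name `split_into_blocks`, the `target_max_bytes` parameter, and the use of raw `bytes` rows are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List


def split_into_blocks(
    rows: Iterable[bytes], target_max_bytes: int
) -> Iterator[List[bytes]]:
    """Greedily group rows into blocks of at most target_max_bytes.

    A single row larger than the target still gets a block of its own,
    so the bound is approximate rather than strict.
    """
    block: List[bytes] = []
    size = 0
    for row in rows:
        # Close the current block before it would exceed the target.
        if block and size + len(row) > target_max_bytes:
            yield block
            block, size = [], 0
        block.append(row)
        size += len(row)
    if block:
        yield block


rows = [b"x" * 40 for _ in range(10)]
blocks = list(split_into_blocks(rows, target_max_bytes=100))
print([len(b) for b in blocks])
```

Because the function is a generator, a task using this pattern only holds one in-progress block at a time, which is the property that limits intermediate heap usage.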
## Why are these changes needed?
- Since broadcasting is moving to gRPC, introduce an option to increase the number of client-side threads.
- For hybrid scheduling, ignore the threshold if the GCS-based actor scheduler is enabled.
With these fixes, the actor creation rate exceeds 600 actors/s, up from ~140 actors/s.
## Related issue number
* start
* check formatting
* undo changes from base branch
* Client builder API docs
* indent
* 8
* minor fixes
* absolute path to runtime env docs
* fix runtime_env link
* Update worker.init docs
* drop clientbuilder docs, link to 1.4.1 docs instead. Specify local:// behavior when address passed
* add debug info for ray.init("local")
* local:// attaches a driver directly
* update ray.init return wording
* remove init.connect() from example
* drop local:// docs, add section on when to use ray client
* link to 1.4.1 docs in code example instead of mentioning clientbuilder
* fix backticks, doc mentions of ray.util.connect
* remove ray.util.connect mentions from examples and comments
* update tune example
* wording
* localhost:<port> also works if you're on the head node
* add quotes
* drop mentions of ray client from ray.init docstring
* local->remote
* fix section ref
* update ray start output
* fix section link
* try to fix doc again
* fix link wording
* drop local:// from docs and special handling from code
* update ray start message
* lint
* doc lint
* remove local:// codepath
* remove 'internal_config'
* Update doc/source/cluster/ray-client.rst
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
* doc suggestion
* Update doc/source/cluster/ray-client.rst
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>