ray/doc/source/ray-core
Kenneth 07372927cc
Enable buffering and spilling to multiple remote storages (#22798)
Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storages as effective as spilling locally.

In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to a S3 bucket resulted in the following runtimes.

Buffer Size | Runtime (s)
-- | --
Default | 3221.865916
256KB | 1758.885839
1MB | 748.226089
10MB | 526.406466
100MB | 494.830513

Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with larger availability, it is good to configure a larger buffer size.

For processes that reach the throughput limits provided by S3, we can remove that bottleneck by supporting more prefixes/buckets. These impacts are less noticeable as the performance gains from using a large buffer prevent us from reaching a bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and varying prefixes.

Prefixes | Runtime (s)
-- | --
1 | 748.226089
3 | 527.658646
10 | 516.010742


Together these changes enable faster large-scale object spilling.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
2022-03-11 11:27:02 -05:00
..
_examples/datasets_train [test] add back deleted datasets train test file (#23051) 2022-03-10 21:46:07 -08:00
doc_code [runtime env][strong-typed API] Combine ParsedRuntimeEnv and RuntimeEnv into ray.runtime.RuntimeEnv (#22522) 2022-02-28 16:18:10 +08:00
examples [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
images [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
actors.rst [doc] Update docs about actor garbage collection (#20763) 2022-02-28 18:45:29 -08:00
advanced.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
async_api.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
concurrency_group_api.rst [doc][Java] Add doc page for java concurrency group. (#21600) 2022-02-16 17:57:03 +08:00
configure.rst [GCS-Ray] update doc and error message for GCS-Ray (#22528) 2022-02-22 17:56:30 -08:00
cross-language.rst [C++ Worker]Python call cpp worker (#22820) 2022-03-10 11:06:14 -08:00
fault-tolerance.rst [core] Enable lineage reconstruction by default (#22816) 2022-03-07 17:40:30 -05:00
handling-dependencies.rst [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
memory-management.rst Enable buffering and spilling to multiple remote storages (#22798) 2022-03-11 11:27:02 -05:00
namespaces.rst [docs] integrate algolia docsearch, move to sphinx panels (#21814) 2022-01-24 17:00:41 -08:00
package-ref.rst [C++ Worker]Python call cpp worker (#22820) 2022-03-10 11:06:14 -08:00
placement-group.rst [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
ray-dashboard.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
serialization.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
starting-ray.rst [Doc] Fix bad doc and recover doc of c++ api (#22213) 2022-02-08 19:04:37 +08:00
tips-for-first-time.rst [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
troubleshooting.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
using-ray-with-gpus.rst [docs] new structure (#21776) 2022-01-21 15:42:05 -08:00
using-ray-with-jupyter.rst [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
using-ray.rst [docs] re/move old core examples (#22802) 2022-03-10 12:17:00 -08:00
walkthrough.rst [docs] external promo content (#22823) 2022-03-10 11:39:44 -08:00