Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storage as effective as spilling locally. In a test where 512GB of objects were created and spilled to an S3 bucket, varying only the write buffer size produced the following runtimes.

| Buffer Size | Runtime (s) |
| --- | --- |
| Default | 3221.865916 |
| 256KB | 1758.885839 |
| 1MB | 748.226089 |
| 10MB | 526.406466 |
| 100MB | 494.830513 |

Based on these results, a default buffer size of 1MB has been added; this is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with more memory available, it is worth configuring a larger buffer size.

For workloads that reach the throughput limits imposed by S3, the bottleneck can be removed by spilling across multiple prefixes/buckets. The impact here is less noticeable because the performance gains from a large buffer already keep spilling from hitting that limit. The following runtimes were achieved by spilling 512GB with a 1MB buffer while varying the number of prefixes.

| Prefixes | Runtime (s) |
| --- | --- |
| 1 | 748.226089 |
| 3 | 527.658646 |
| 10 | 516.010742 |

Together these changes enable faster large-scale object spilling.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
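For context on how these settings are applied, below is a minimal sketch of configuring Ray to spill to S3 through the `smart_open` spilling backend with a larger write buffer and several prefixes. The bucket name is a placeholder, and the exact option names (`buffer_size`, a list of URIs under `uri`) are assumptions that may differ between Ray versions; this is not the committed configuration itself.

```python
import json
import ray

# Sketch only: spill objects to S3 via the smart_open backend, using a larger
# write buffer and multiple prefixes. "buffer_size" and the list-valued "uri"
# are assumptions; the bucket/prefix names are placeholders.
ray.init(
    _system_config={
        "max_io_workers": 4,  # more parallel spill workers for remote storage
        "object_spilling_config": json.dumps(
            {
                "type": "smart_open",
                "params": {
                    # Spreading spilled objects across prefixes avoids S3's
                    # per-prefix throughput limits.
                    "uri": [
                        "s3://my-spill-bucket/prefix-0",
                        "s3://my-spill-bucket/prefix-1",
                        "s3://my-spill-bucket/prefix-2",
                    ],
                },
                # 10MB write buffer; 1MB is the new default, and larger buffers
                # help when memory allows.
                "buffer_size": 10 * 1024 * 1024,
            }
        ),
    }
)
```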