These changes add Dataset Read API support for (1) specifying custom block metadata provider callbacks, and (2) skipping path expansion. When paired with a custom block metadata provider that maintains an in-memory cache of BlockMetadata for each input file path, these changes reduced average S3-based dataset read times for production [Redshift Manifests](https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html) stored in Amazon's internal data catalog by over 90%. A simple ParquetDatasource benchmark reading 144MM records across 100 ~70MiB (on-disk) Parquet files stored in S3 showed an ~75% reduction in read latency (from 4.62 seconds to 1.18 seconds on 2 r5n.8xlarge EC2 nodes).
E.g. long running tests run on small clusters (often 8 CPUs) but block other jobs for a long time. We should thus add more granularity to the concurrency groups.
Additionally, limits have been slightly adjusted to make more sense (e.g. 8 GPUs are now small-gpu, 9+ GPUs large-gpu, instead of 7 for small-gpu and 8 for large-gpu).
The `schema_to_deployment()` function preserve unset fields with unexpected default argument types. This change excludes unset fields in that function and also changes the dictionaries' default values to empty dicts.
Buffering writes to AWS S3 is highly recommended to maximize throughput. Reducing the number of remote I/O requests can make spilling to remote storages as effective as spilling locally.
In a test where 512GB of objects were created and spilled, varying just the buffer size while spilling to a S3 bucket resulted in the following runtimes.
Buffer Size | Runtime (s)
-- | --
Default | 3221.865916
256KB | 1758.885839
1MB | 748.226089
10MB | 526.406466
100MB | 494.830513
Based on these results, a default buffer size of 1MB has been added. This is the minimum buffer size used by AWS Kinesis Firehose, a streaming service for S3. On systems with larger availability, it is good to configure a larger buffer size.
For processes that reach the throughput limits provided by S3, we can remove that bottleneck by supporting more prefixes/buckets. These impacts are less noticeable as the performance gains from using a large buffer prevent us from reaching a bottleneck. The following runtimes were achieved by spilling 512GB with a 1MB buffer and varying prefixes.
Prefixes | Runtime (s)
-- | --
1 | 748.226089
3 | 527.658646
10 | 516.010742
Together these changes enable faster large-scale object spilling.
Co-authored-by: Ubuntu <ubuntu@ip-172-31-54-240.us-west-2.compute.internal>
Next move of #19220. This pr replace unordered_map to flat_hash_map in most GCS code and some util & common modules.
The placement group part, which exposes user interfaces in Java/Python, is exclusive as it's a little bit complicated.
The follow-up PRs would be migrating in core worker, placement group and others.
This PR is part of resource reporting refactoring. In this PR ray syncer is moved from gcs_resource_manager to gcs_placement_group_scheduler. With this one, gcs_resource_manager is totally decoupled from resource broadcasting.
Mainly the following things:
- This PR deletes the proto cache on RuntimeEnv, ensuring that the user's modification of RuntimeEnv can take effect in the Proto message.
- validate whole runtime env when serialize runtime_env.
- overload method `__setitem__` to parse and validate field when it has to modify.
Separate out the conversion of pandas dataframe to torch tensor in a utility function so that the same logic can be used in other places in Ray ML (for example during inference).
As discussed,
- Removes ConvertibleToTrainable interface and makes as_trainable part of the Trainer interface
- Moves Trainer interface to ray.ml.trainer from ray.ml.train.trainer
This PR supports the job-based file manager and runner. It will be the backbone of k8s migration.
The PR handles edge cases that originally existed in the old e2e.py job-based runners.
The concept of a Serve Application, a data structure containing all information needed to deploy Serve on a Ray cluster, has surfaced during recent design discussions. This change introduces a formal Application data structure and refactors existing code to use it.