ray/release/benchmarks
Jian Xiao 096918b357 lower the utilization threshold in many tasks scheduling test by 5% (#24758)
Fix the failure to unbreak nightly and unblock 1.13 release.

The root cause is that upgrading gRPC to 1.45.2 made the test slightly slower; this is an acceptable regression that the upgrade requires.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
2022-05-13 20:27:02 +00:00

Ray Scalability Envelope

Distributed Benchmarks

All distributed tests are run on 64 nodes with 64 cores per node. The maximum number of nodes is reached by adding 4-core nodes.

| Dimension | Quantity |
| --- | --- |
| # of nodes in cluster (with trivial task workload) | 250+ |
| # of actors in cluster (with trivial workload) | 10k+ |
| # of simultaneously running tasks | 10k+ |
| # of simultaneously running placement groups | 1k+ |
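The cluster sizes quoted above can be turned into a rough core count. The sketch below is arithmetic only and rests on an assumption the README does not state: that the 4-core nodes are added on top of the 64 base nodes until the 250-node mark is reached. All constant names are hypothetical.

```python
# Figures taken from the text above.
BASE_NODES = 64
CORES_PER_BASE_NODE = 64
SMALL_NODE_CORES = 4
TARGET_NODES = 250  # lower bound of the "250+" entry

# Cores available in the base cluster.
base_cores = BASE_NODES * CORES_PER_BASE_NODE

# ASSUMPTION: the small nodes are added to the base cluster rather than
# replacing it; the README does not say which.
extra_nodes = TARGET_NODES - BASE_NODES
total_cores = base_cores + extra_nodes * SMALL_NODE_CORES

print(f"base cores: {base_cores}")            # base cores: 4096
print(f"extra 4-core nodes: {extra_nodes}")   # extra 4-core nodes: 186
print(f"total cores at 250 nodes: {total_cores}")  # total cores at 250 nodes: 4840
```

Under that assumption, the extra nodes add comparatively few cores, which fits the "trivial task workload" qualifier on the node-count row.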

Object Store Benchmarks

| Dimension | Quantity |
| --- | --- |
| 1 GiB object broadcast (# of nodes) | 50+ |

Single Node Benchmarks

All single-node benchmarks are run on a single m4.16xlarge instance.

| Dimension | Quantity |
| --- | --- |
| # of object arguments to a single task | 10,000+ |
| # of objects returned from a single task | 3,000+ |
| # of plasma objects in a single ray.get call | 10,000+ |
| # of tasks queued on a single node | 1,000,000+ |
| Maximum ray.get numpy object size | 100 GiB+ |
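The single-node figures above read naturally as lower bounds that a benchmark run should meet. The following is a minimal sketch of such a check, not the logic used by the release tests: the dictionary keys and the `envelope_regressions` helper are hypothetical names chosen for illustration, and only the numeric floors come from the table.

```python
from typing import Dict, List

GIB = 1024 ** 3

# Lower bounds copied from the single-node table above.
# The key names are hypothetical labels, not identifiers from the Ray repo.
SINGLE_NODE_ENVELOPE: Dict[str, int] = {
    "object_args_per_task": 10_000,
    "objects_returned_per_task": 3_000,
    "plasma_objects_per_get": 10_000,
    "tasks_queued_per_node": 1_000_000,
    "max_get_numpy_bytes": 100 * GIB,
}


def envelope_regressions(
    measured: Dict[str, int],
    envelope: Dict[str, int] = SINGLE_NODE_ENVELOPE,
) -> List[str]:
    """Return the dimensions whose measured value is missing or below the floor."""
    return sorted(
        name
        for name, floor in envelope.items()
        if measured.get(name, 0) < floor
    )


# Made-up run that meets every floor except the number of returned objects.
example_run = dict(SINGLE_NODE_ENVELOPE, objects_returned_per_task=2_500)
print(envelope_regressions(example_run))  # ['objects_returned_per_task']
```

A missing measurement is treated as a regression here, so a benchmark that silently stops reporting a dimension is flagged rather than ignored.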