
Ray Scalability Envelope

Distributed Benchmarks

All distributed tests run on 64 nodes with 64 cores per node. The maximum node count is reached by adding 4-core nodes.
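The arithmetic behind this setup can be sketched as follows. This is a rough illustration, not part of the benchmark code: `cluster_shape` is a hypothetical helper, and it assumes that the 64 large nodes stay in the cluster and that every additional node is a 4-core node, which the text implies but does not spell out.

```python
# Figures stated in the text above.
BASE_NODES = 64
CORES_PER_BASE_NODE = 64
SMALL_NODE_CORES = 4


def cluster_shape(target_nodes: int) -> dict:
    """Return node and core counts for a cluster grown to `target_nodes`.

    Assumption (not stated in the README): the 64 large nodes stay in the
    cluster and every additional node is a 4-core node.
    """
    extra_nodes = max(target_nodes - BASE_NODES, 0)
    base_cores = BASE_NODES * CORES_PER_BASE_NODE
    extra_cores = extra_nodes * SMALL_NODE_CORES
    return {
        "base_cores": base_cores,
        "extra_nodes": extra_nodes,
        "extra_cores": extra_cores,
        "total_cores": base_cores + extra_cores,
    }


if __name__ == "__main__":
    # 250 is the node count listed for the trivial task workload.
    print(cluster_shape(250))
```

Under these assumptions, a 250-node cluster would consist of 4,096 cores on the 64 base nodes plus 744 cores on 186 small nodes.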

| Dimension | Quantity |
| --- | --- |
| # nodes in cluster (with trivial task workload) | 250+ |
| # actors in cluster (with trivial workload) | 10k+ |
| # simultaneously running tasks | 10k+ |
| # simultaneously running placement groups | 1k+ |

Object Store Benchmarks

| Dimension | Quantity |
| --- | --- |
| 1 GiB object broadcast (# of nodes) | 50+ |

Single Node Benchmarks

All single-node benchmarks run on a single m4.16xlarge instance.

| Dimension | Quantity |
| --- | --- |
| # of object arguments to a single task | 10,000+ |
| # of objects returned from a single task | 3,000+ |
| # of plasma objects in a single ray.get call | 10,000+ |
| # of tasks queued on a single node | 1,000,000+ |
| Maximum ray.get numpy object size | 100 GiB+ |
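
To get a feel for what the first three dimensions measure, a scaled-down version can be tried on a local machine. The following is a minimal sketch, not the actual benchmark scripts: `README_TARGETS`, `scaled_targets`, and `run_checks` are hypothetical names, the workload is shrunk by an arbitrary divisor, and the sketch does not verify where Ray stores the objects. It assumes Ray is installed (`pip install ray`).

```python
# Single-node targets copied from the table above.
README_TARGETS = {
    "object_args_per_task": 10_000,
    "returns_per_task": 3_000,
    "objects_per_get": 10_000,
}


def scaled_targets(divisor: int) -> dict:
    """Shrink each target by `divisor` (hypothetical helper), keeping at least 2."""
    return {name: max(value // divisor, 2) for name, value in README_TARGETS.items()}


def run_checks(divisor: int = 100) -> dict:
    """Exercise three of the dimensions on a local Ray instance."""
    import ray  # imported lazily so scaled_targets() works without Ray

    sizes = scaled_targets(divisor)
    ray.init(ignore_reinit_error=True)
    try:
        @ray.remote
        def count_args(*args):
            # ObjectRefs passed as top-level arguments arrive as values.
            return len(args)

        @ray.remote
        def make_values(k):
            # With num_returns=k, each list element becomes its own object.
            return list(range(k))

        # Many object arguments to a single task.
        arg_refs = [ray.put(i) for i in range(sizes["object_args_per_task"])]
        n_args = ray.get(count_args.remote(*arg_refs))

        # Many objects returned from a single task.
        k = sizes["returns_per_task"]
        return_refs = make_values.options(num_returns=k).remote(k)
        n_returns = len(ray.get(return_refs))

        # Many objects fetched in a single ray.get call.
        get_refs = [ray.put(i) for i in range(sizes["objects_per_get"])]
        n_fetched = len(ray.get(get_refs))
    finally:
        ray.shutdown()

    return {"args": n_args, "returns": n_returns, "fetched": n_fetched}


if __name__ == "__main__":
    print(run_checks(divisor=100))
```

Lowering `divisor` moves the run closer to the quantities in the table. The published figures were measured on an m4.16xlarge, so smaller machines may not reach them.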