# Ray Scalability Envelope

## Distributed Benchmarks

All distributed tests are run on 64 nodes with 64 cores/node. The maximum number of nodes is reached by adding 4-core nodes.
| Dimension | Quantity |
|---|---|
| # nodes in cluster (with trivial task workload) | 250+ |
| # actors in cluster (with trivial workload) | 10k+ |
| # simultaneously running tasks | 10k+ |
| # simultaneously running placement groups | 1k+ |
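
The scheduling dimensions above are probed with trivial workloads. The following is a minimal sketch, not the actual release test script, of how such trivial tasks and actors can be launched with the Ray Python API; the counts and the `address="auto"` connection are illustrative assumptions.

```python
# Minimal sketch of launching many trivial tasks and actors on a running
# Ray cluster; counts are illustrative, not the benchmark configuration.
import ray

ray.init(address="auto")  # connect to an existing cluster

@ray.remote
def noop():
    # Trivial task workload: does no computation.
    return None

@ray.remote
class NoopActor:
    def ping(self):
        return None

# Many simultaneously submitted trivial tasks.
task_refs = [noop.remote() for _ in range(10_000)]
ray.get(task_refs)

# Many actors with a trivial workload; wait until each responds once.
actors = [NoopActor.remote() for _ in range(1_000)]
ray.get([a.ping.remote() for a in actors])
```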
## Object Store Benchmarks
| Dimension | Quantity |
|---|---|
| 1 GiB object broadcast (# of nodes) | 50+ |
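
A broadcast of this kind can be expressed with `ray.put` plus remote tasks that take the resulting `ObjectRef` as an argument; Ray fetches the object to each node that runs a consumer. The sketch below is an illustration under stated assumptions (the consumer count and spread across nodes are not guaranteed here), not the benchmark itself.

```python
# Minimal sketch of broadcasting a ~1 GiB object to many consumer tasks.
import numpy as np
import ray

ray.init(address="auto")  # connect to an existing multi-node cluster

# Place ~1 GiB in the object store once.
obj_ref = ray.put(np.zeros(1024 * 1024 * 1024, dtype=np.uint8))

@ray.remote
def consume(arr):
    # Ray resolves the ObjectRef argument, pulling the object to the
    # local object store of whichever node runs this task.
    return arr.nbytes

# One consumer per node is the intent; actual placement depends on the
# scheduler unless a spread scheduling strategy is used.
num_consumers = 50  # illustrative
sizes = ray.get([consume.remote(obj_ref) for _ in range(num_consumers)])
```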
## Single Node Benchmarks

All single node benchmarks are run on a single m4.16xlarge instance.
| Dimension | Quantity |
|---|---|
| # of object arguments to a single task | 10,000+ |
| # of objects returned from a single task | 3,000+ |
| # of plasma objects in a single `ray.get` call | 10,000+ |
| # of tasks queued on a single node | 1,000,000+ |
| Maximum `ray.get` numpy object size | 100 GiB+ |
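
As a rough illustration of two of these dimensions, the sketch below passes many object arguments to a single task and fetches many plasma objects in one `ray.get` call. Object sizes and counts are illustrative assumptions and much smaller than the limits in the table; very small objects may be inlined rather than stored in plasma.

```python
# Minimal sketch of single-node object-handling dimensions; values are
# illustrative, not the measured limits.
import numpy as np
import ray

ray.init()

# Create many objects large enough to live in the plasma object store
# rather than being inlined, then fetch them in a single ray.get call.
refs = [ray.put(np.zeros(200 * 1024, dtype=np.uint8)) for _ in range(10_000)]
objects = ray.get(refs)

# Pass many object arguments to a single task; Ray resolves each ref
# before the task body runs.
@ray.remote
def count_args(*args):
    return len(args)

assert ray.get(count_args.remote(*refs)) == 10_000
```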