# Ray Scalability Envelope
## Distributed Benchmarks
All distributed tests are run on 64 nodes with 64 cores per node. The maximum node count is reached by adding 4-core nodes.
| Dimension | Quantity |
|---|---|
| # nodes in cluster (with trivial task workload) | 250+ |
| # actors in cluster (with trivial workload) | 10k+ |
| # simultaneously running tasks | 10k+ |
| # simultaneously running placement groups | 1k+ |
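
For reference, the "trivial workload" rows above correspond to launching large numbers of no-op tasks and actors. The following is a minimal sketch of that pattern, not the actual test code; the counts and the `noop`/`Noop` names are illustrative, and it assumes an existing multi-node cluster.

```python
import ray

ray.init(address="auto")  # assumes an already-running Ray cluster

@ray.remote
def noop():
    # Trivial task body: the benchmark stresses scheduling, not compute.
    return 1

@ray.remote
class Noop:
    def ping(self):
        return 1

# Many simultaneously running trivial tasks (count is illustrative).
ray.get([noop.remote() for _ in range(10_000)])

# Many trivial actors, each confirmed alive with one method call.
actors = [Noop.remote() for _ in range(10_000)]
ray.get([a.ping.remote() for a in actors])
```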
## Object Store Benchmarks
| Dimension | Quantity |
|---|---|
| 1 GiB object broadcast (# of nodes) | 50+ |
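
The broadcast dimension measures how many nodes can fetch one large object. A hedged sketch of that pattern, assuming a running cluster and an illustrative count of 50 consumer tasks:

```python
import numpy as np
import ray

ray.init(address="auto")  # assumes an already-running Ray cluster

# A ~1 GiB array placed in the object store once; only the ObjectRef
# is passed around, and each node pulls the bytes when a task needs them.
payload = ray.put(np.zeros(1024 ** 3 // 8, dtype=np.float64))

@ray.remote
def consume(arr):
    # The array argument is resolved from the local plasma store,
    # which forces the object to be copied to the task's node.
    return arr.nbytes

# One task per target node (illustrative); spreading the tasks across
# the cluster replicates the object to every node that runs one.
ray.get([consume.remote(payload) for _ in range(50)])
```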
## Single Node Benchmarks
All single node benchmarks are run on a single m4.16xlarge.
| Dimension | Quantity |
|---|---|
| # of object arguments to a single task | 10,000+ |
| # of objects returned from a single task | 3,000+ |
| # of plasma objects in a single `ray.get` call | 10,000+ |
| # of tasks queued on a single node | 1,000,000+ |
| Maximum `ray.get` numpy object size | 100 GiB+ |
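
As an illustration of the first and third rows (many object arguments to one task, and a single `ray.get` over many plasma objects), a minimal single-machine sketch might look like the following; the object sizes, counts, and `count_args` helper are illustrative assumptions, not the benchmark code itself.

```python
import numpy as np
import ray

ray.init()

# Many small plasma objects, all fetched in a single ray.get call.
refs = [ray.put(np.ones(1024)) for _ in range(10_000)]
arrays = ray.get(refs)

# A single task taking many ObjectRef arguments; Ray resolves each
# reference to its value before the task body runs.
@ray.remote
def count_args(*args):
    return len(args)

assert ray.get(count_args.remote(*refs)) == 10_000
```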