hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Richard Liaw	96e8027c7e	[air] large tune/torch benchmark (#26763) Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-07-23 01:17:25 -07:00
Balaji Veeramani	ac1d21027d	[AIR] Add framework-specific checkpoints (#26777)	2022-07-20 19:33:27 -07:00
Kai Fricke	2e35d47bd2	[air/train/benchmark] Add TF GPU 4x4 benchmark (#26776)	2022-07-20 14:07:51 -07:00
matthewdeng	2a425b195c	[air] change default strategy to PACK (#26757)	2022-07-19 23:01:24 -07:00
xwjiang2010	75027eb479	[air/benchmarks] train/tune benchmark (#26564) Making sure that tuning multiple trials in parallel is not significantly slower than training each individual trials. Some overhead is expected. Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-07-19 18:24:39 +01:00
Richard Liaw	7e62e1187c	[air/benchmark] Torch benchmarks for 4x4 (#26692) Add benchmark data for 4x4 GPU setup. Signed-off-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-07-19 17:06:37 +01:00
Sumanth Ratna	759966781f	[air] Allow users to use instances of `ScalingConfig` (#25712) Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-07-18 15:46:58 -07:00
Kai Fricke	00947fd949	[air/benchmarks] Add 4x1 GPU benchmark for Torch (#26562)	2022-07-18 12:14:10 -07:00
matthewdeng	6670708010	[air] add placement group max CPU to data benchmark (#26649) Set experimental `_max_cpu_fraction_per_node` to prevent deadlock. This should technically be a no-op with the SPREAD strategy.	2022-07-18 10:34:40 -07:00
Jiao	98a07920d3	[AIR][CUJ] Make distributing training benchmark at silver tier (#26640)	2022-07-17 22:07:09 -07:00
Jiao	77e2ef2eb6	[AIR] Update Torch benchmarks with documentation (#26631) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-16 17:58:21 -07:00
Eric Liang	0855bcb77e	[air] Use SPREAD strategy by default and don't special case it in benchmarks (#26633)	2022-07-16 17:37:06 -07:00
Jiao	196e52ad7c	[AIR][CUJ] E2E Pytorch training (#26621)	2022-07-16 08:23:19 -07:00
Jiao	988ffd494b	[AIR][CUJ] Add GPU bench prediction benchmark (#26614)	2022-07-16 08:22:37 -07:00
matthewdeng	e3a096f412	[air] add bulk ingest benchmarks (#26618)	2022-07-15 22:01:23 -07:00
Richard Liaw	5ad4e75831	[air] Add initial benchmark section (#26608)	2022-07-15 15:33:48 -07:00
xwjiang2010	a241e6a0f5	[air] Add xgboost release test for silver tier(10-node case). (#26460) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-07-15 13:21:10 -07:00
Kai Fricke	213a96e239	[air/benchmarks] Add distributed Tensorflow benchmarks (CPU only) (#26519) Following up from #26436, this PR adds a distributed benchmark test for Tensorflow FashionMNIST training. It compares training with Ray AIR with training with vanilla PyTorch. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-07-14 22:08:43 +01:00
Kai Fricke	cf75cf7232	[air] Add AIR distributed training benchmark for Torch FashionMNIST (#26436) This PR adds a distributed benchmark test for Pytorch MNIST training. It compares training with Ray AIR with training with vanilla PyTorch. In both cases, the same training loop is used. For Ray AIR, we use a TorchTrainer with 4 CPU workers. For vanilla PyTorch, we upload a training script and kick it off (using Ray tasks) in subprocesses on each node. In both cases, we collect the end to end runtime. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-07-13 10:53:24 +01:00

19 commits