Making sure that tuning multiple trials in parallel is not significantly slower than training each individual trials.
Some overhead is expected.
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
Add benchmark data for 4x4 GPU setup.
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Jimmy Yao <jiahaoyao.math@gmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
Following up from #26436, this PR adds a distributed benchmark test for Tensorflow FashionMNIST training. It compares training with Ray AIR with training with vanilla PyTorch.
Signed-off-by: Kai Fricke <kai@anyscale.com>
This PR adds a distributed benchmark test for Pytorch MNIST training. It compares training with Ray AIR with training with vanilla PyTorch.
In both cases, the same training loop is used. For Ray AIR, we use a TorchTrainer with 4 CPU workers. For vanilla PyTorch, we upload a training script and kick it off (using Ray tasks) in subprocesses on each node. In both cases, we collect the end to end runtime.
Signed-off-by: Kai Fricke <kai@anyscale.com>