ray/release/air_tests/air_benchmarks/compute_cpu_4.yaml at 8bdeb30510ab40b8c5232f0a85cccfa25dca855e - hiro/ray

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Kai Fricke cf75cf7232

[air] Add AIR distributed training benchmark for Torch FashionMNIST (#26436)

This PR adds a distributed benchmark test for PyTorch FashionMNIST training. It compares training with Ray AIR against training with vanilla PyTorch.

In both cases, the same training loop is used. For Ray AIR, we use a TorchTrainer with 4 CPU workers. For vanilla PyTorch, we upload a training script and kick it off (using Ray tasks) in subprocesses on each node. In both cases, we collect the end-to-end runtime.
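
The "kick off subprocesses and collect the end-to-end runtime" idea can be sketched with the standard library alone. This is a local illustration, not the benchmark's code: it launches the processes directly rather than through Ray tasks, and `WORKER_SNIPPET` and `run_workers_and_time` are hypothetical names standing in for the uploaded training script and its launcher.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for the uploaded training script (assumption:
# the real script is not shown on this page).
WORKER_SNIPPET = "import time; time.sleep(0.1)"


def run_workers_and_time(num_workers: int) -> tuple[float, list[int]]:
    """Start num_workers subprocesses, wait for all of them, and return
    (wall-clock seconds, list of exit codes)."""
    start = time.monotonic()
    # Launch every worker first so they run concurrently.
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_SNIPPET])
        for _ in range(num_workers)
    ]
    # The end-to-end runtime covers the slowest worker.
    exit_codes = [p.wait() for p in procs]
    elapsed = time.monotonic() - start
    return elapsed, exit_codes


if __name__ == "__main__":
    elapsed, exit_codes = run_workers_and_time(4)
    print(f"end-to-end runtime: {elapsed:.2f}s, exit codes: {exit_codes}")
```

Timing from before the first launch until the last process exits means process startup cost is included in the measurement, which is usually what an end-to-end comparison wants.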

Signed-off-by: Kai Fricke <kai@anyscale.com>

2022-07-13 10:53:24 +01:00

15 lines

280 B

YAML


cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2
max_workers: 3
head_node_type:
    name: head_node
    instance_type: m5.2xlarge
worker_node_types:
    - name: worker_node
      instance_type: m5.2xlarge
      max_workers: 3
      min_workers: 3
      use_spot: false
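
As a rough sanity check on what this config provisions, the sketch below tallies nodes and vCPUs. The `CONFIG` dict is hand-copied from the YAML above (with the templated `cloud_id` left out), `cluster_size` is a hypothetical helper, and the figure of 8 vCPUs for `m5.2xlarge` comes from AWS's published instance specs. One head node plus three fixed workers gives four nodes, which would line up with the four CPU workers mentioned in the commit message, though how workers are placed on nodes is not stated here.

```python
# Values hand-copied from the cluster config above (cloud_id omitted
# because it is a template placeholder, not a literal value).
CONFIG = {
    "head_node_type": {"name": "head_node", "instance_type": "m5.2xlarge"},
    "worker_node_types": [
        {
            "name": "worker_node",
            "instance_type": "m5.2xlarge",
            "min_workers": 3,
            "max_workers": 3,
            "use_spot": False,
        }
    ],
}

# vCPU count per instance type, taken from AWS's published M5 specs.
VCPUS_PER_INSTANCE = {"m5.2xlarge": 8}


def cluster_size(config: dict) -> tuple[int, int]:
    """Return (node count, total vCPUs) for the guaranteed cluster size,
    i.e. the head node plus min_workers of each worker node type."""
    nodes = 1
    vcpus = VCPUS_PER_INSTANCE[config["head_node_type"]["instance_type"]]
    for worker_type in config["worker_node_types"]:
        count = worker_type["min_workers"]
        nodes += count
        vcpus += count * VCPUS_PER_INSTANCE[worker_type["instance_type"]]
    return nodes, vcpus


if __name__ == "__main__":
    nodes, vcpus = cluster_size(CONFIG)
    print(f"{nodes} nodes, {vcpus} vCPUs")
```

Because `min_workers` equals `max_workers` and `use_spot` is false, the cluster size is fixed and not subject to spot interruption, which keeps benchmark runs comparable.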