
This PR adds a distributed benchmark test for PyTorch MNIST training. It compares training with Ray AIR to training with vanilla PyTorch, using the same training loop in both cases. For Ray AIR, we use a TorchTrainer with 4 CPU workers. For vanilla PyTorch, we upload a training script and kick it off in subprocesses on each node (using Ray tasks). In both cases, we collect the end-to-end runtime.

Signed-off-by: Kai Fricke <kai@anyscale.com>
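A rough sketch of what the two code paths described above could look like. Only the TorchTrainer with a 4-worker CPU ScalingConfig is taken from the PR description; the identifiers train_loop, run_vanilla_training, the script name "train_mnist_vanilla.py", and the fan-out of 4 tasks are illustrative assumptions, not the names or values used by the actual benchmark script.

import subprocess
import time

import ray
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer


def train_loop(config: dict):
    # Placeholder for the shared PyTorch MNIST training loop; the same
    # loop body would be reused by both the Ray AIR and the vanilla run.
    ...


# Ray AIR path: TorchTrainer with 4 CPU workers.
trainer = TorchTrainer(
    train_loop_per_worker=train_loop,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
start = time.monotonic()
trainer.fit()
air_runtime = time.monotonic() - start  # end-to-end runtime, Ray AIR


# Vanilla PyTorch path: run the uploaded training script in a subprocess
# on whichever node each Ray task lands on.
@ray.remote(num_cpus=1)
def run_vanilla_training(script_path: str) -> int:
    return subprocess.check_call(["python", script_path])


start = time.monotonic()
ray.get([run_vanilla_training.remote("train_mnist_vanilla.py") for _ in range(4)])
vanilla_runtime = time.monotonic() - start  # end-to-end runtime, vanilla PyTorch

Since the training loop itself is identical, the difference between air_runtime and vanilla_runtime reflects the overhead (or savings) of coordinating the workers through Ray AIR rather than through plain subprocesses.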
Cluster compute config (YAML, 15 lines, 280 B):
cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2

max_workers: 0

head_node_type:
  name: head_node
  instance_type: m5.2xlarge

worker_node_types:
  - name: worker_node
    instance_type: m5.2xlarge
    max_workers: 0
    min_workers: 0
    use_spot: false