RaySGD: Distributed Training Wrappers
=====================================

RaySGD is a lightweight library for distributed deep learning, providing thin wrappers around PyTorch and TensorFlow native modules for data parallel training.

The main features are:

- **Ease of use**: Scale PyTorch's native ``DistributedDataParallel`` and TensorFlow's ``tf.distribute.MirroredStrategy`` without needing to monitor individual nodes.
- **Composability**: RaySGD is built on top of the Ray Actor API, enabling seamless integration with existing Ray applications such as RLlib, Tune, and Ray Serve.
- **Scale up and down**: Start on a single CPU. Scale up to multi-node, multi-CPU, or multi-GPU clusters by changing two lines of code, as shown in the sketch after this list.
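
For instance, the two-line change amounts to different arguments to the ``TorchTrainer`` constructor introduced in Getting Started below. This is a sketch reusing the creator functions defined there; the worker count of 100 is illustrative, not prescriptive:

.. code-block:: python

    # Single machine, CPU only:
    trainer = TorchTrainer(
        model_creator=model_creator,
        data_creator=data_creator,
        optimizer_creator=optimizer_creator,
        loss_creator=torch.nn.MSELoss,
        num_workers=1,
        use_gpu=False)

    # Multi-node, multi-GPU cluster -- only these two lines change:
    trainer = TorchTrainer(
        model_creator=model_creator,
        data_creator=data_creator,
        optimizer_creator=optimizer_creator,
        loss_creator=torch.nn.MSELoss,
        num_workers=100,
        use_gpu=True)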

.. tip:: Join our `community Slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`_ to discuss Ray!

Getting Started
---------------

You can start a ``TorchTrainer`` with the following:

.. code-block:: python

    import ray
    import torch
    from torch.utils.data import DataLoader

    from ray.util.sgd import TorchTrainer
    from ray.util.sgd.torch.examples.train_example import LinearDataset


    def model_creator(config):
        """Returns the model to be distributed across workers."""
        return torch.nn.Linear(1, 1)


    def optimizer_creator(model, config):
        """Returns an optimizer for the given model."""
        return torch.optim.SGD(model.parameters(), lr=1e-2)


    def data_creator(config):
        """Returns training and validation data loaders."""
        train_loader = DataLoader(LinearDataset(2, 5), config["batch_size"])
        val_loader = DataLoader(LinearDataset(2, 5), config["batch_size"])
        return train_loader, val_loader


    ray.init()

    trainer1 = TorchTrainer(
        model_creator=model_creator,
        data_creator=data_creator,
        optimizer_creator=optimizer_creator,
        loss_creator=torch.nn.MSELoss,
        num_workers=2,
        use_gpu=False,
        config={"batch_size": 64})

    stats = trainer1.train()
    print(stats)
    trainer1.shutdown()
    print("success!")

.. tip:: Get in touch with us if you're using or considering using `RaySGD <https://forms.gle/26EMwdahdgm7Lscy9>`_!