5 commits

Author SHA1 Message Date
Richard Liaw
232be5a058
[sgd] fault tolerance for pytorch + revamp documentation (#6465) 2020-01-16 18:38:27 -08:00
daiyaanarfeen
8f6d73a93a
[sgd] Extend distributed pytorch functionality (#5675)
* raysgd
* apply fn
* double quotes
* removed duplicate TimerStat
* removed duplicate find_free_port
* imports in pytorch_trainer
* init doc
* ray.experimental
* remove resize example
* resnet example
* cifar
* Fix up after kwargs
* data_dir and dataloader_workers args
* formatting
* loss
* init
* update code
* lint
* smoketest
* better_configs
* fix
* fix
* fix
* train_loader
* fixdocs
* ok
* ok
* fix
* fix_update
* fix
* fix
* done
* fix
* fix
* fix
* small
* lint
* fix
* fix
* fix_test
* fix
* validate
* fix
* fi
2019-11-05 11:16:46 -08:00
Richard Liaw
fb40787603
[docs] Distributed Training Quickfix (#5571) 2019-08-29 15:38:43 -07:00
Richard Liaw
411f30c125
[docs] Second push of changes (#5391) 2019-08-28 17:54:15 -07:00
Peter Schafhalter
c2ade075a3
[sgd] Distributed Training via PyTorch (#4797)
Implements distributed SGD using distributed PyTorch.
2019-06-01 21:39:22 -07:00