Commit graph

21 commits

Author SHA1 Message Date
Junwen Yao
eb8adc6105
[train] add a utility function to turn off TF autosharding (#21887)
This PR adds a utility function to turn off TF autosharding as a temporary solution.

Closes #19324.
2022-01-28 16:09:06 -08:00
Max Pumperla
4dd221f848
[Docs] Ray Data docs target state (#21931)
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)

The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have

- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention in `Ray Data` whenever is is referred to the project.

This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.
2022-01-27 13:14:36 -08:00
Max Pumperla
7953c9ca57
[docs] integrate algolia docsearch, move to sphinx panels (#21814) 2022-01-24 17:00:41 -08:00
matthewdeng
8119b62640
[train] refactor callback logdir and results preprocessors (#21468)
* [train] Add TorchTensorboardProfilerCallback and introduce ResultsPreprocessors

* simplify profiler

* read on get_and_clear_profile_traces

* refactor callbacks

* remove var

* Update python/ray/train/callbacks/logging.py

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* Update python/ray/train/callbacks/results_prepocessors/keys.py

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* address comments; add tests

* fix test

* address comments

* docs

* address comments'

* fix test

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-01-21 17:23:34 -08:00
matthewdeng
165a025641
[train] update worker batch size docs (#21761)
Making it explicit how the user should think about batch size for PyTorch in a distributed setting, similar to what's already done for TensorFlow.

![image](https://user-images.githubusercontent.com/3967392/150421340-df73f574-8531-4626-88a6-b80442ea6b7f.png)
2022-01-21 17:22:47 -08:00
xwjiang2010
9af8f11191
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
2022-01-20 15:30:56 -08:00
Max Pumperla
38e46c9fb3
[docs] Clean up doc structure (first part) (#21667) 2022-01-20 16:19:04 +01:00
Amog Kamsetty
bcae6ba6c9
[Train] _WrappedDataLoader yield tuples (#21467)
Fixes bug with _WrappedDataLoader that yields a generator instead of a tuple.

Addresses https://discuss.ray.io/t/ray-train-creates-typeerror-generator-object-is-not-subscriptable/4605/10
2022-01-10 12:40:36 -08:00
Amog Kamsetty
123aa7cd2b
[Train] Improve usability for GPU Training (#21464)
Minor changes to improve the user experience for GPU Training.

Addresses https://discuss.ray.io/t/ray-train-doesnt-detect-gpu/4608
2022-01-07 11:53:53 -08:00
Balaji Veeramani
7efe1bef11
[Train] Add PrintCallback (#21261)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2022-01-03 14:03:04 -08:00
Amog Kamsetty
57db4640ca
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train

Closes #20642
2021-12-21 17:17:52 -08:00
Junwen Yao
8325a32d66
[Train] Update saving / loading checkpoint documentation (#20973)
This PR updates saving / loading checkpoint examples.

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-12-14 09:53:17 -08:00
Amog Kamsetty
c03b937b95
[Train] Minor migration guide update (#20683)
* update docs

* tf
2021-11-29 12:42:28 -08:00
Amog Kamsetty
9796ae56d5
[Train][Data] Change usages of iter_datasets to iter_epochs (#20487) 2021-11-17 18:05:51 -08:00
Amog Kamsetty
4f88796d5a
[Train] Move to beta (#20378) 2021-11-16 08:19:30 -08:00
Amog Kamsetty
a74cf7ff1c
[Train] Torch Prepare utilities (#20254)
* update

* formatting

* fix failures

* fix session tests

* address comments

* add to api docs

* package refactor

* wip

* wip

* wip

* finish

* finish

* fix

* comment

* fix

* install horovod for docs

* address comment

* Update python/ray/train/session.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* Update python/ray/train/torch.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* address comments

* try fix docs

* fix doc build failure

* fix

* fix

* fix

* try fix doc highlighting

* fix docs

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-11-15 07:34:17 -08:00
Amog Kamsetty
65a17da2ec
[Train] Refactor Backends (#20312)
* wip

* finish

* comment

* fix

* install horovod for docs

* address comment

* fix doc build failure
2021-11-13 11:05:53 -08:00
matthewdeng
e77cc926be
[train] minor doc updates (#20271) 2021-11-12 17:20:23 -08:00
Amog Kamsetty
1803d88943
[Train] Simplify single worker training (#19814)
* wip

* update

* fix

* fix

* fix

* fix
2021-10-28 10:54:35 -07:00
matthewdeng
aa5499ef0f
[Train] implement CheckpointStrategy (#19111)
* [SGD] implement CheckpointStrategy

* address comments

* update docs

* Update doc/source/train/user_guide.rst

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>

* best checkpoint

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-10-27 11:31:04 -07:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00