matthewdeng
|
8119b62640
|
[train] refactor callback logdir and results preprocessors (#21468)
* [train] Add TorchTensorboardProfilerCallback and introduce ResultsPreprocessors
* simplify profiler
* read on get_and_clear_profile_traces
* refactor callbacks
* remove var
* Update python/ray/train/callbacks/logging.py
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* Update python/ray/train/callbacks/results_prepocessors/keys.py
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* address comments; add tests
* fix test
* address comments
* docs
* address comments
* fix test
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2022-01-21 17:23:34 -08:00 |
|
matthewdeng
|
165a025641
|
[train] update worker batch size docs (#21761)
Makes explicit how users should think about batch size for PyTorch in a distributed setting, similar to what's already done for TensorFlow.

|
2022-01-21 17:22:47 -08:00 |
|
xwjiang2010
|
9af8f11191
|
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
|
2022-01-20 15:30:56 -08:00 |
|
Max Pumperla
|
38e46c9fb3
|
[docs] Clean up doc structure (first part) (#21667)
|
2022-01-20 16:19:04 +01:00 |
|
Amog Kamsetty
|
bcae6ba6c9
|
[Train] _WrappedDataLoader yield tuples (#21467)
Fixes a bug where _WrappedDataLoader yields a generator instead of a tuple.
Addresses https://discuss.ray.io/t/ray-train-creates-typeerror-generator-object-is-not-subscriptable/4605/10
|
2022-01-10 12:40:36 -08:00 |
|
Amog Kamsetty
|
123aa7cd2b
|
[Train] Improve usability for GPU Training (#21464)
Minor changes to improve the user experience for GPU Training.
Addresses https://discuss.ray.io/t/ray-train-doesnt-detect-gpu/4608
|
2022-01-07 11:53:53 -08:00 |
|
Balaji Veeramani
|
7efe1bef11
|
[Train] Add PrintCallback (#21261)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
|
2022-01-03 14:03:04 -08:00 |
|
Amog Kamsetty
|
57db4640ca
|
[Train] [Tune] Refactor MLflow (#20802)
Pulls out Tune's MLflow logging logic to a shared MLflow util.
Adds an MLflow logger callback to Ray Train.
Closes #20642
|
2021-12-21 17:17:52 -08:00 |
|
Junwen Yao
|
8325a32d66
|
[Train] Update saving / loading checkpoint documentation (#20973)
This PR updates saving / loading checkpoint examples.
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
|
2021-12-14 09:53:17 -08:00 |
|
Amog Kamsetty
|
c03b937b95
|
[Train] Minor migration guide update (#20683)
* update docs
* tf
|
2021-11-29 12:42:28 -08:00 |
|
Amog Kamsetty
|
9796ae56d5
|
[Train][Data] Change usages of iter_datasets to iter_epochs (#20487)
|
2021-11-17 18:05:51 -08:00 |
|
Amog Kamsetty
|
4f88796d5a
|
[Train] Move to beta (#20378)
|
2021-11-16 08:19:30 -08:00 |
|
Amog Kamsetty
|
a74cf7ff1c
|
[Train] Torch Prepare utilities (#20254)
* update
* formatting
* fix failures
* fix session tests
* address comments
* add to api docs
* package refactor
* wip
* wip
* wip
* finish
* finish
* fix
* comment
* fix
* install horovod for docs
* address comment
* Update python/ray/train/session.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* Update python/ray/train/torch.py
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
* address comments
* try fix docs
* fix doc build failure
* fix
* fix
* fix
* try fix doc highlighting
* fix docs
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
|
2021-11-15 07:34:17 -08:00 |
|
Amog Kamsetty
|
65a17da2ec
|
[Train] Refactor Backends (#20312)
* wip
* finish
* comment
* fix
* install horovod for docs
* address comment
* fix doc build failure
|
2021-11-13 11:05:53 -08:00 |
|
matthewdeng
|
e77cc926be
|
[train] minor doc updates (#20271)
|
2021-11-12 17:20:23 -08:00 |
|
Amog Kamsetty
|
1803d88943
|
[Train] Simplify single worker training (#19814)
* wip
* update
* fix
* fix
* fix
* fix
|
2021-10-28 10:54:35 -07:00 |
|
matthewdeng
|
aa5499ef0f
|
[Train] implement CheckpointStrategy (#19111)
* [SGD] implement CheckpointStrategy
* address comments
* update docs
* Update doc/source/train/user_guide.rst
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
* best checkpoint
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
|
2021-10-27 11:31:04 -07:00 |
|
matthewdeng
|
4674c78050
|
[Train] Rename Ray SGD v2 to Ray Train (#19436)
|
2021-10-18 22:27:46 -07:00 |
|