hiro/ray - Forgejo: Beyond coding. We Forge.

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Antoni Baum	3625c4760f	[ML/Train] Add `TensorflowTrainer` interface (#23072) Interface for TensorflowTrainer. Depends on #22988 Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2022-03-15 14:02:17 -07:00
Balaji Veeramani	c694ed4594	[Train] Add `enable_reproducibility` (#22851) This PR adds a feature that allows users to make their training runs more reproducible. I've implemented this feature by following PyTorch's guide on how to limit sources of randomness (https://pytorch.org/docs/stable/notes/randomness.html). These changes will make it easier for us to benchmark Ray Train, and also make it easier for users to reproduce their experiments.	2022-03-15 11:07:34 -07:00
Amog Kamsetty	e1f24a244b	[ml/train] Training Interfaces [3/4]: `DataParallelTrainer` interface (#22988) Interface for DataParallelTrainer and updates to ScalingConfig definition. Depends on #22986 Co-authored-by: Eric Liang <ekhliang@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu> Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-03-15 08:11:05 -07:00
Max Pumperla	2b8faae40c	[docs] re/move old core examples (#22802)	2022-03-10 12:17:00 -08:00
Max Pumperla	11c40e363d	[docs] external promo content (#22823)	2022-03-10 11:39:44 -08:00
Max Pumperla	d53d0e0f50	[docs] Typo - fixes #22761 (#22763) Signed-off-by: Max Pumperla <max.pumperla@googlemail.com>	2022-03-02 10:34:46 +01:00
Amog Kamsetty	80e0d9cea4	[Train] Update docs for ray.train.torch import (#22555) Update more examples to include the ray.train.torch import line. Follow up to #21969	2022-02-23 19:22:27 -08:00
Hao Chen	78597d3089	[train] Minor fixes on Ray Train user guide doc (#22379) Fixes some typos and format issues.	2022-02-15 10:09:27 -08:00
matthewdeng	8f9e0d7f6b	[train] add TorchTensorboardProfilerCallback (#22345) The [original PR](https://github.com/ray-project/ray/pull/21864) was [reverted](https://github.com/ray-project/ray/pull/22117) because it caused `torch` (more specifically, `torch>=1.8.1`) to be required to use `ray.train`. ``` | File "ray_sgd_training.py", line 18, in <module> | from ray import train | File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/train/__init__.py", line 2, in <module> | from ray.train.callbacks import TrainingCallback | File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/train/callbacks/__init__.py", line 8, in <module> | from ray.train.callbacks.profile import TorchTensorboardProfilerCallback | File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/train/callbacks/profile.py", line 6, in <module> | from torch.profiler import profile | ModuleNotFoundError: No module named 'torch.profiler' ``` A [minimal installation test suite](https://github.com/ray-project/ray/pull/22300) was added to detect this. Further, in this PR we make the following changes: 1. Move `TorchWorkerProfiler` to `ray.train.torch` so all torch imports are centralized. 2. Add import validation logic to `TorchWorkerProfiler.__init__` so an exception will only be raised if the user tries to initialize a `TorchWorkerProfiler` without having a valid version of `torch` installed: ``` >>> import ray >>> import ray.train >>> import ray.train.torch >>> from ray.train.torch import TorchWorkerProfiler >>> twp = TorchWorkerProfiler() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/matt/workspace/ray/python/ray/train/torch.py", line 365, in __init__ "Torch Profiler requires torch>=1.8.1. " ImportError: Torch Profiler requires torch>=1.8.1. Run `pip install 'torch>=1.8.1'` to use TorchWorkerProfiler. ```	2022-02-14 16:16:55 -08:00
Max Pumperla	5cc9355303	[Docs] Tune docs overhaul (first part) (#22112) Continuing the docs overhaul, Tune now has: - [x] a better landing page - [x] a getting started guide - [x] the user guide was cut down, partially merged with the FAQ, and partially integrated with the tutorials - [x] the new user guide contains guides to Tune features and practical integrations - [x] we rewrote some of the feature guides for clarity - [x] we got rid of sphinx-gallery for this sub-project (only Data and Core are left), as it looks bad and is unnecessarily complicated anyway (plus, it makes the build slower) - [x] sphinx-gallery examples are now moved to Markdown notebooks, as started in #22030. - [x] Examples are tested in the new framework, of course. There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week. Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-02-07 15:47:03 +00:00
matthewdeng	014a9959f1	Revert "[train] add TorchTensorboardProfilerCallback (#21864)" (#22117) This reverts commit `f064306de9`.	2022-02-04 08:54:16 -08:00
matthewdeng	f064306de9	[train] add TorchTensorboardProfilerCallback (#21864) Implement a TorchTensorboardProfilerCallback and corresponding TorchWorkerProfiler to support distributed PyTorch Profiler With TensorBoard integration.	2022-02-03 19:28:12 -08:00
Junwen Yao	eb8adc6105	[train] add a utility function to turn off TF autosharding (#21887) This PR adds a utility function to turn off TF autosharding as a temporary solution. Closes #19324.	2022-01-28 16:09:06 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention, `Ray Data`, whenever the project is referred to. This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00
Max Pumperla	7953c9ca57	[docs] integrate algolia docsearch, move to sphinx panels (#21814)	2022-01-24 17:00:41 -08:00
matthewdeng	8119b62640	[train] refactor callback logdir and results preprocessors (#21468) * [train] Add TorchTensorboardProfilerCallback and introduce ResultsPreprocessors * simplify profiler * read on get_and_clear_profile_traces * refactor callbacks * remove var * Update python/ray/train/callbacks/logging.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * Update python/ray/train/callbacks/results_prepocessors/keys.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * address comments; add tests * fix test * address comments * docs * address comments * fix test Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-21 17:23:34 -08:00
matthewdeng	165a025641	[train] update worker batch size docs (#21761) Making it explicit how the user should think about batch size for PyTorch in a distributed setting, similar to what's already done for TensorFlow.	2022-01-21 17:22:47 -08:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667)	2022-01-20 16:19:04 +01:00
Amog Kamsetty	bcae6ba6c9	[Train] `_WrappedDataLoader` yield tuples (#21467) Fixes a bug where `_WrappedDataLoader` yields a generator instead of a tuple. Addresses https://discuss.ray.io/t/ray-train-creates-typeerror-generator-object-is-not-subscriptable/4605/10	2022-01-10 12:40:36 -08:00
Amog Kamsetty	123aa7cd2b	[Train] Improve usability for GPU Training (#21464) Minor changes to improve the user experience for GPU Training. Addresses https://discuss.ray.io/t/ray-train-doesnt-detect-gpu/4608	2022-01-07 11:53:53 -08:00
Balaji Veeramani	7efe1bef11	[Train] Add `PrintCallback` (#21261) Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-01-03 14:03:04 -08:00
Amog Kamsetty	57db4640ca	[Train] [Tune] Refactor MLflow (#20802) Pulls out Tune's MLflow logging logic to a shared MLflow util. Adds an MLflow logger callback to Ray Train. Closes #20642	2021-12-21 17:17:52 -08:00
Junwen Yao	8325a32d66	[Train] Update saving / loading checkpoint documentation (#20973) This PR updates saving / loading checkpoint examples. Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-12-14 09:53:17 -08:00
Amog Kamsetty	c03b937b95	[Train] Minor migration guide update (#20683) * update docs * tf	2021-11-29 12:42:28 -08:00
Amog Kamsetty	9796ae56d5	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487)	2021-11-17 18:05:51 -08:00
Amog Kamsetty	4f88796d5a	[Train] Move to beta (#20378)	2021-11-16 08:19:30 -08:00
Amog Kamsetty	a74cf7ff1c	[Train] Torch Prepare utilities (#20254) * update * formatting * fix failures * fix session tests * address comments * add to api docs * package refactor * wip * wip * wip * finish * finish * fix * comment * fix * install horovod for docs * address comment * Update python/ray/train/session.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * Update python/ray/train/torch.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * address comments * try fix docs * fix doc build failure * fix * fix * fix * try fix doc highlighting * fix docs Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-11-15 07:34:17 -08:00
Amog Kamsetty	65a17da2ec	[Train] Refactor Backends (#20312) * wip * finish * comment * fix * install horovod for docs * address comment * fix doc build failure	2021-11-13 11:05:53 -08:00
matthewdeng	e77cc926be	[train] minor doc updates (#20271)	2021-11-12 17:20:23 -08:00
Amog Kamsetty	1803d88943	[Train] Simplify single worker training (#19814) * wip * update * fix * fix * fix * fix	2021-10-28 10:54:35 -07:00
matthewdeng	aa5499ef0f	[Train] implement CheckpointStrategy (#19111) * [SGD] implement CheckpointStrategy * address comments * update docs * Update doc/source/train/user_guide.rst Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com> * best checkpoint Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>	2021-10-27 11:31:04 -07:00
matthewdeng	4674c78050	[Train] Rename Ray SGD v2 to Ray Train (#19436)	2021-10-18 22:27:46 -07:00

33 commits
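
Commit 8f9e0d7f6b moves the `torch` check into `TorchWorkerProfiler.__init__`, so an exception is raised only when someone actually constructs the profiler. That deferred-import pattern can be illustrated with the minimal Python sketch below. It is not Ray's implementation: the class name `LazyProfiler` and its `required_module` attribute are hypothetical. The only details taken from the log are the missing `torch.profiler` module from the traceback and the error text. Defining or importing the class never imports `torch`; the dependency is imported and checked only on instantiation.

```python
import importlib
from types import ModuleType


class LazyProfiler:
    """Hypothetical stand-in for a class like TorchWorkerProfiler.

    Defining this class (or importing the module it lives in) does not
    import torch. The optional dependency is imported and validated only
    when an instance is constructed.
    """

    # Submodule whose absence caused the ModuleNotFoundError in the log;
    # the log states it requires torch>=1.8.1.
    required_module: str = "torch.profiler"

    def __init__(self) -> None:
        try:
            # Deferred import: runs on instantiation, not at import time.
            self._profiler_module: ModuleType = importlib.import_module(
                self.required_module
            )
        except ImportError as exc:
            # Replace the bare ModuleNotFoundError with an actionable message,
            # keeping the original error chained as __cause__.
            raise ImportError(
                "Torch Profiler requires torch>=1.8.1. "
                "Run `pip install 'torch>=1.8.1'` to use TorchWorkerProfiler."
            ) from exc


if __name__ == "__main__":
    try:
        LazyProfiler()
        print("torch.profiler is available")
    except ImportError as err:
        print(err)
```

In this sketch, a package that only defines `LazyProfiler` stays importable on a minimal installation without `torch`. Only code paths that instantiate the class pay for, and can fail on, the optional dependency.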