hiro/ray

Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Alex Wu	c9a419ac76	[Autoscaler] Remove staroid node provider (#22236) The Staroid node provider has been abandoned and unmaintained for quite some time now. Due to the fact that there are no active maintainers, the original contributors cannot be reached, and there is no clear interest, we are no longer officially endorsing or supporting the node provider. Co-authored-by: Alex Wu <alex@anyscale.com>	2022-02-09 09:18:18 -08:00
xwjiang2010	323511b716	[tune] Single wait refactor. (#21852) This is a down-scoped change. For the full overview of the Tune control loop, see [`Tune control loop refactoring`](https://docs.google.com/document/d/1RDsW7SVzwMPZfA0WLOPA4YTqbRyXIHGYmBenJk33HaE/edit#heading=h.2za3bbxbs5gn) 1. Previously there were separate waits on pg ready and other events. As a result, there were quite a few timing tweaks that were inefficient and hard to understand and unit test. This PR consolidates them into a single wait that is handled by TrialRunner in each step. - A few event types are introduced, and they map to scenarios as follows: * PG_READY --> Should place a trial onto it. If somehow there is no trial to be placed there, the pg will be put in _ready momentarily. This is because resources have historically been conceptualized as a pull-based model. * NO_RUNNING_TRIALS_TIME_OUT --> possibly the insufficient-resources case * TRAINING_RESULT * SAVING_RESULT * RESTORING_RESULT * YIELD --> This just means that training is simply taking very long. We need to punt back to the main loop to print out status info etc. 2. Previously TrialCleanup was not very efficient and could race between Trainable.stop() and `return_placement_group`. This PR streamlines the Trial cleanup process by explicitly letting Trainable.stop() finish, followed by `return_placement_group(pg)`. Note, graceful shutdown is needed in cases like `pause_trial` where checkpointing to memory needs to be given the time to happen before the actor is gone. 3. Quite a few env variables (timing tweaks) are removed, which I consider OK to proceed with without a deprecation cycle.	2022-02-09 15:31:17 +00:00
Balaji Veeramani	31ed9e5d02	[CI] Replace YAPF disables with Black disables (#21982)	2022-02-08 16:29:25 -08:00
Jules S. Damji	6b7d995e64	Added a hands-on self-contained MLflow/Ray Serve deployment example (#22192)	2022-02-08 12:10:53 -08:00
Guyang Song	36ba514f9c	[Doc] Fix bad doc and recover doc of c++ api (#22213)	2022-02-08 19:04:37 +08:00
Guyang Song	9f77090c1c	[Doc] Fix bad links of dask and mars in ray-libraries.rst (#22210)	2022-02-08 19:02:49 +08:00
Max Pumperla	5cc9355303	[Docs ] Tune docs overhaul (first part) (#22112) Continuing docs overhaul, tune now has: - [x] better landing page - [x] a getting started guide - [x] user guide was cut down, partially merged with FAQ, and partially integrated with tutorials - [x] the new user guide contains guides to tune features and practical integrations - [x] we rewrote some of the feature guides for clarity - [x] we got rid of sphinx-gallery for this sub-project (only data and core left), as it looks bad and is unnecessarily complicated anyway (plus, makes the build slower) - [x] sphinx-gallery examples are now moved to markdown notebook, as started in #22030. - [x] Examples are tested in the new framework, of course. There's still a lot one can do, but this is already getting too large. Will follow up with more fine-tuning next week. Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>	2022-02-07 15:47:03 +00:00
Clark Zinzow	fb0d6e6b0b	[Datasets] [Docs] Datasets library branding + positioning tweaks (#22067)	2022-02-05 16:59:34 -08:00
Jules S. Damji	c5c5e01b5d	[Doc] [Serve] Fixed minor typo and removed extract ',' (#22101)	2022-02-04 14:51:38 -08:00
Archit Kulkarni	d7be4e1d3c	[doc] [runtime env] Add note that referencing local files in requirements.txt is not supported (#22095)	2022-02-04 15:32:19 -06:00
matthewdeng	014a9959f1	Revert "[train] add TorchTensorboardProfilerCallback (#21864)" (#22117) This reverts commit `f064306de9`.	2022-02-04 08:54:16 -08:00
Clark Zinzow	743ce65da8	[Dask-on-Ray] Add support for Dask annotations. (#22057)	2022-02-03 22:15:38 -08:00
matthewdeng	f064306de9	[train] add TorchTensorboardProfilerCallback (#21864) Implement a TorchTensorboardProfilerCallback and corresponding TorchWorkerProfiler to support distributed PyTorch Profiler With TensorBoard integration.	2022-02-03 19:28:12 -08:00
Max Pumperla	092598774a	[Docs] Executable notebook tutorial (#22030) We're introducing the usage of [MyST Notebooks](https://myst-nb.readthedocs.io/en/latest/index.html) here and demonstrate how it works by rewriting (and extending) the RLLib Serve tutorial. Benefits: - [x] Write notebooks in markdown. Can be converted into other formats e.g. with `jupytext` - [x] Tutorials like this have a binderhub link added to the top nav (launch button). - [x] Notebooks get executed when docs are built, so it's impossible to have stale docs. - [x] But locally those builds are cached so that you don't have to wait too long. - [x] The notebook cell outputs can be shown, hidden or removed. In particular, we can now avoid adding expected code output as comments in our scripts (which might get outdated). We're also clarifying #22022. Old tutorial: [here](https://docs.ray.io/en/latest/serve/tutorials/rllib.html) New tutorial (preview): [here](https://ray--22030.org.readthedocs.build/en/22030/serve/tutorials/rllib.html) Co-authored-by: simon-mo <simon.mo@hey.com>	2022-02-03 08:13:04 +00:00
Archit Kulkarni	78f882dbbc	[runtime env] Local uri caching for working_dir, py_modules and conda (#20273) Previously, local files corresponding to runtime env URIs were eagerly garbage collected as soon as there were no more references to them. In this PR, we store this data in a cache instead, so when the reference count for a URI drops to zero, instead of deleting it we simply mark it as unused in the cache. When the cache exceeds its size limit (default 10 GB) it will delete unused URIs until the cache is back under the size limit or there are no more unused URIs. Design doc: https://docs.google.com/document/d/1x1JAHg7c0ewcOYwhhclbuW0B0UC7l92WFkF4Su0T-dk/edit - Adds unit tests for caching and integration tests for working_dir caching	2022-02-02 14:53:03 -06:00
Eric Liang	3d449d4f71	[docs] Clean up long titles in TOC (#22016)	2022-02-01 22:56:49 -08:00
Balaji Veeramani	6441335f5e	[Doc] Correct information about code style (#21985)	2022-02-01 10:37:21 -08:00
SangBin Cho	2db71f72cc	[Doc] Remove the legacy doc (#21996)	2022-01-31 15:26:19 -08:00
Kai Yang	2038cc96c6	Revert "Revert "[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988) (#21661)" (#21894) This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`. Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, `aggregate_combined_blocks` in `PandasBlockAccessor` redirect to the arrow block format implementation for now. They'll be implemented in a later PR.	2022-01-31 12:09:51 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Junwen Yao	eb8adc6105	[train] add a utility function to turn off TF autosharding (#21887) This PR adds a utility function to turn off TF autosharding as a temporary solution. Closes #19324.	2022-01-28 16:09:06 -08:00
Clark Zinzow	09fab70991	[Datasets] [Docs] Fix bug in Datasets locality-aware splitting example (#21937) Fixes bug in Datasets locality-aware splitting example.	2022-01-27 14:46:04 -08:00
mwtian	559eefd06f	[Doc] update dask version for Ray 1.11.0 (#21933) This is needed for release 1.11.0.	2022-01-27 13:15:01 -08:00
Max Pumperla	4dd221f848	[Docs] Ray Data docs target state (#21931) Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html) The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have - [x] A Getting Started Guide - [x] An explicit User / How-To Guide - [x] A dedicated Key Concepts page - [x] A consistent naming convention, `Ray Data`, whenever the project is referred to. This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.	2022-01-27 13:14:36 -08:00
Sven Mika	893536ebd9	[RLlib] Move bandits into main agents folder; Make RecSim adapter more accessible; (#21773)	2022-01-27 13:58:12 +01:00
Sven Mika	371fbb17e4	[RLlib] Make `policies_to_train` more flexible via callable option. (#20735)	2022-01-27 12:17:34 +01:00
Max Pumperla	b34099e764	[docs] landing page (fixes #21750) (#21859) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-01-26 17:14:25 -08:00
Clark Zinzow	411bb308dc	[Datasets] [Docs] Add API docs links to I/O compatibility matrix (#21889)	2022-01-26 12:05:27 -08:00
Dhruv Nair	3d79815cd0	Comet Integration (#20766) This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/). Co-authored-by: Michael Cullan <mjcullan@gmail.com> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-25 11:42:00 -08:00
Clark Zinzow	1971a08b7d	[RFC] [Core] Support disabling log redirection via `RAY_LOG_TO_STDERR` environment variable. (#21767)	2022-01-25 10:52:53 -08:00
Guyang Song	089f49f554	[doc] fix doc of container-based runtime env (#21815)	2022-01-25 12:23:15 +08:00
isaac-vidas	236fe58259	[Doc] Update requests calls to ray job submission api (#21802)	2022-01-24 17:44:31 -08:00
Max Pumperla	7953c9ca57	[docs] integrate algolia docsearch, move to sphinx panels (#21814)	2022-01-24 17:00:41 -08:00
shrekris-anyscale	03d93ba7ee	Add a new End-to-End tutorial in Serve that walks users through deploying a model (#20765) Currently, the docs have an [end-to-end tutorial](https://web.archive.org/web/20211122152843/https://docs.ray.io/en/latest/serve/tutorial.html) walking users through deploying a `Counter` function on Serve. This PR adds an end-to-end tutorial walking users through deploying an entire Hugging Face model using Serve, providing a better understanding of how to deploy an actual model via Serve. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-01-24 16:36:04 -06:00
matthewdeng	8119b62640	[train] refactor callback logdir and results preprocessors (#21468) * [train] Add TorchTensorboardProfilerCallback and introduce ResultsPreprocessors * simplify profiler * read on get_and_clear_profile_traces * refactor callbacks * remove var * Update python/ray/train/callbacks/logging.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * Update python/ray/train/callbacks/results_prepocessors/keys.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * address comments; add tests * fix test * address comments * docs * address comments * fix test Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-21 17:23:34 -08:00
matthewdeng	165a025641	[train] update worker batch size docs (#21761) Making it explicit how the user should think about batch size for PyTorch in a distributed setting, similar to what's already done for TensorFlow. ![image](https://user-images.githubusercontent.com/3967392/150421340-df73f574-8531-4626-88a6-b80442ea6b7f.png)	2022-01-21 17:22:47 -08:00
Max Pumperla	f9b71a8bf6	[docs] new structure (#21776) This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way: - [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign. - [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).	2022-01-21 15:42:05 -08:00
Adam Golinski	2954bf9a48	[docs][tune] Fix typo in schedulers.rst (#21777) Fix typo in schedulers.rst	2022-01-21 13:21:01 -08:00
xwjiang2010	c22a9fa731	[Docs] pin ray lightning version to fix lint error. (#21764)	2022-01-20 17:52:24 -08:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667)	2022-01-20 16:19:04 +01:00
Philipp Moritz	fbc51d6d0e	[Kuberay] Ray Autoscaler integration with Kuberay (MVP) (#21086) This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).	2022-01-19 19:42:17 -08:00
Archit Kulkarni	7d74a9face	[doc] add Ray versions 1.9.1 - 1.10.0 to dask on ray compatibility table (#21360) I updated this version compatibility table on the release branch but didn't update it on master. This was my mistake; the process is to make a PR to master and then cherry-pick that commit to the release branch.	2022-01-19 18:55:05 -08:00
Jiajun Yao	fa5c167717	Revert "[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988) (#21661) This reverts commit `4a55d10bb1`.	2022-01-18 06:11:20 -08:00
Kai Yang	4a55d10bb1	[Dataset] [DataFrame 2/n] Add pandas block format implementation (partial) (#20988) This PR adds pandas block format support by implementing `PandasRow`, `PandasBlockBuilder`, `PandasBlockAccessor`. Note that `sort_and_partition`, `combine`, `merge_sorted_blocks`, `aggregate_combined_blocks` in `PandasBlockAccessor` redirect to the arrow block format implementation for now. They'll be implemented in a later PR. Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com> Co-authored-by: Eric Liang <ekhliang@gmail.com>	2022-01-15 17:28:34 +08:00
Archit Kulkarni	26057c433f	[CI] pin uvicorn to 0.16.0 to fix serve (#21612)	2022-01-14 16:00:51 -08:00
Richard Liaw	169e422937	[docs] Make Jobs more prominent in documentation (#21575)	2022-01-13 23:49:34 -08:00
Kai Fricke	a3442df584	[ci/multinode] Build multinode image with OpenSSH before running tests (#21544) Currently we install OpenSSH on the fly in fake multinode docker testing. Instead we can speed testing up a fair bit by first building a Docker image which includes OpenSSH and then running tests with this image.	2022-01-13 08:47:04 -08:00
Ruoyun Huang	a36b7a9908	[doc]Update doc for profiling using the correct VARs (#21561) Based on code here: https://github.com/ray-project/ray/blob/master/python/ray/_private/services.py#L702 Also, verified that the ENV vars as-is make "ray start" crash.	2022-01-12 23:01:51 -08:00
Max Pumperla	703c161034	[doc] Fix sklearn doc error, introduce MyST markdown parser (#21527)	2022-01-12 15:17:28 -08:00

1848 commits