hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Clark Zinzow	411bb308dc	[Datasets] [Docs] Add API docs links to I/O compatibility matrix (#21889 )	2022-01-26 12:05:27 -08:00
Dhruv Nair	3d79815cd0	Comet Integration (#20766 ) This PR adds a `CometLoggerCallback` to the Tune Integrations, allowing users to log runs from Ray to [Comet](https://www.comet.ml/site/). Co-authored-by: Michael Cullan <mjcullan@gmail.com> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-25 11:42:00 -08:00
Clark Zinzow	1971a08b7d	[RFC] [Core] Support disabling log redirection via `RAY_LOG_TO_STDERR` environment variable. (#21767 )	2022-01-25 10:52:53 -08:00
Guyang Song	089f49f554	[doc] fix doc of container-based runtime env (#21815 )	2022-01-25 12:23:15 +08:00
isaac-vidas	236fe58259	[Doc] Update requests calls to ray job submission api (#21802 )	2022-01-24 17:44:31 -08:00
Max Pumperla	7953c9ca57	[docs] integrate algolia docsearch, move to sphinx panels (#21814 )	2022-01-24 17:00:41 -08:00
shrekris-anyscale	03d93ba7ee	Add a new End-to-End tutorial in Serve that walks users through deploying a model (#20765 ) Currently, the docs have an [end-to-end tutorial](https://web.archive.org/web/20211122152843/https://docs.ray.io/en/latest/serve/tutorial.html) walking users through deploying a `Counter` function on Serve. This PR adds an end-to-end tutorial walking users through deploying an entire Hugging Face model using Serve, providing a better understanding of how to deploy an actual model via Serve. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2022-01-24 16:36:04 -06:00
matthewdeng	8119b62640	[train] refactor callback logdir and results preprocessors (#21468 ) * [train] Add TorchTensorboardProfilerCallback and introduce ResultsPreprocessors * simplify profiler * read on get_and_clear_profile_traces * refactor callbacks * remove var * Update python/ray/train/callbacks/logging.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * Update python/ray/train/callbacks/results_prepocessors/keys.py Co-authored-by: Antoni Baum <antoni.baum@protonmail.com> * address comments; add tests * fix test * address comments * docs * address comments' * fix test Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2022-01-21 17:23:34 -08:00
matthewdeng	165a025641	[train] update worker batch size docs (#21761 ) Making it explicit how the user should think about batch size for PyTorch in a distributed setting, similar to what's already done for TensorFlow. ![image](https://user-images.githubusercontent.com/3967392/150421340-df73f574-8531-4626-88a6-b80442ea6b7f.png)	2022-01-21 17:22:47 -08:00
Max Pumperla	f9b71a8bf6	[docs] new structure (#21776 ) This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following way: - [x] we reverted renaming of existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. I think my consolidation efforts with the `ray-` prefix were a little overeager in that regard. It's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so that should be relatively benign. - [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: there's a sphinx dependency that needs to be replaced (`sphinx-tabs`) by another, newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow-up in the next PR (hoping this one doesn't get re-re-re-re-reverted).	2022-01-21 15:42:05 -08:00
Adam Golinski	2954bf9a48	[docs][tune] Fix typo in schedulers.rst (#21777 ) Fix typo in schedulers.rst	2022-01-21 13:21:01 -08:00
xwjiang2010	9af8f11191	Revert "[docs] Clean up doc structure (first part) (#21667 )" (#21763 ) This reverts commit `38e46c9fb3`.	2022-01-20 15:30:56 -08:00
Max Pumperla	38e46c9fb3	[docs] Clean up doc structure (first part) (#21667 )	2022-01-20 16:19:04 +01:00
Philipp Moritz	fbc51d6d0e	[Kuberay] Ray Autoscaler integration with Kuberay (MVP) (#21086 ) This is a minimum viable product for Ray Autoscaler integration with Kuberay. It is not ready for prime time/general use, but should be enough for interested parties to get started (see the documentation in kuberay.md).	2022-01-19 19:42:17 -08:00
Archit Kulkarni	7d74a9face	[doc] add Ray versions 1.9.1 - 1.10.0 to dask on ray compatibility table (#21360 ) I updated this version compatibility table on the release branch but didn't update it on master. This is my mistake, the process is to make a PR to master and then cherry pick that commit to the release branch.	2022-01-19 18:55:05 -08:00
Richard Liaw	169e422937	[docs] Make Jobs more prominent in documentation (#21575 )	2022-01-13 23:49:34 -08:00
Kai Fricke	a3442df584	[ci/multinode] Build multinode image with OpenSSH before running tests (#21544 ) Currently we install OpenSSH on the fly in fake multinode docker testing. Instead we can speed testing up a fair bit by building a Docker image which includes OpenSSH first and then run tests with this image.	2022-01-13 08:47:04 -08:00
Ruoyun Huang	a36b7a9908	[doc] Update doc for profiling using the correct VARs (#21561) Based on the code here: https://github.com/ray-project/ray/blob/master/python/ray/_private/services.py#L702 Also verified that the ENV vars as currently documented make "ray start" crash.	2022-01-12 23:01:51 -08:00
Max Pumperla	703c161034	[doc] Fix sklearn doc error, introduce MyST markdown parser (#21527 )	2022-01-12 15:17:28 -08:00
Eric Liang	a69ae1d886	Add blogs to dataset materials (#21546 )	2022-01-11 22:09:57 -08:00
Kai Fricke	5a7f6e4fdd	[rfc][ci] create fake docker-compose cluster environment (#20256 ) Following #18987 this PR adds a docker-compose based local multi node cluster. The fake multinode docker comprises two parts. The docker_monitor.py script is a watch script calling docker compose up whenever the docker-compose.yaml changes. The node provider creates and updates the docker compose according to the autoscaling requirements. This mode fully supports autoscaling and comes with test utilities to start and connect to docker-compose autoscaling environments. There's also a sample test case showing how this can be used.	2022-01-11 04:35:36 +00:00
hckuo	7955333ffd	[runtime env] allow working_dir to be a zipped package (#20826 ) Check if working_dir is a zip, unzip it if so.	2022-01-10 18:29:01 -06:00
Amog Kamsetty	bcae6ba6c9	[Train] `_WrappedDataLoader` yield tuples (#21467 ) Fixes bug with _WrappedDataLoader that yields a generator instead of a tuple. Addresses https://discuss.ray.io/t/ray-train-creates-typeerror-generator-object-is-not-subscriptable/4605/10	2022-01-10 12:40:36 -08:00
Amog Kamsetty	123aa7cd2b	[Train] Improve usability for GPU Training (#21464 ) Minor changes to improve the user experience for GPU Training. Addresses https://discuss.ray.io/t/ray-train-doesnt-detect-gpu/4608	2022-01-07 11:53:53 -08:00
Eric Liang	e9068c45fa	[data] Instrument most remaining dataset functions and add docs (#21412 ) This PR finishes most of the stats todos for dataset. The main thing punted for future work is instrumentation of split(), which is particularly tricky since only certain blocks are transformed. Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>	2022-01-06 17:08:56 -08:00
Amog Kamsetty	8b4cb45088	[Docs] Update Ray Lightning API (#21428 ) Update ray lightning api docs to reflect new changes in ray lightning master. Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this. Closes #21426	2022-01-06 12:14:33 -08:00
Sven Mika	c01245763e	[RLlib] Revert "Revert "updated pettingzoo wrappers, env versions, urls"" (#21339 )	2022-01-04 18:30:26 +01:00
Jiajun Yao	5aa00ba5eb	[doc] Fix typos in serve documentation (#21379 )	2022-01-04 10:56:07 -06:00
Balaji Veeramani	7efe1bef11	[Train] Add `PrintCallback` (#21261 ) Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2022-01-03 14:03:04 -08:00
Kai Fricke	489e6945a6	Revert "[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113 )" (#21338 ) This reverts commit `327eb84154`.	2022-01-03 10:21:25 +00:00
Benjamin Black	327eb84154	[RLlib] Updated pettingzoo wrappers, env versions, urls (#20113 )	2022-01-02 21:29:09 +01:00
Ishant Mrinal	ec34185771	[RLlib] RE3 documentation (#21199 )	2022-01-02 17:31:53 +01:00
Philipp Moritz	583744ab57	Graduate Ray on Windows from experimental to beta (#21268 )	2021-12-27 00:19:48 -08:00
Akash Patel	cbcd03b779	Upgrade cython to 0.29.26 for py310 (#21244 )	2021-12-26 20:26:08 -08:00
Amog Kamsetty	57db4640ca	[Train] [Tune] Refactor MLflow (#20802 ) Pulls out Tune's MLflow logging logic to a shared MLflow util. Adds an MLflow logger callback to Ray Train Closes #20642	2021-12-21 17:17:52 -08:00
Linsong Chu	61bbecdb7d	[Workflow] Add doc for metadata (#20156) This PR adds documentation for Workflow Metadata, for which we recently added support in https://github.com/ray-project/ray/pull/19372. Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>	2021-12-20 17:24:07 -08:00
architkulkarni	5cc1308c66	[runtime env] [doc] [test] Add docs and tests for `RAY_runtime_env_skip_local_gc` environment variable (#21163 )	2021-12-20 10:34:59 -08:00
Guyang Song	2a9d9726d6	[doc] add doc for container runtime env (#21131 )	2021-12-20 14:13:05 +08:00
Clark Zinzow	c3d68fa0c1	[Dask-on-Ray] Add Dask config helper, set task-based shuffle by default. (#21114) Dask defaults to a disk-based shuffle even though we're using a distributed scheduler, which appears to be resulting in dropped data since the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, which the Dask-on-Ray scheduler should probably do as well. This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler and changes the default shuffle to a task-based shuffle. The user can still override the shuffle method by manually specifying `df.set_index(shuffle="disk")`.	2021-12-17 13:16:37 -08:00
Hankpipi	97d3142c59	[Serve] Fix naming error and add Serve metric for HTTP error codes (#21009 )	2021-12-16 09:48:03 -08:00
Scott Graham	7153d58cbd	Updates to azure autoscaler for authentication and dependency updates (#19603 ) * updating azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication * adding azure sdk rqmts for tests * updating azure test requirements and adding wrapper function for azure sdk function resolution * adding docstring to get_azure_sdk_function Co-authored-by: Scott Graham <scgraham@microsoft.com>	2021-12-16 09:23:32 -08:00
Sven Mika	e485aa846a	[RLlib; Docs overhaul] Overhaul of auto-API reference pages (via sphinx autoclass/automodule). (#19786 )	2021-12-15 22:32:52 +01:00
Jiao	e9daacff60	[Job][Docs] Update docs architecture image link (#21087 )	2021-12-14 23:07:38 -08:00
Junwen Yao	8325a32d66	[Train] Update saving / loading checkpoint documentation (#20973 ) This PR updates saving / loading checkpoint examples. Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-12-14 09:53:17 -08:00
Jules S. Damji	064f976eb4	Added hyperparameters to the concepts section (#21024) Added hyperparameters to the concepts section, since it is important to explain what they are, and added diagrams to help the reader visualize the difference between model parameters and hyperparameters. Signed-off-by: Jules S.Damji <jules@anyscale.com> Co-authored-by: Jules S.Damji <jules@anyscale.com>	2021-12-13 12:21:39 +00:00
Sven Mika	f814c2af89	[RLlib; Docs] Docs API reference pages: `rllib/execution`, `rllib/evaluation`, `rllib/models`, `rllib/offline`. (#20538 )	2021-12-10 09:41:29 +01:00
Jules S. Damji	065786b7fe	[docs] Make design pattern example self contained (#20981 ) Signed-off-by: Jules S.Damji jules@anyscale.com Why are these changes needed? The code snippet referenced a python function that was not defined, therefore the code snippet as is won't work. All complete or self-contained code in our docs should run. The changes made were adding the undefined function, iterating over a list of different random large arrays to show the difference between local or distributed sort's execution time, and print them. Closes #20960	2021-12-09 20:19:38 -08:00
Eric Liang	22ccc6b300	Initial stats framework for datasets (#20867 ) This adds an initial Dataset.stats() framework for debugging dataset performance. At a high level, execution stats for tasks (e.g., CPU time) are attached to block metadata objects. Datasets have stats objects that hold references to these stats and parent dataset stats (this avoids stats holding references to parent datasets, allowing them to be gc'ed). Similarly, DatasetPipelines hold stats from recently computed datasets. Currently only basic ops like map / map_batches are instrumented. TODO placeholders are left for future PRs.	2021-12-08 16:13:57 -08:00
Flamur Gogolli	3ca10ccc47	Textual correction on TLS Authentication (#20935 ) Correct wording on the TLS Authentication section of the configure.rst page.	2021-12-07 19:05:16 -08:00
Yi Cheng	ea1d081aac	[core] Simple chaos testing for asio (#19970) Right now in Ray, many edge cases related to gRPC are not tested. This PR is a simple attempt to give developers a way to delay gRPC requests. It can be used in manual testing and in e2e tests, since it supports delays for specific gRPC methods. To use this feature, set the environment variable `RAY_TESTING_ASIO_DELAY_US="method1=10:20,method2=20:30,*=200:200"`. This means `method1` is delayed by 10-20us, `method2` by 20-30us, and all other methods by 200us.	2021-12-07 14:47:07 -08:00
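
The `RAY_TESTING_ASIO_DELAY_US` value quoted in commit `ea1d081aac` is a comma-separated list of `method=min:max` entries, with `*` as the fallback. The sketch below only illustrates how such a string could be read. It is not Ray's implementation, and the helper names `parse_asio_delay_spec` and `delay_range_for` are hypothetical.

```python
from typing import Dict, Tuple


def parse_asio_delay_spec(spec: str) -> Dict[str, Tuple[int, int]]:
    """Parse 'name=min:max,...' into {name: (min_us, max_us)}.

    Hypothetical helper: it mirrors the format shown in the commit
    message, not Ray's actual parser.
    """
    delays: Dict[str, Tuple[int, int]] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries such as a trailing comma
        method, _, bounds = entry.partition("=")
        low, _, high = bounds.partition(":")
        delays[method] = (int(low), int(high))
    return delays


def delay_range_for(delays: Dict[str, Tuple[int, int]], method: str) -> Tuple[int, int]:
    """Return the delay range for a method, falling back to the '*' entry."""
    # Assumption: with no exact match and no '*' entry, no delay applies.
    return delays.get(method, delays.get("*", (0, 0)))


spec = "method1=10:20,method2=20:30,*=200:200"
delays = parse_asio_delay_spec(spec)
print(delay_range_for(delays, "method1"))
print(delay_range_for(delays, "some_other_method"))
```

With the example string from the commit message, `method1` resolves to its own range and any unlisted method resolves to the `*` range.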

1634 commits