This PR finishes most of the stats TODOs for Datasets. The main item punted to future work is instrumentation of split(), which is particularly tricky since only certain blocks are transformed.
Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
Update the Ray Lightning API docs to reflect new changes in ray lightning master.
Making this quick change to fix CI and unblock the release, but will follow up on a proper fix for this.
Closes #21426
This PR adds documentation for Workflow Metadata, support for which was recently added in https://github.com/ray-project/ray/pull/19372.
Co-authored-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
Dask defaults to a disk-based shuffle even though we're using a distributed scheduler, which appears to result in dropped data since the filesystem isn't shared across nodes. Dask Distributed manually sets the shuffle algorithm in the global config to the task-based shuffle, and the Dask-on-Ray scheduler should probably do the same.
This PR adds a Dask config helper, `enable_dask_on_ray`, that sets Dask-on-Ray as the default scheduler along with changing the default shuffle to a task-based shuffle. The shuffle method can still be overridden by the user by manually specifying `df.set_index(shuffle="disk")`.
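A minimal sketch of how the helper might be wired up (assumes Ray, Dask, and pandas are installed and a local cluster is available; the toy DataFrame is made up):

```python
# Hedged sketch: enable Dask-on-Ray as the default scheduler, which also
# switches the default shuffle to the task-based algorithm.
import ray
import pandas as pd
import dask.dataframe as dd
from ray.util.dask import enable_dask_on_ray

ray.init()
enable_dask_on_ray()

ddf = dd.from_pandas(pd.DataFrame({"a": range(100), "b": range(100)}), npartitions=10)
# set_index triggers a shuffle; it now defaults to task-based, so no shared
# filesystem across nodes is required.
ddf = ddf.set_index("a")
print(ddf.head())
```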
* updating Azure autoscaler versions and backwards compatibility, and moving to azure-identity based authentication
* adding Azure SDK requirements for tests
* updating Azure test requirements and adding a wrapper function for Azure SDK function resolution
* adding a docstring to get_azure_sdk_function
Co-authored-by: Scott Graham <scgraham@microsoft.com>
Added hyperparameters to the concepts section, since it's important to explain what they are, and added diagrams to help the reader visualize the difference between model parameters and hyperparameters.
Signed-off-by: Jules S.Damji <jules@anyscale.com>
Co-authored-by: Jules S.Damji <jules@anyscale.com>
Why are these changes needed?
The code snippet referenced a Python function that was not defined, so the snippet as written won't run. All complete or self-contained code in our docs should run.
The changes: add the previously undefined function, iterate over a list of random large arrays to show the difference between local and distributed sort execution times, and print the results.
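A hedged reconstruction of the fixed snippet's shape (the function name, chunking strategy, and array sizes here are illustrative, not the exact doc contents):

```python
# Sketch: define the previously missing function, then time local vs.
# distributed sorting over several random large arrays.
import time
import numpy as np
import ray

ray.init()

@ray.remote
def sort_chunk(chunk):
    return np.sort(chunk)

def distributed_sort(arr, num_chunks=4):
    chunks = np.array_split(arr, num_chunks)
    sorted_chunks = ray.get([sort_chunk.remote(c) for c in chunks])
    # Merge the sorted chunks (simple concatenate-and-sort for illustration).
    return np.sort(np.concatenate(sorted_chunks))

for size in [1_000_000, 5_000_000, 10_000_000]:
    arr = np.random.rand(size)

    start = time.time()
    np.sort(arr)
    print(f"local sort,       n={size}: {time.time() - start:.3f}s")

    start = time.time()
    distributed_sort(arr)
    print(f"distributed sort, n={size}: {time.time() - start:.3f}s")
```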
Closes #20960
This adds an initial Dataset.stats() framework for debugging dataset performance. At a high level, execution stats for tasks (e.g., CPU time) are attached to block metadata objects. Datasets have stats objects that hold references to these stats and parent dataset stats (this avoids stats holding references to parent datasets, allowing them to be gc'ed). Similarly, DatasetPipelines hold stats from recently computed datasets.
Currently only basic ops like map / map_batches are instrumented. TODO placeholders are left for future PRs.
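As a hedged sketch of the intended usage (exact output format aside), stats become available after a dataset executes:

```python
# Minimal sketch: run an instrumented op, then inspect the stats summary.
import ray

ds = ray.data.range(1000).map(lambda x: x * 2)
ds.take(5)  # trigger execution so per-block execution stats get recorded
print(ds.stats())  # summary built from stats attached to block metadata
```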
Right now in Ray, a lot of gRPC-related edge cases are not tested. This PR is a simple first step toward giving developers a way to delay gRPC requests. It can be used for manual testing and also for e2e tests, since it supports delays for specific gRPC methods.
To use this feature, simply set the OS environment variable `RAY_TESTING_ASIO_DELAY_US="method1=10:20,method2=20:30,*=200:200"`.
This means `method1` will be delayed by 10-20us, `method2` by 20-30us, and all remaining methods by 200us.
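A minimal sketch of setting the variable from a test script (assumes the Ray processes started from this environment inherit and honor it; the method names are placeholders):

```python
# Hedged sketch: set the delay spec before Ray's processes start so they
# inherit it from the environment.
import os

os.environ["RAY_TESTING_ASIO_DELAY_US"] = "method1=10:20,method2=20:30,*=200:200"

import ray

ray.init()
# ... exercise the gRPC code paths under test; method1/method2 calls are
# delayed by 10-20us / 20-30us, everything else by 200us ...
ray.shutdown()
```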
So I have an AMD machine with many cores and 32GB of memory. When I do `pip install -e .`, my machine crashes because Bazel tries to use all the cores but quickly runs out of memory. There seems to be no native way to use environment variables to tell Bazel to limit its resource consumption, but there is a `--local_cpu_resources` command-line option.
This PR exposes that option to `pip install` via an environment variable. I also went through setup.py and documented all the environment variables I could find.
Datasets docs for last-mile preprocessing, particularly geared towards ML ingest. This adds groupby, aggregation, and random shuffling examples to the overview page (not present previously), adds some concreteness to our last-mile preprocessing positioning, and provides preprocessing recipes for a few common transformations.
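A hedged sketch of the style of recipe added (column names and data are made up):

```python
# Minimal sketch of groupby/aggregation and random shuffling with Datasets.
import ray

ds = ray.data.from_items([{"group": i % 3, "value": float(i)} for i in range(100)])

# Groupby + aggregation.
counts = ds.groupby("group").count()
print(counts.take())

# Global random shuffle, e.g. before feeding an ML trainer.
shuffled = ds.random_shuffle()
```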
This PR adds support for publishing and subscribing to logs in Python via GCS pubsub. It also refactors the Python threaded subscriber to support subscribing and calling `close()` from multiple threads.
We could also move the tests and logging support to another PR, but that would make the purpose of the refactoring seem less obvious.
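For context, a generic sketch of the thread-safety pattern described (this is not Ray's actual GCS subscriber code, just the shape of the close-from-any-thread behavior):

```python
# Generic sketch: close() can race with pollers blocked on other threads.
import threading
import queue

class ThreadedSubscriber:
    _CLOSE = object()  # sentinel used to unblock and terminate pollers

    def __init__(self):
        self._closed = threading.Event()
        self._queue = queue.Queue()

    def on_message(self, msg):
        # Stand-in for the receive path feeding the subscriber.
        self._queue.put(msg)

    def poll(self):
        # Safe from multiple threads; returns None once the subscriber closes.
        if self._closed.is_set():
            return None
        msg = self._queue.get()
        if msg is self._CLOSE:
            self._queue.put(self._CLOSE)  # re-post so other pollers wake too
            return None
        return msg

    def close(self):
        # Idempotent; callable from any thread, even while poll() is blocked.
        if not self._closed.is_set():
            self._closed.set()
            self._queue.put(self._CLOSE)
```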
This PR makes block splitting off by default. This makes it easier to debug problems potentially related to this feature. Criteria for enabling it by default:
- We're confident all nightly tests pass (currently, there may be an issue with large-scale groupby with block splitting).
- We're confident lineage-based reconstruction can work with block splitting.
This PR introduces a TrialCheckpoint class, which is returned e.g. by ExperimentAnalysis.best_checkpoint. The class enables easy access to cloud storage locations (rather than just local directories, as before). It also comes with utilities to download, upload, and save trial checkpoints to local and cloud targets.
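A hedged usage sketch based on the description above (method names follow the download/upload/save utilities mentioned; exact signatures may differ):

```python
# Sketch: access the best checkpoint as a TrialCheckpoint object.
import os
from ray import tune

def trainable(config, checkpoint_dir=None):
    with tune.checkpoint_dir(step=0) as d:
        with open(os.path.join(d, "model.txt"), "w") as f:
            f.write("weights")
    tune.report(score=config["x"])

analysis = tune.run(
    trainable,
    config={"x": tune.grid_search([1, 2])},
    metric="score",
    mode="max",
)

ckpt = analysis.best_checkpoint                # TrialCheckpoint, not a bare path
ckpt.download(local_path="/tmp/best_ckpt")     # fetch from cloud storage if any
ckpt.upload(cloud_path="s3://my-bucket/ckpt")  # hypothetical bucket
ckpt.save("/tmp/best_ckpt_copy")               # save to a local or cloud target
```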
Addresses follow-up comments from https://github.com/ray-project/ray/pull/19863:
- Add short "Concepts" section
- Add more section headings to break up the text
- Add "Workflow: Local Files" example
- Add "Workflow: Library development" example