hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-05 10:01:43 -05:00

Author	SHA1	Message	Date
shrekris-anyscale	f9d5d6df12	[Serve] [Docs] Revise Java API documentation (#27831 )	2022-08-12 17:09:40 -07:00
shrekris-anyscale	0a3c1de08b	[Serve] [Docs] Replace references to `dag.execute()` with `handle.predict.remote()` (#27784 )	2022-08-12 17:09:28 -07:00
zcin	8cb09a9fc5	Revert "Revert "[serve] Integrate and Document Bring-Your-Own Gradio Applications"" (#27662 )	2022-08-12 15:12:20 -07:00
Balaji Veeramani	55e57d4f92	[AIR] [Docs] Revise "Which preprocessor should you use?" (#27835 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-12 14:43:36 -07:00
zcin	fa37ddc584	[Serve][docs] Add type annotations to code samples (#27795 )	2022-08-12 13:41:08 -07:00
Simon Mo	192d92bb77	[Serve] [Doc] Update intro page (#27735 )	2022-08-12 11:37:18 -07:00
Archit Kulkarni	ba36365c32	[Doc] [Serve] Fix LINT by fixing outdated Ray Client doc link (#27826 )	2022-08-12 11:35:15 -07:00
Clark Zinzow	f0404e00cd	[Core] [Hotfix] Change "task failed with unretryable exception" log statement to debug-level. (#27714 ) Serve relies on being able to do quiet application-level retries, and this info-level logging is resulting in log spam hitting users. This PR demotes this log statement to debug-level to prevent this log spam. Co-authored-by: simon-mo <simon.mo@hey.com>	2022-08-12 11:28:49 -07:00
Cheng Su	7c7828f818	[Datasets] Improve size estimation of image folder data source (#27219 ) This PR is to improve in-memory data size estimation of image folder data source. Before this PR, we use on-disk file size as estimation of in-memory data size of image folder data source. This can be inaccurate due to image compression and in-memory image resizing. Given `size` and `mode` is set to be optional in https://github.com/ray-project/ray/pull/27295, so change this PR to tackle the simple case when `size` and `mode` are both provided. * `size` and `mode` is provided: just calculate the in-memory size based on the dimensions, not need to read any image (this PR) * `size` or `mode` is not provided: need sampling to determine the in-memory size (will do in another followup PR). Here is example of estiamted size for our test image folder ``` >>> import ray >>> from ray.data.datasource.image_folder_datasource import ImageFolderDatasource >>> root = "example://image-folders/different-sizes" >>> ds = ray.data.read_datasource(ImageFolderDatasource(), root=root, size=(64, 64), mode="RGB") >>> ds.size_bytes() 40310 >>> ds.fully_executed().size_bytes() 37428 ``` Without this PR: ``` >>> ds.size_bytes() 18978 ```	2022-08-12 11:26:03 -07:00
shrekris-anyscale	6946cb38b3	[Serve] [Docs] Revise "Serving Ray AIR Checkpoints" Header (#27824 )	2022-08-12 11:18:33 -07:00
matthewdeng	58495fe594	[data][docs] fix broken links (#27818 )	2022-08-12 11:17:34 -07:00
Alan Guo	be92dd72d5	[Dashboard] Fix edge cases for log file names in the dashboard log viewer (#27772 )	2022-08-12 09:39:54 -07:00
Archit Kulkarni	6c45625d6d	[runtime env] [CI] Skip flaky test_runtime_env_working_dir_2 tests on mac (#27799 )	2022-08-12 09:39:19 -07:00
Archit Kulkarni	518c74020c	[Serve] [Doc] Serve add API ref for Deployment.bind() and serve.build (#27811 )	2022-08-12 09:38:58 -07:00
Simon Mo	bf9f0621b9	[Serve] Minor fix to replica shutdown (#27778 )	2022-08-12 09:33:08 -07:00
shrekris-anyscale	cdf25908f7	[Serve] [Docs] Document Serve benchmarks (#27711 )	2022-08-12 11:14:05 -05:00
liuyang-my	6b886d394c	[Serve] Java documentation (#26321 )	2022-08-12 09:07:12 -07:00
Simon Mo	0badbb8b1e	[Serve][docs] Refresh http-guide (#27779 ) - Moved most code snippet to doc_code - Added section about DAGDriver - Added section discussing when should you use each abstraction layer.	2022-08-12 11:06:36 -05:00
Archit Kulkarni	92e315f970	[serve][docs] Add dev workflow page (#27746 ) Adds a page describing a development workflow for Serve applications. Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com> Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>	2022-08-12 11:06:13 -05:00
shrekris-anyscale	e15960ed7e	[Serve] [Docs] Update the "Monitoring Ray Serve" Page (#27777 ) The "Monitoring Ray Serve" page explains how to inspect your Ray Serve applications. This change updates the page to remove outdated metrics that Serve no longer exposes and to upgrade code samples to use 2.0 APIs. It also improves the content's readability and organization. Link to updated "Monitoring Ray Serve" page: https://ray--27777.org.readthedocs.build/en/27777/serve/monitoring.html	2022-08-12 11:05:31 -05:00
Simon Mo	4be232e413	[Serve][Doc] Rewrite the ServeHandle page (#27775 )	2022-08-12 09:05:09 -07:00
matthewdeng	75d13faa50	[serve] fix grammar check in test (#27819 )	2022-08-12 09:02:31 -07:00
Eric Liang	52f7b89865	[docs] Editing pass on clusters docs, removing legacy material and fixing style issues (#27816 )	2022-08-12 00:15:03 -07:00
matthewdeng	9a0c1f5e0a	[data] update datasets API structure (#27592 ) Refactor Datasets API docs for easier navigation: [Ray Datasets API](https://ray--27592.org.readthedocs.build/en/27592/data/api/api.html) ### Changes 1. Create a new Datasets API base page. 2. Split existing APIs into separate pages. 3. Split `Dataset` and `DatasetPipeline` methods into separate sections. 1. Used `autosummary` to generate overview tables at the top of each of these pages. Open to other suggestions e.g. moving the summary to the top of each section instead. 2. Note: Every time we add a new method we need to explicitly add it here as well. 4. Add Input/Output APIs. 1. I chose to split these primarily by data format rather than type, since it's easier to navigate, and the existing [Creating Datasets](https://docs.ray.io/en/master/data/creating-datasets.html) User Guide already does the latter. 6. Add `Block` and `DataBatch` (should we add these aliases?) 7. Remove existing `package-ref`.	2022-08-11 23:10:10 -07:00
Nikita Vemuri	87dd078e1e	fix external dashboard url if connecting to existing cluster (#27807 ) Signed-off-by: Nikita Vemuri <nikitavemuri@gmail.com>	2022-08-11 17:56:24 -07:00
Jian Xiao	b1cad0a112	[Datasets] Use detached lifetime for stats actor (#25271 ) The actor handle held at Ray client will become dangling if the Ray cluster is shutdown, and in such case if the user tries to get the actor again it will result in crash. This happened in a real user and blocked them from making progress. This change makes the stats actor detached, and instead of keeping a handle, we access it via its name. This way we can make sure re-create this actor if the cluster gets restarted. Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>	2022-08-11 17:47:13 -07:00
Cade Daniel	b7a6a1294a	Fix linkcheck introduced by Ray Clusters doc changes (#27804 ) Broken links introduced by #27756 Will defer to @ericl if he wants to merge this or fix it himself. Signed-off-by: Cade Daniel <cade@anyscale.com>	2022-08-11 16:55:20 -07:00
Chris K. W	74f28f9270	[client] Fix ignore_reinit_error behavior in ray client (#26165 ) Ray client currently errors on reinit even if ignore_reinit_error is set.	2022-08-11 14:56:54 -07:00
Archit Kulkarni	d3514273a4	[Serve] [Doc] Refactor into new Scaling user guide, update code (#27650 )	2022-08-11 16:34:07 -05:00
shrekris-anyscale	8a6d2db1d3	[Serve] Fix grammar in deployment logs (#27780 )	2022-08-11 13:51:42 -07:00
Simon Mo	2fbfc87f5c	[Serve] Update AIR Examples to use new API, add linked guide (#27733 )	2022-08-11 13:01:17 -07:00
Simon Mo	824c1d80dd	[Serve][Doc] Add Batching User Guide (#27731 ) Add a new page discussing how to use the batching decorator.	2022-08-11 14:06:27 -05:00
Jian Xiao	5a18b1fc45	Spread the actors in data ingest benchmark, which 2x the throughput (#27620 ) The consuming actors were not spread and this PR fixed it, which improved throughput by 2x.	2022-08-11 11:47:54 -07:00
kourosh hakhamaneshi	5520a96ce0	[RLlib] Fix `get_init_state` annotation in torch and define more specific `TensorType`. (#27791 )	2022-08-11 20:02:17 +02:00
Artur Niederfahrenhorst	310ccdf5a3	[RLlib] Fix SAC config parameter that is not used. (#27741 )	2022-08-11 18:57:55 +02:00
shrekris-anyscale	314e6ae196	[Serve] [Docs] Trim Ray Serve's "Getting Started" Page (#27670 )	2022-08-11 11:49:17 -05:00
Sihan Wang	786c7f45cf	[Serve][Doc] Update the doc code to use new api (#27689 ) Co-authored-by: Archit Kulkarni <architkulkarni@users.noreply.github.com>	2022-08-11 11:24:17 -05:00
Ricky Xu	5ea4747448	[Core][State Observability] Nightly release test for state API (#26610 ) * Initial * Correctness test skeleton * Added limit for listing * Updated grpc config * no more waiting * metrics * Updated constant and add test * renamed * actors * actors * actors * dada * actor dead? * Script * correct test name * limit * Added timeout * release test /2 * Merged * format+doc * wip Signed-off-by: rickyyx <ricky@anyscale.com> * revert packag-lock Signed-off-by: rickyyx <rickyx@anyscale.com> * wip * results Signed-off-by: rickyx <rickyx@anyscale.com> Signed-off-by: rickyyx <rickyx@anyscale.com> Signed-off-by: rickyyx <ricky@anyscale.com> Signed-off-by: rickyx <rickyx@anyscale.com> Co-authored-by: rickyyx <ricky@anyscale.com>	2022-08-11 07:01:01 -07:00
Artur Niederfahrenhorst	0dceddb912	[RLlib] Move learning_starts logic from buffers into `training_step()`. (#26032 )	2022-08-11 13:07:30 +02:00
Artur Niederfahrenhorst	c855469845	[RLlib] pin gym-minigrid @ 1.0.3 (#27761 )	2022-08-11 12:27:44 +02:00
Rohan Potdar	600b8d4729	[RLlib]: Fix OPE docs. (#27460 )	2022-08-11 09:14:22 +02:00
Artur Niederfahrenhorst	894e19f791	[RLlib] Dreamer's Episodic buffer should abide by ReplayBuffer API. (#27424 )	2022-08-11 09:13:55 +02:00
matthewdeng	178b1e8a25	[data] enable test_split.py tests (#27150 ) Signed-off-by: Matthew Deng <matt@anyscale.com>	2022-08-10 22:15:34 -07:00
Stephanie Wang	043eac06ac	[docs] Revamp clusters section on job submission (#27756 ) Page structure changes: Deploying a Ray Cluster on Kubernetes Getting Started -> links to jobs Deploying a Ray Cluster on VMs Getting started -> links to jobs User Guides Autoscaling (moved more content here in favor of the Getting started page) Running Applications on Ray Clusters Ray Jobs Quickstart Using the Ray Jobs CLI Python SDK REST API Ray Job Submission API Reference Ray Client Content changes: modified "Deploying a Ray Cluster ..." quickstart pages to briefly summarize ad-hoc command execution, then link to jobs modified Ray Jobs example to be more incremental - start with a simple example, then show long-running script, then show example with a runtime env, instead of all of them at once center Ray Jobs quickstart around using the CLI. Made some minor changes to the Python SDK page to match it remove "Ray Jobs Architecture" moved "Autoscaling" content away from Kubernetes "Getting started" page into its own user guide. I think it's too complicated for "Getting Started". No content cuts. Cut "Viewing the dashboard" and "Ray Client" from Kubernetes "Getting started" page. Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>	2022-08-10 20:15:55 -07:00
zcin	6776ebe5d6	[serve][docs] Document lightweight config updates (#27706 ) A new feature was recently added, where Serve replicas are not restarted if only `num_replicas`, `autoscaling_config`, and/or `user_config` is updated in the config file that's redeployed. Updating docs to talk about this feature. Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>	2022-08-10 21:01:16 -05:00
Yi Cheng	c5952f2163	[serve] Add an internal os env to turn the head node pin off (#27763 ) When the node id of the controller died, GSC will try to reschedule the controller to the same node. But GCS will only mark the node as failure after 120s when GCS restarts (or 30s if only raylet died). This PR fixed it by unpin it to the head node. So as long as GCS is alive, it'll reschedule it immediately. But we can't turn it on by default, so we introduce an internal flag for this.	2022-08-10 18:13:54 -07:00
matthewdeng	8eca6ae852	[rllib][release] mark long_running_many_ppo as unstable (#26874 ) Per #26718 (comment)	2022-08-10 17:58:33 -07:00
Jiajun Yao	27e38f81bd	Pin _StatsActor to the driver node (#27765 ) Similar to what's done in #23397 This allows the actor to fate-share with the driver and tolerate worker node failures.	2022-08-10 17:55:06 -07:00
Chen Shen	ddca52d2ca	[cluster doc] Promote new doc and deprecate the old (#27759 ) Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-10 17:41:56 -07:00
Balaji Veeramani	7da7dbe3fd	[AIR] Improve preprocessor documentation (#27215 ) Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-10 17:13:22 -07:00

... 2 3 4 5 6 ...

14122 commits