hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-05 10:01:43 -05:00

Author	SHA1	Message	Date
Yi Cheng	d0b879cdb1	[workflow] Change name in step to task_id (#28151 ) We've deprecated the name options and use task_id. This is the cleanup to fix everything left.	2022-08-31 20:27:32 -07:00
shrekris-anyscale	f747415d80	[Serve] [Doc] Restore documentation about host and port in Serve config (#28219 )	2022-08-31 20:27:00 -07:00
Justin Yu	5cec2492bb	Fix tune resources example code (#28210 ) The tune resources user guide contained broken code snippets. This PR fixes those, adds some extra clarifying comments, and improves the code style for readability. Signed-off-by: Justin Yu <justinvyu@berkeley.edu>	2022-08-31 14:48:41 -07:00
Artur Niederfahrenhorst	f420407b0d	[ML] Pin Pydantic <= 1.9.2 (#28205 ) CI is red because of a dependency issue around dataclass_transform . Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com> Signed-off-by: Kai Fricke <kai@anyscale.com> Co-authored-by: Kai Fricke <kai@anyscale.com>	2022-08-31 13:35:18 -07:00
Antoni Baum	8a30606308	[AIR][Docs] Improve Hugging Face notebook example (#28121 ) Improves the HF notebook by making use of preprocessors and adding a section on tuning. Brings it in line with the Ray Summit 2022 demo. Signed-off-by: Antoni Baum antoni.baum@protonmail.com	2022-08-30 12:36:41 -07:00
shrekris-anyscale	a15442a510	[docs] Omit bash prompt (#28028 ) Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com> Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>	2022-08-29 14:10:02 -07:00
Kilian Lieret	328e6ac2f4	Slurm: Set load_env to empty string if not specified (#28132 )	2022-08-27 20:00:35 -07:00
Kai Fricke	bbd13ddc33	[air/docs] Add example to fetch results dataframe for trainer/tuner (#28067 )	2022-08-27 02:01:57 -07:00
Jiajun Yao	c8617b9ebf	[Doc] Revamp ray core design patterns doc [3/n]: ray get in a loop (#28113 ) Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-26 20:41:04 -07:00
Amog Kamsetty	00f6273775	[Docs] [Tune] `ResultGrid` Docs and API reference (#28068 ) Improve docstring for ResultGrid and show API reference and docstring in Tune API section. Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com> Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-26 16:50:35 -07:00
Guilherme	6cf363af0d	Updates limit-tasks example (#26644 ) The example fails because it can assign an invalid value to the num_returns parameters in ray.wait function	2022-08-26 15:19:58 -07:00
Kai Fricke	d0678b80ed	[rfc] [air/tune/train] Improve trial/training failure error printing (#27946 ) When training fails, the console output is currently cluttered with tracebacks which are hard to digest. This problem is exacerbated when running multiple trials in a tuning run. The main problems here are: 1. Tracebacks are printed multiple times: In the remote worker and on the driver 2. Tracebacks include many internal wrappers The proposed solution for 1 is to only print tracebacks once (on the driver) or never (if configured). The proposed solution for 2 is to shorten the tracebacks to include mostly user-provided code. ### Deduplicating traceback printing The solution here is to use `logger.error` instead of `logger.exception` in the `function_trainable.py` to avoid printing a traceback in the trainable. Additionally, we introduce an environment variable `TUNE_PRINT_ALL_TRIAL_ERRORS` which defaults to 1. If set to 0, trial errors will not be printed at all in the console (only the error.txt files will exist). To be discussed: We could also default this to 0, but I think the expectation is to see at least some failure output in the console logs per default. ### Removing internal wrappers from tracebacks The solution here is to introduce a magic local variable `_ray_start_tb`. In two places, we use this magic local variable to reduce the stacktrace. A utility `shorten_tb` looks for the last occurrence of `_ray_start_tb` in the stacktrace and starts the traceback from there. This takes only linear time. If the magic variable is not present, the full traceback is returned - this means that if the error does not come up in user code, the full traceback is returned, giving visibility into possible internal bugs. Additionally there is an env variable `RAY_AIR_FULL_TRACEBACKS` which disables traceback shortening. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-26 15:02:38 -07:00
Antoni Baum	ea483ecf7a	[AIR][Docs] Clarify how LGBM/XGB trainers work (#28122 )	2022-08-26 14:51:22 -07:00
Kai Fricke	3b3aa80ba3	[tune/ci] Fix link to SigOpt experiment API (#28127 ) Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-26 14:10:53 -07:00
Dmitri Gekhtman	ce99cf1b71	[Docs][Kubernetes] Fix link, add a bit of content (#28017 ) Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com> Fixes the "legacy operator" link to point to master, rather than the 2.0.0 branch. The migration README exists in master but not in the 2.0.0 branch. Adds a sentence explaining that the Ray container has to go first in the container list. Adds a sentence to config guide mention min/max replicas and linking to autoscaling. Documents a bug related to GPU auto-detection in KubeRay 0.3.0.	2022-08-26 12:02:18 -07:00
Akash Patel	96d579a4fe	Add support for Python 3.10 (#21221 ) Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>	2022-08-26 11:01:12 -07:00
Amin Allahyar	455fa664e5	Minor update on the key concept explanation (#28032 )	2022-08-26 10:57:58 -07:00
Dmitri Gekhtman	e98fdef93e	Move cloudwatch. (#28041 ) Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com> For a more balanced table of contents, makes CloudWatch instructions a subsection of AWS instructions.	2022-08-26 08:55:38 -07:00
Jiajun Yao	5139a5c722	Fix broken gym library link (#28111 ) gymlibrary.ml becomes gymlibrary.dev Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-25 19:52:43 -07:00
Max Pumperla	50cb51387e	fixes #25860 (#28097 )	2022-08-25 10:45:35 -07:00
Kai Fricke	e0725d1f1d	[docs/ci] Fix (some) broken linkchecks (#28087 ) Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-25 04:41:35 -07:00
Max Pumperla	ec3c7f855e	[docs] add algolia crawler verification (#28094 )	2022-08-25 01:36:26 -07:00
Cade Daniel	5fb36d4a7d	Small fixes to job submission cluster docs (#28056 ) I walked through the new job submission cluster docs and sanded down a few rough edges. Signed-off-by: Cade Daniel <cade@anyscale.com>	2022-08-23 09:41:45 -07:00
Eric Liang	ad40e19ca0	[docs] Add the AIR technical whitepaper to our docs (#28053 )	2022-08-22 16:41:51 -07:00
shrekris-anyscale	ded324d6a4	[Docs] Remove topbar overlap on left table of contents (#28031 )	2022-08-20 02:02:16 -07:00
Jun Gong	62b91cbec0	[docs][rllib] Documentation for connectors. (#27528 ) Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-19 14:35:07 -07:00
Richard Liaw	71efee04f6	[air, clusters/docs] add images to air docs and reformat clusters panels (#28011 )	2022-08-19 08:52:47 -07:00
Cade Daniel	a6b7189ab3	Fixing formatting around TODO that found its way into compiled docs. (#28001 ) Signed-off-by: Cade Daniel <cade@anyscale.com> Fixing formatting around TODO that found its way into compiled docs.	2022-08-18 17:46:39 -07:00
shrekris-anyscale	4395f8792f	[Serve] [Docs] Fix link in Serve Config Files documentation (#27993 )	2022-08-18 14:50:23 -07:00
SangBin Cho	9950e9c1f4	[Doc] CLI Reference Documentation Revamp (#27862 ) Take out the CLI reference from the core API subsection. It follows the same CLI reference pattern as other library (e.g., Serve has Serve CLI under Serve API section).	2022-08-18 14:29:31 -07:00
Dmitri Gekhtman	c2ead88aca	[kuberay][docs] Experimental features (#27898 )	2022-08-18 11:37:06 -07:00
Dmitri Gekhtman	98c90b8488	[clusters][docs] Provide urls to content, fix typos (#27936 )	2022-08-18 11:33:04 -07:00
Dmitri Gekhtman	6cf263838f	[docs][touch-up] Add ephemeral storage to Ray-on-K8s example. (#27916 )	2022-08-18 11:29:55 -07:00
Sihan Wang	112f104fb6	[Serve][Doc] Fix user guide tables (#27991 )	2022-08-18 10:55:31 -07:00
Eric Liang	47f3d83379	[docs] Minor AIR figure updates (#27965 )	2022-08-18 10:30:24 -07:00
Jiajun Yao	0a3a5e68a4	Revamp ray core design patterns doc [2/n]: too fine grained tasks (#27919 ) Move the code to doc_code Fix the code example to make batching faster than serial run. Related issue number #27048 Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>	2022-08-17 13:52:50 -07:00
Edward Oakes	a400cde56f	[docs][serve] Trim down user guide & clean up table of contents (#27926 ) An attempt at making the docs shorter and sweeter including various small cleanup items. - Reorder the TOC on the sidebar for the user guides to be more linear based on a user's journey. - Put the batching content under the performance guide. - Remove the AIR guide (AIR users already have a serving guide). - Combine the `ServeHandle` and model composition pages into a single guide. We may want to revisit this in the future but for now better to have it in a single place instead of duplicated (with links going to both). - Fix the index page for the user guides to match the TOC sidebar. - Rename a few pages for clarity & consistency. - Remove some now-redundant content (old ML models user guide).	2022-08-17 13:24:17 -05:00
Kai Fricke	4a55f18a22	[docs][serve] Fix linkcheck for production guide (#27941 )	2022-08-17 07:46:53 -07:00
Cheng Su	4ad1b4c712	Fix nyc_taxi_basic_processing.ipynb end-to-end (#27927 ) Signed-off-by: Cheng Su <scnju13@gmail.com> This is to run ray 2.0.0rc0 on https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html and fix the notebook end-to-end, make sure the output and wording is matched. The page after this PR - https://ray--27927.org.readthedocs.build/en/27927/data/examples/nyc_taxi_basic_processing.html .	2022-08-16 21:30:19 -07:00
Christy Bergman	3f313d74ad	Replace robot image with emoji and replace word Trainer with Algorithm (#27928 )	2022-08-16 21:27:21 -07:00
Edward Oakes	65f92a44e3	[serve][docs] Consolidate production guides, add kuberay docs to it (#27747 ) - Adds KubeRay information to the production guide. - Consolidates the two user guides we had related to production deployment. - Adds information about experimental GCS HA feature.	2022-08-16 21:29:56 -05:00
Yi Cheng	2262ac02f3	[workflow][doc] First pass of workflow doc. (#27331 ) Signed-off-by: Yi Cheng 74173148+iycheng@users.noreply.github.com Why are these changes needed? This PR updates the workflow doc to reflect recent changes, focusing on position changes and others.	2022-08-16 18:48:05 -07:00
Antoni Baum	7ff914b06e	[AIR][Docs] Set `logging_strategy="epoch"` for HF (#27917 )	2022-08-16 16:45:46 -07:00
Eric Liang	8a7be15b72	[docs] Simplify Ray start guide and move PI tutorial to examples page (#27885 )	2022-08-16 14:28:45 -07:00
Richard Liaw	759fbd9502	[air][minor] Use drop_columns in docs (#27852 )	2022-08-16 14:01:25 -07:00
Zoltan Fedor	78648e3583	[Serve][Docs] Mark metrics served for HTTP vs Python calls (#27858 ) Different metrics are collected in Ray Serve when the deployments are called from HTTP vs Python. This needs to be mentioned in the documentation and each metric marked accordingly.	2022-08-16 15:23:29 -05:00
Ian Rodney	24508db920	[Docs][GCP] Configuring ServiceAccounts for worker (#27915 ) Enables better usage with GCP. The default behavior is that the head runs with the ray-autoscaler-sa-v1 service Account, but workers do not. Workers can run with this service account by copying & uncommenting L114->L117 from example-full Signed-off-by: Ian <ian.rodney@gmail.com> Co-authored-by: Richard Liaw <rliaw@berkeley.edu>	2022-08-16 13:13:27 -07:00
Kai Fricke	b91246a093	[air/benchmarks] Measure local training time in torch/tf benchmarks (#27902 ) We currently measure end-to-end training time in our benchmarks, which includes setup overhead. This is an unequal comparison, as setup overhead for vanilla training cannot be accurately expressed and was instead just disregarded. By comparing the raw training times in the actual training loop, we will get a more accurate expression of any potential overhead or benefit in using Ray vs. vanilla tensorflow/torch. Signed-off-by: Kai Fricke <kai@anyscale.com>	2022-08-16 19:16:08 +02:00
Simon Mo	b9a2fb79b6	[AIR][Docs] Remove the excessive printing from Torch examples (#27903 )	2022-08-16 09:09:54 -07:00
Dmitri Gekhtman	bceef503b2	[Kubernetes][docs] Restore legacy Ray operator migration discussion (#27841 ) This PR restores notes for migration from the legacy Ray operator to the new KubeRay operator. To avoid disrupting the flow of the Ray documentation, these notes are placed in a README accompanying the old operator's code. These notes are linked from the new docs. Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>	2022-08-16 08:46:31 -07:00
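
The traceback shortening described in commit d0678b80ed (#27946) can be illustrated with a small standalone sketch. This is not Ray's implementation: only the names `shorten_tb` and `_ray_start_tb` come from the commit message, while the signature, the helper functions, and the choice to keep the marked frame itself as the first frame are assumptions made for illustration. The walk visits each traceback entry once, which matches the linear-time claim, and falls back to the full traceback when no frame carries the marker.

```python
import traceback
from types import TracebackType
from typing import List, Optional

# Marker name taken from the commit message; everything else is hypothetical.
MARKER = "_ray_start_tb"


def shorten_tb(
    tb: Optional[TracebackType], marker: str = MARKER
) -> Optional[TracebackType]:
    """Return the traceback starting at the last frame that defines `marker`.

    If no frame has the marker as a local variable, the original traceback
    is returned unchanged, so internal errors stay fully visible.
    """
    start = None
    cur = tb
    while cur is not None:  # single pass over the linked list of entries
        if marker in cur.tb_frame.f_locals:
            start = cur  # keep overwriting so the last occurrence wins
        cur = cur.tb_next
    return start if start is not None else tb


def user_code() -> None:
    raise ValueError("boom")


def internal_wrapper() -> None:
    _ray_start_tb = True  # magic local marking where user code begins
    user_code()


def outer_layer() -> None:
    internal_wrapper()


def frame_names(tb: Optional[TracebackType]) -> List[str]:
    """List the function names in a traceback, outermost first."""
    return [frame.name for frame in traceback.extract_tb(tb)]


def demo():
    """Return (full, shortened) frame names for a failing call."""
    try:
        outer_layer()
    except ValueError as exc:
        full = frame_names(exc.__traceback__)
        short = frame_names(shorten_tb(exc.__traceback__))
        return full, short


if __name__ == "__main__":
    full, short = demo()
    print("full:     ", full)
    print("shortened:", short)
```

In this sketch the shortened traceback drops `demo` and `outer_layer` and begins at `internal_wrapper`, the frame holding the marker. A variant that starts one entry later (at `tb_next` of the marked frame) would hide the wrapper as well; the commit message does not say which of the two the real utility does.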

2576 commits