
2580 commits

Author SHA1 Message Date
Dmitri Gekhtman
59be31d558
Update links. (#28269)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

This PR updates the quickstart configuration in the Ray docs to reflect the fixes from
ray-project/kuberay#529

To provide access to the fixed version, we update the link to point to KubeRay master rather than the 0.3.0 branch.
After the next KubeRay release (0.4.0), we can update these links to point to a fixed release version again.
2022-09-02 12:18:04 -07:00
Kai Fricke
57484b28cf
[ci/air] Only run examples that need credentials in branch builds (#28260) 2022-09-02 09:10:36 -07:00
kourosh hakhamaneshi
5779ee764d
[RLlib] Fix ope v_gain (#28136) 2022-09-02 08:27:05 -07:00
Amog Kamsetty
b83f10dbde
[Docs] [Train] Update Train API reference and docs (#28192)
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>

Adds back more Ray Train APIs to Ray Train docs.

Also makes updates to the user guide for better references.
2022-09-01 17:47:42 -07:00
Yi Cheng
d0b879cdb1
[workflow] Change name in step to task_id (#28151)
We've deprecated the name option in favor of task_id. This cleanup fixes everything that was left.
2022-08-31 20:27:32 -07:00
shrekris-anyscale
f747415d80
[Serve] [Doc] Restore documentation about host and port in Serve config (#28219) 2022-08-31 20:27:00 -07:00
Justin Yu
5cec2492bb
Fix tune resources example code (#28210)
The tune resources user guide contained broken code snippets. This PR fixes those, adds some extra clarifying comments, and improves the code style for readability.

Signed-off-by: Justin Yu <justinvyu@berkeley.edu>
2022-08-31 14:48:41 -07:00
Artur Niederfahrenhorst
f420407b0d
[ML] Pin Pydantic <= 1.9.2 (#28205)
CI is red because of a dependency issue around dataclass_transform.

Signed-off-by: Artur Niederfahrenhorst <artur@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-08-31 13:35:18 -07:00
Antoni Baum
8a30606308
[AIR][Docs] Improve Hugging Face notebook example (#28121)
Improves the HF notebook by making use of preprocessors and adding a section on tuning. Brings it in line with the Ray Summit 2022 demo.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2022-08-30 12:36:41 -07:00
shrekris-anyscale
a15442a510
[docs] Omit bash prompt (#28028)
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
2022-08-29 14:10:02 -07:00
Kilian Lieret
328e6ac2f4
Slurm: Set load_env to empty string if not specified (#28132) 2022-08-27 20:00:35 -07:00
Kai Fricke
bbd13ddc33
[air/docs] Add example to fetch results dataframe for trainer/tuner (#28067) 2022-08-27 02:01:57 -07:00
Jiajun Yao
c8617b9ebf
[Doc] Revamp ray core design patterns doc [3/n]: ray get in a loop (#28113)
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-26 20:41:04 -07:00
Amog Kamsetty
00f6273775
[Docs] [Tune] ResultGrid Docs and API reference (#28068)
Improve docstring for ResultGrid and show API reference and docstring in Tune API section.

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-26 16:50:35 -07:00
Guilherme
6cf363af0d
Updates limit-tasks example (#26644)
The example fails because it can pass an invalid value for the num_returns parameter of the ray.wait function.
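A minimal sketch of one way to guard against this kind of failure. The commit message does not say which invalid value was produced, so this assumes the usual constraint that num_returns must be at least 1 and no larger than the number of pending references. The helper name `clamp_num_returns` is hypothetical and is not part of Ray.

```python
def clamp_num_returns(requested: int, num_pending: int) -> int:
    """Clamp a requested num_returns into the range [1, num_pending].

    Hypothetical helper for illustration; not part of the Ray API.
    """
    if num_pending < 1:
        raise ValueError("there are no pending references to wait on")
    return max(1, min(requested, num_pending))


# Intended use (commented out so the sketch runs without Ray installed):
# ready, pending = ray.wait(
#     pending, num_returns=clamp_num_returns(batch_size, len(pending))
# )
```

Clamping at the call site keeps the final, partially filled batch from requesting more results than there are pending tasks.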
2022-08-26 15:19:58 -07:00
Kai Fricke
d0678b80ed
[rfc] [air/tune/train] Improve trial/training failure error printing (#27946)
When training fails, the console output is currently cluttered with tracebacks which are hard to digest. This problem is exacerbated when running multiple trials in a tuning run.

The main problems here are:

1. Tracebacks are printed multiple times: In the remote worker and on the driver
2. Tracebacks include many internal wrappers

The proposed solution for 1 is to only print tracebacks once (on the driver) or never (if configured).

The proposed solution for 2 is to shorten the tracebacks to include mostly user-provided code.

### Deduplicating traceback printing

The solution here is to use `logger.error` instead of `logger.exception` in the `function_trainable.py` to avoid printing a traceback in the trainable. 
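To illustrate the difference, here is a small standard-library sketch (not the actual `function_trainable.py` code). Inside an `except` block, `logger.exception` appends the traceback to the message, while `logger.error` logs only the message. The helper `capture` and the logger names are made up for this example.

```python
import io
import logging


def capture(method_name: str) -> str:
    """Log a caught exception via the given logger method and return the output."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger(f"demo.{method_name}")
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.setLevel(logging.ERROR)
    try:
        raise ValueError("trial failed")
    except ValueError as exc:
        # Calls either logger.exception(...) or logger.error(...)
        getattr(logger, method_name)("Trial errored: %s", exc)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


with_tb = capture("exception")  # message followed by the traceback
without_tb = capture("error")   # message only
```

With `logger.error` the worker still reports that the trial failed, but the traceback itself is left to be printed elsewhere.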

Additionally, we introduce an environment variable `TUNE_PRINT_ALL_TRIAL_ERRORS` which defaults to 1. If set to 0, trial errors will not be printed at all in the console (only the error.txt files will exist).
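A rough sketch of how such a flag could be read. This is an assumption about the parsing, not the code from this PR. The function name `should_print_trial_errors` is hypothetical.

```python
import os
from typing import Mapping, Optional


def should_print_trial_errors(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless TUNE_PRINT_ALL_TRIAL_ERRORS is set to 0.

    Hypothetical helper; the variable defaults to "1" when unset.
    """
    env = os.environ if environ is None else environ
    return int(env.get("TUNE_PRINT_ALL_TRIAL_ERRORS", "1")) != 0
```

Passing a mapping instead of reading `os.environ` directly keeps the helper easy to test, e.g. `should_print_trial_errors({"TUNE_PRINT_ALL_TRIAL_ERRORS": "0"})`.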

To be discussed: We could also default this to 0, but I think the expectation is to see at least some failure output in the console logs by default.

### Removing internal wrappers from tracebacks

The solution here is to introduce a magic local variable `_ray_start_tb`. In two places, we use this magic local variable to reduce the stacktrace. A utility `shorten_tb` looks for the last occurrence of `_ray_start_tb` in the stacktrace and starts the traceback from there. This takes only linear time. If the magic variable is not present, the full traceback is returned - this means that if the error does not come up in user code, the full traceback is returned, giving visibility into possible internal bugs. Additionally there is an env variable `RAY_AIR_FULL_TRACEBACKS` which disables traceback shortening.
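A simplified sketch of the idea, written against the standard library only. It is not the implementation from this PR: the exact frame at which the shortened traceback starts and the handling of `RAY_AIR_FULL_TRACEBACKS` (omitted here) are assumptions, and the demo functions are made up.

```python
import traceback
from types import TracebackType
from typing import Optional


def shorten_tb(
    tb: Optional[TracebackType], marker: str = "_ray_start_tb"
) -> Optional[TracebackType]:
    """Return the traceback starting at the last frame that defines `marker`.

    Falls back to the full traceback if no frame defines the marker.
    """
    last_match = None
    cur = tb
    while cur is not None:  # single pass, linear in the traceback depth
        if marker in cur.tb_frame.f_locals:
            last_match = cur
        cur = cur.tb_next
    return last_match if last_match is not None else tb


def user_code():
    raise RuntimeError("boom")


def internal_wrapper():
    _ray_start_tb = True  # magic marker: frames above this one are internal
    user_code()


def outer_internal():
    internal_wrapper()


try:
    outer_internal()
except RuntimeError as exc:
    full_frames = [f.name for f in traceback.extract_tb(exc.__traceback__)]
    short_frames = [
        f.name for f in traceback.extract_tb(shorten_tb(exc.__traceback__))
    ]
```

In this demo, `short_frames` starts at `internal_wrapper` and drops `outer_internal` and everything above it, while an exception raised outside any marked frame keeps its full traceback.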

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-26 15:02:38 -07:00
Antoni Baum
ea483ecf7a
[AIR][Docs] Clarify how LGBM/XGB trainers work (#28122) 2022-08-26 14:51:22 -07:00
Kai Fricke
3b3aa80ba3
[tune/ci] Fix link to SigOpt experiment API (#28127)
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-26 14:10:53 -07:00
Dmitri Gekhtman
ce99cf1b71
[Docs][Kubernetes] Fix link, add a bit of content (#28017)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

Fixes the "legacy operator" link to point to master, rather than the 2.0.0 branch. The migration README exists in master but not in the 2.0.0 branch.
Adds a sentence explaining that the Ray container has to go first in the container list.
Adds a sentence to the config guide mentioning min/max replicas and linking to autoscaling.
Documents a bug related to GPU auto-detection in KubeRay 0.3.0.
2022-08-26 12:02:18 -07:00
Akash Patel
96d579a4fe
Add support for Python 3.10 (#21221)
Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>
2022-08-26 11:01:12 -07:00
Amin Allahyar
455fa664e5
Minor update on the key concept explanation (#28032) 2022-08-26 10:57:58 -07:00
Dmitri Gekhtman
e98fdef93e
Move cloudwatch. (#28041)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

For a more balanced table of contents, makes CloudWatch instructions a subsection of AWS instructions.
2022-08-26 08:55:38 -07:00
Jiajun Yao
5139a5c722
Fix broken gym library link (#28111)
gymlibrary.ml becomes gymlibrary.dev

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-25 19:52:43 -07:00
Max Pumperla
50cb51387e
fixes #25860 (#28097) 2022-08-25 10:45:35 -07:00
Kai Fricke
e0725d1f1d
[docs/ci] Fix (some) broken linkchecks (#28087)
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-25 04:41:35 -07:00
Max Pumperla
ec3c7f855e
[docs] add algolia crawler verification (#28094) 2022-08-25 01:36:26 -07:00
Cade Daniel
5fb36d4a7d
Small fixes to job submission cluster docs (#28056)
I walked through the new job submission cluster docs and sanded down a few rough edges.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-23 09:41:45 -07:00
Eric Liang
ad40e19ca0
[docs] Add the AIR technical whitepaper to our docs (#28053) 2022-08-22 16:41:51 -07:00
shrekris-anyscale
ded324d6a4
[Docs] Remove topbar overlap on left table of contents (#28031) 2022-08-20 02:02:16 -07:00
Jun Gong
62b91cbec0
[docs][rllib] Documentation for connectors. (#27528)
Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-19 14:35:07 -07:00
Richard Liaw
71efee04f6
[air, clusters/docs] add images to air docs and reformat clusters panels (#28011) 2022-08-19 08:52:47 -07:00
Cade Daniel
a6b7189ab3
Fixing formatting around TODO that found its way into compiled docs. (#28001)
Signed-off-by: Cade Daniel <cade@anyscale.com>

Fixing formatting around TODO that found its way into compiled docs.
2022-08-18 17:46:39 -07:00
shrekris-anyscale
4395f8792f
[Serve] [Docs] Fix link in Serve Config Files documentation (#27993) 2022-08-18 14:50:23 -07:00
SangBin Cho
9950e9c1f4
[Doc] CLI Reference Documentation Revamp (#27862)
Take the CLI reference out of the core API subsection. It now follows the same CLI reference pattern as other libraries (e.g., Serve has the Serve CLI under the Serve API section).
2022-08-18 14:29:31 -07:00
Dmitri Gekhtman
c2ead88aca
[kuberay][docs] Experimental features (#27898) 2022-08-18 11:37:06 -07:00
Dmitri Gekhtman
98c90b8488
[clusters][docs] Provide urls to content, fix typos (#27936) 2022-08-18 11:33:04 -07:00
Dmitri Gekhtman
6cf263838f
[docs][touch-up] Add ephemeral storage to Ray-on-K8s example. (#27916) 2022-08-18 11:29:55 -07:00
Sihan Wang
112f104fb6
[Serve][Doc] Fix user guide tables (#27991) 2022-08-18 10:55:31 -07:00
Eric Liang
47f3d83379
[docs] Minor AIR figure updates (#27965) 2022-08-18 10:30:24 -07:00
Jiajun Yao
0a3a5e68a4
Revamp ray core design patterns doc [2/n]: too fine grained tasks (#27919)
Move the code to doc_code
Fix the code example to make batching faster than serial run.

Related issue number: #27048

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-17 13:52:50 -07:00
Edward Oakes
a400cde56f
[docs][serve] Trim down user guide & clean up table of contents (#27926)
An attempt at making the docs shorter and sweeter including various small cleanup items.

- Reorder the TOC on the sidebar for the user guides to be more linear based on a user's journey.
- Put the batching content under the performance guide.
- Remove the AIR guide (AIR users already have a serving guide).
- Combine the `ServeHandle` and model composition pages into a single guide. We may want to revisit this in the future but for now better to have it in a single place instead of duplicated (with links going to both).
- Fix the index page for the user guides to match the TOC sidebar.
- Rename a few pages for clarity & consistency.
- Remove some now-redundant content (old ML models user guide).
2022-08-17 13:24:17 -05:00
Kai Fricke
4a55f18a22
[docs][serve] Fix linkcheck for production guide (#27941) 2022-08-17 07:46:53 -07:00
Cheng Su
4ad1b4c712
Fix nyc_taxi_basic_processing.ipynb end-to-end (#27927)
Signed-off-by: Cheng Su <scnju13@gmail.com>
This runs Ray 2.0.0rc0 against https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html and fixes the notebook end-to-end, making sure the output and wording match.

The page after this PR: https://ray--27927.org.readthedocs.build/en/27927/data/examples/nyc_taxi_basic_processing.html
2022-08-16 21:30:19 -07:00
Christy Bergman
3f313d74ad
Replace robot image with emoji and replace word Trainer with Algorithm (#27928) 2022-08-16 21:27:21 -07:00
Edward Oakes
65f92a44e3
[serve][docs] Consolidate production guides, add kuberay docs to it (#27747)
- Adds KubeRay information to the production guide.
- Consolidates the two user guides we had related to production deployment.
- Adds information about experimental GCS HA feature.
2022-08-16 21:29:56 -05:00
Yi Cheng
2262ac02f3
[workflow][doc] First pass of workflow doc. (#27331)
Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>

Why are these changes needed?
This PR updates the workflow doc to reflect recent changes,
focusing on position changes and others.
2022-08-16 18:48:05 -07:00
Antoni Baum
7ff914b06e
[AIR][Docs] Set logging_strategy="epoch" for HF (#27917) 2022-08-16 16:45:46 -07:00
Eric Liang
8a7be15b72
[docs] Simplify Ray start guide and move PI tutorial to examples page (#27885) 2022-08-16 14:28:45 -07:00
Richard Liaw
759fbd9502
[air][minor] Use drop_columns in docs (#27852) 2022-08-16 14:01:25 -07:00
Zoltan Fedor
78648e3583
[Serve][Docs] Mark metrics served for HTTP vs Python calls (#27858)
Different metrics are collected in Ray Serve when the deployments are called from HTTP vs Python. This needs to be mentioned in the documentation and each metric marked accordingly.
2022-08-16 15:23:29 -05:00