Commit graph

2571 commits

Author SHA1 Message Date
shrekris-anyscale
a15442a510
[docs] Omit bash prompt (#28028)
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
2022-08-29 14:10:02 -07:00
Kilian Lieret
328e6ac2f4
Slurm: Set load_env to empty string if not specified (#28132) 2022-08-27 20:00:35 -07:00
Kai Fricke
bbd13ddc33
[air/docs] Add example to fetch results dataframe for trainer/tuner (#28067) 2022-08-27 02:01:57 -07:00
Jiajun Yao
c8617b9ebf
[Doc] Revamp ray core design patterns doc [3/n]: ray get in a loop (#28113)
Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-26 20:41:04 -07:00
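The pattern this entry documents can be illustrated without a cluster. The sketch below is a stand-in under stated assumptions: it uses Python's standard `concurrent.futures` in place of Ray tasks, and the helper names (`slow_square`, `get_in_loop`, `submit_then_get`) are hypothetical. Calling `.result()` inside the submitting loop plays the role of `ray.get` in a loop: each call blocks before the next task is submitted, so the work runs one item at a time. Submitting everything first and blocking once afterwards (in Ray, collecting the refs and calling `ray.get` on the whole list) lets the tasks overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def slow_square(x: int) -> int:
    # Stand-in for a remote task that takes a noticeable amount of time.
    time.sleep(0.05)
    return x * x


def get_in_loop(pool: ThreadPoolExecutor, items: List[int]) -> List[int]:
    # Anti-pattern: blocking on each result before submitting the next task
    # means at most one task is running at any moment.
    results = []
    for x in items:
        results.append(pool.submit(slow_square, x).result())
    return results


def submit_then_get(pool: ThreadPoolExecutor, items: List[int]) -> List[int]:
    # Preferred: submit every task first, then block on the results together.
    futures = [pool.submit(slow_square, x) for x in items]
    return [f.result() for f in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        for fn in (get_in_loop, submit_then_get):
            start = time.perf_counter()
            out = fn(pool, list(range(8)))
            print(f"{fn.__name__}: {out} in {time.perf_counter() - start:.2f}s")
```

Both functions return the same values; only the amount of overlap between tasks differs.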
Amog Kamsetty
00f6273775
[Docs] [Tune] ResultGrid Docs and API reference (#28068)
Improve the docstring for ResultGrid and show its API reference and docstring in the Tune API section.

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-26 16:50:35 -07:00
Guilherme
6cf363af0d
Updates limit-tasks example (#26644)
The example fails because it can pass an invalid value to the `num_returns` parameter of the `ray.wait` function.
2022-08-26 15:19:58 -07:00
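A common way to cap the number of in-flight tasks is to wait for a single completion whenever the cap is reached. Our understanding is that `ray.wait` rejects a `num_returns` larger than the number of refs it is given, so waiting for exactly one result sidesteps the kind of invalid value this fix addresses. The sketch below shows the bounded-submission loop with the standard library's `concurrent.futures.wait(..., return_when=FIRST_COMPLETED)` as a stand-in for `ray.wait(pending, num_returns=1)`; `MAX_PENDING`, `work`, and `run_bounded` are hypothetical names, not code from the Ray docs.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Iterable, List, Tuple

MAX_PENDING = 3  # hypothetical cap on the number of in-flight tasks


def work(x: int) -> int:
    # Stand-in for a remote task.
    time.sleep(0.01)
    return x + 1


def run_bounded(items: Iterable[int], max_pending: int = MAX_PENDING) -> Tuple[List[int], int]:
    """Run `work` over `items` with at most `max_pending` tasks in flight.

    Returns the results (in completion order) and the peak number of pending tasks.
    """
    if max_pending < 1:
        raise ValueError("max_pending must be at least 1")
    results: List[int] = []
    pending = set()
    peak = 0
    with ThreadPoolExecutor(max_workers=8) as pool:
        for x in items:
            if len(pending) >= max_pending:
                # Block until at least one task finishes. Waiting for one
                # completion is always valid while `pending` is non-empty,
                # unlike a computed count that could exceed len(pending).
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            pending.add(pool.submit(work, x))
            peak = max(peak, len(pending))
        # Drain the tasks that are still running.
        done, _ = wait(pending)
        results.extend(f.result() for f in done)
    return results, peak
```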
Kai Fricke
d0678b80ed
[rfc] [air/tune/train] Improve trial/training failure error printing (#27946)
When training fails, the console output is currently cluttered with tracebacks which are hard to digest. This problem is exacerbated when running multiple trials in a tuning run.

The main problems here are:

1. Tracebacks are printed multiple times: In the remote worker and on the driver
2. Tracebacks include many internal wrappers

The proposed solution for 1 is to only print tracebacks once (on the driver) or never (if configured).

The proposed solution for 2 is to shorten the tracebacks to include mostly user-provided code.

### Deduplicating traceback printing

The solution here is to use `logger.error` instead of `logger.exception` in `function_trainable.py`, which avoids printing a traceback in the trainable.

Additionally, we introduce an environment variable `TUNE_PRINT_ALL_TRIAL_ERRORS` which defaults to 1. If set to 0, trial errors will not be printed at all in the console (only the error.txt files will exist).

To be discussed: We could also default this to 0, but I think the expectation is to see at least some failure output in the console logs by default.
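A minimal sketch of the two ideas in this section, assuming a hypothetical driver-side helper named `report_trial_error` (not Ray's actual code): it checks `TUNE_PRINT_ALL_TRIAL_ERRORS` with the described default of 1, and uses `logger.error`, which logs only the message, rather than `logger.exception`, which also appends the active traceback.

```python
import logging
import os

logger = logging.getLogger("trial_error_sketch")  # hypothetical logger name


def report_trial_error(trial_id: str, exc: BaseException) -> bool:
    """Hypothetical helper: log a one-line trial error without a traceback.

    Returns True if something was logged.
    """
    # Default "1" mirrors the described default; "0" silences console output
    # (the per-trial error.txt files would still be written elsewhere).
    if os.environ.get("TUNE_PRINT_ALL_TRIAL_ERRORS", "1") == "0":
        return False
    # logger.error() records only the message; logger.exception() would also
    # attach the active traceback, which is what caused the duplication.
    logger.error("Trial %s errored: %r", trial_id, exc)
    return True
```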

### Removing internal wrappers from tracebacks

The solution here is to introduce a magic local variable `_ray_start_tb`. In two places, we use this magic local variable to shorten the stacktrace. A utility `shorten_tb` looks for the last occurrence of `_ray_start_tb` in the stacktrace and starts the traceback from there. This takes only linear time in the length of the traceback. If the magic variable is not present, the full traceback is returned; this means that if the error does not originate in user code, the full traceback is shown, giving visibility into possible internal bugs. Additionally, the env variable `RAY_AIR_FULL_TRACEBACKS` disables traceback shortening.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-26 15:02:38 -07:00
Antoni Baum
ea483ecf7a
[AIR][Docs] Clarify how LGBM/XGB trainers work (#28122) 2022-08-26 14:51:22 -07:00
Kai Fricke
3b3aa80ba3
[tune/ci] Fix link to SigOpt experiment API (#28127)
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-26 14:10:53 -07:00
Dmitri Gekhtman
ce99cf1b71
[Docs][Kubernetes] Fix link, add a bit of content (#28017)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

- Fixes the "legacy operator" link to point to master rather than the 2.0.0 branch. The migration README exists in master but not in the 2.0.0 branch.
- Adds a sentence explaining that the Ray container has to go first in the container list.
- Adds a sentence to the config guide mentioning min/max replicas and linking to autoscaling.
- Documents a bug related to GPU auto-detection in KubeRay 0.3.0.
2022-08-26 12:02:18 -07:00
Akash Patel
96d579a4fe
Add support for Python 3.10 (#21221)
Signed-off-by: acxz <17132214+acxz@users.noreply.github.com>
2022-08-26 11:01:12 -07:00
Amin Allahyar
455fa664e5
Minor update on the key concept explanation (#28032) 2022-08-26 10:57:58 -07:00
Dmitri Gekhtman
e98fdef93e
Move cloudwatch. (#28041)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

For a more balanced table of contents, makes CloudWatch instructions a subsection of AWS instructions.
2022-08-26 08:55:38 -07:00
Jiajun Yao
5139a5c722
Fix broken gym library link (#28111)
gymlibrary.ml has moved to gymlibrary.dev.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-25 19:52:43 -07:00
Max Pumperla
50cb51387e
fixes #25860 (#28097) 2022-08-25 10:45:35 -07:00
Kai Fricke
e0725d1f1d
[docs/ci] Fix (some) broken linkchecks (#28087)
Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-25 04:41:35 -07:00
Max Pumperla
ec3c7f855e
[docs] add algolia crawler verification (#28094) 2022-08-25 01:36:26 -07:00
Cade Daniel
5fb36d4a7d
Small fixes to job submission cluster docs (#28056)
I walked through the new job submission cluster docs and sanded down a few rough edges.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-23 09:41:45 -07:00
Eric Liang
ad40e19ca0
[docs] Add the AIR technical whitepaper to our docs (#28053) 2022-08-22 16:41:51 -07:00
shrekris-anyscale
ded324d6a4
[Docs] Remove topbar overlap on left table of contents (#28031) 2022-08-20 02:02:16 -07:00
Jun Gong
62b91cbec0
[docs][rllib] Documentation for connectors. (#27528)
Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-19 14:35:07 -07:00
Richard Liaw
71efee04f6
[air, clusters/docs] add images to air docs and reformat clusters panels (#28011) 2022-08-19 08:52:47 -07:00
Cade Daniel
a6b7189ab3
Fixing formatting around TODO that found its way into compiled docs. (#28001)
Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-18 17:46:39 -07:00
shrekris-anyscale
4395f8792f
[Serve] [Docs] Fix link in Serve Config Files documentation (#27993) 2022-08-18 14:50:23 -07:00
SangBin Cho
9950e9c1f4
[Doc] CLI Reference Documentation Revamp (#27862)
Take the CLI reference out of the core API subsection. It follows the same CLI reference pattern as other libraries (e.g., Serve has the Serve CLI under the Serve API section).
2022-08-18 14:29:31 -07:00
Dmitri Gekhtman
c2ead88aca
[kuberay][docs] Experimental features (#27898) 2022-08-18 11:37:06 -07:00
Dmitri Gekhtman
98c90b8488
[clusters][docs] Provide urls to content, fix typos (#27936) 2022-08-18 11:33:04 -07:00
Dmitri Gekhtman
6cf263838f
[docs][touch-up] Add ephemeral storage to Ray-on-K8s example. (#27916) 2022-08-18 11:29:55 -07:00
Sihan Wang
112f104fb6
[Serve][Doc] Fix user guide tables (#27991) 2022-08-18 10:55:31 -07:00
Eric Liang
47f3d83379
[docs] Minor AIR figure updates (#27965) 2022-08-18 10:30:24 -07:00
Jiajun Yao
0a3a5e68a4
Revamp ray core design patterns doc [2/n]: too fine grained tasks (#27919)
Move the code to doc_code.
Fix the code example so that batching is faster than the serial run.

Related issue number: #27048

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-17 13:52:50 -07:00
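The "too fine-grained tasks" anti-pattern is that per-task scheduling overhead dominates when each task does very little work; the fix is to have each task process a batch of items. The sketch below again uses `concurrent.futures` as a stand-in for Ray tasks (in Ray, one remote task per batch would play the role of `run_batch`), and `map_batched` is a hypothetical helper, not the doc_code example from this PR.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_batched(
    pool: ThreadPoolExecutor, fn: Callable[[T], R], items: List[T], batch_size: int
) -> List[R]:
    """Apply `fn` to every item, submitting one job per batch instead of per item."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    def run_batch(batch: List[T]) -> List[R]:
        # One submitted job handles many small items, so the per-job
        # scheduling overhead is paid once per batch rather than once per item.
        return [fn(x) for x in batch]

    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    futures = [pool.submit(run_batch, b) for b in batches]
    results: List[R] = []
    for fut in futures:  # submission order, so output order matches input order
        results.extend(fut.result())
    return results


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(map_batched(pool, lambda x: x * x, list(range(10)), batch_size=4))
```

The batch size is the tuning knob: large enough that each job's work outweighs its scheduling cost, small enough that there are still enough jobs to keep every worker busy.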
Edward Oakes
a400cde56f
[docs][serve] Trim down user guide & clean up table of contents (#27926)
An attempt at making the docs shorter and sweeter, including various small cleanup items.

- Reorder the TOC on the sidebar for the user guides to be more linear based on a user's journey.
- Put the batching content under the performance guide.
- Remove the AIR guide (AIR users already have a serving guide).
- Combine the `ServeHandle` and model composition pages into a single guide. We may want to revisit this in the future but for now better to have it in a single place instead of duplicated (with links going to both).
- Fix the index page for the user guides to match the TOC sidebar.
- Rename a few pages for clarity & consistency.
- Remove some now-redundant content (old ML models user guide).
2022-08-17 13:24:17 -05:00
Kai Fricke
4a55f18a22
[docs][serve] Fix linkcheck for production guide (#27941) 2022-08-17 07:46:53 -07:00
Cheng Su
4ad1b4c712
Fix nyc_taxi_basic_processing.ipynb end-to-end (#27927)
Signed-off-by: Cheng Su <scnju13@gmail.com>
This PR runs Ray 2.0.0rc0 on https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html, fixes the notebook end-to-end, and makes sure the output and wording match.

The page after this PR - https://ray--27927.org.readthedocs.build/en/27927/data/examples/nyc_taxi_basic_processing.html .
2022-08-16 21:30:19 -07:00
Christy Bergman
3f313d74ad
Replace robot image with emoji and replace word Trainer with Algorithm (#27928) 2022-08-16 21:27:21 -07:00
Edward Oakes
65f92a44e3
[serve][docs] Consolidate production guides, add kuberay docs to it (#27747)
- Adds KubeRay information to the production guide.
- Consolidates the two user guides we had related to production deployment.
- Adds information about experimental GCS HA feature.
2022-08-16 21:29:56 -05:00
Yi Cheng
2262ac02f3
[workflow][doc] First pass of workflow doc. (#27331)
Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>

Why are these changes needed?
This PR updates the workflow docs to reflect recent changes, focusing on the position change, among others.
2022-08-16 18:48:05 -07:00
Antoni Baum
7ff914b06e
[AIR][Docs] Set logging_strategy="epoch" for HF (#27917) 2022-08-16 16:45:46 -07:00
Eric Liang
8a7be15b72
[docs] Simplify Ray start guide and move PI tutorial to examples page (#27885) 2022-08-16 14:28:45 -07:00
Richard Liaw
759fbd9502
[air][minor] Use drop_columns in docs (#27852) 2022-08-16 14:01:25 -07:00
Zoltan Fedor
78648e3583
[Serve][Docs] Mark metrics served for HTTP vs Python calls (#27858)
Different metrics are collected in Ray Serve when the deployments are called from HTTP vs Python. This needs to be mentioned in the documentation and each metric marked accordingly.
2022-08-16 15:23:29 -05:00
Ian Rodney
24508db920
[Docs][GCP] Configuring ServiceAccounts for worker (#27915)
Enables better usage with GCP.

The default behavior is that the head runs with the ray-autoscaler-sa-v1 service account, but workers do not. Workers can run with this service account by copying and uncommenting L114->L117 from example-full.

Signed-off-by: Ian <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-16 13:13:27 -07:00
Kai Fricke
b91246a093
[air/benchmarks] Measure local training time in torch/tf benchmarks (#27902)
We currently measure end-to-end training time in our benchmarks, which includes setup overhead. This is an unequal comparison, as setup overhead for vanilla training cannot be accurately expressed and was instead just disregarded.
By comparing the raw training times in the actual training loop, we will get a more accurate expression of any potential overhead or benefit in using Ray vs. vanilla tensorflow/torch.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-16 19:16:08 +02:00
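The measurement change described here amounts to starting the clock after setup rather than before it. A minimal sketch, assuming a hypothetical harness `time_training` with caller-supplied `setup` and `train_epoch` callables (not the benchmark code from the PR), records both numbers so that setup overhead can be reported separately from training-loop time.

```python
import time
from typing import Callable, Dict, TypeVar

S = TypeVar("S")


def time_training(
    setup: Callable[[], S], train_epoch: Callable[[S], None], num_epochs: int
) -> Dict[str, float]:
    """Hypothetical benchmark harness separating setup from the training loop."""
    t_start = time.perf_counter()
    state = setup()  # e.g. worker startup, data loading, model construction
    t_loop = time.perf_counter()
    for _ in range(num_epochs):
        train_epoch(state)
    t_end = time.perf_counter()
    return {
        "end_to_end_s": t_end - t_start,
        "setup_s": t_loop - t_start,
        "training_loop_s": t_end - t_loop,
    }


if __name__ == "__main__":
    stats = time_training(
        setup=lambda: time.sleep(0.2),
        train_epoch=lambda _state: time.sleep(0.05),
        num_epochs=4,
    )
    print(stats)
```

Comparing `training_loop_s` across Ray and vanilla runs isolates the loop itself, while `setup_s` remains available if the setup overhead is of interest on its own.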
Simon Mo
b9a2fb79b6
[AIR][Docs] Remove the excessive printing from Torch examples (#27903) 2022-08-16 09:09:54 -07:00
Dmitri Gekhtman
bceef503b2
[Kubernetes][docs] Restore legacy Ray operator migration discussion (#27841)
This PR restores notes for migration from the legacy Ray operator to the new KubeRay operator.

To avoid disrupting the flow of the Ray documentation, these notes are placed in a README accompanying the old operator's code.

These notes are linked from the new docs.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-16 08:46:31 -07:00
Chen Shen
f05c744a65
[Doc] minor fix on accessing AWS/S3
Update the doc.
2022-08-15 16:53:31 -07:00
Yuan-Chi Chang
34c494260f
[workflow] Documentation of http events (#27166)
Documentation updates for the newly introduced HTTPEventProvider and HTTPListener in Ray 2.0.
2022-08-15 14:23:04 -07:00
Jiajun Yao
eb37bb857c
Revamp ray core design patterns doc [1/n]: generators (#27823)
- Move the code snippet to doc_code folder
- Move patterns to an upper level.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-15 09:24:34 -07:00
Myeongju Kim
52440f1489
[Docs] Fix a typo in index.md (#27859)
Signed-off-by: myeongjukim <ming3772@gmail.com>
2022-08-15 08:26:40 -07:00
Cheng Su
a2c168cd6d
[Datasets][docs] Minor fix for nyc_taxi_basic_processing.ipynb (#27828)
Went through https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html and made some minor fixes here.

Fix the size_bytes() result (before this PR it used Parquet sampling, which we later disabled).
Change one size_bytes() call to a count() call, since the wording that follows it in the doc ("That’s a lot of rows") refers to a row count.
The changed places are shown in the following screenshots:
2022-08-14 12:34:33 -07:00