2576 commits

Author SHA1 Message Date
xwjiang2010
ff2b728e9a
[air] add tuner user guide (#26837)
Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-03 09:43:42 -07:00
Archit Kulkarni
a12c04a2fe
[Serve] [Doc] Update key concepts for 2.0, remove deprecated APIs (#26965)
Removes deprecated APIs:
- serve.start()
- get_handle()

Rewrites the ServeHandle doc snippet to use the recommended workflow for ServeHandles (only access them from other deployments, pass Deployments in as input args to `.bind()`, which get resolved to ServeHandles at runtime)

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
2022-08-03 11:27:23 -05:00
Jiajun Yao
8b7e4ac701
[Doc] Test ray core doc code (#27334)
- Currently not all code under ray-core/doc_code is covered by CI.
- tf_example.py and torch_example.py are not used anywhere.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-02 20:51:47 -07:00
Dmitri Gekhtman
4d87e8112a
[docs][kubernetes] GPU user guide (#27360)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

This PR

- adds a page of guidance on GPU deployment with Ray/K8s. This page is a modified and slightly expanded version of the existing page https://docs.ray.io/en/latest/cluster/kubernetes-gpu.html
- moves managed K8s service intro links to their own page
2022-08-02 15:58:23 -07:00
Avnish Narayan
00f9438101
[RLlib] Training step docs. (#27344) 2022-08-02 23:41:45 +02:00
Archit Kulkarni
e02b072939
[Doc] [Serve] Edit grammar/usage/organization for HTTP adapters page (#26969)
Moves FastAPI into its own section instead of appearing in a duplicated note.

Co-authored-by: simon-mo <simon.mo@hey.com>
2022-08-02 15:08:05 -05:00
Richard Liaw
c8561071f3
[air/train/docs] gbdt trainer user guide (#27362)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-08-02 13:02:42 -07:00
clarng
84674fa868
[docs] ray core namespace docs: edit pass & move python code into doc_code dir (#27341) 2022-08-02 12:52:30 -07:00
clarng
34385b8136
[docs] ray core cross-lang docs: edit pass & move python code into doc_code dir (#27350)
Edit pass. Move code into doc_code dir. Code in doc_code is verified by CI
2022-08-02 12:50:05 -07:00
Jiajun Yao
cd2e590567
Support placement_group=None in PlacementGroupSchedulingStrategy (#27370)
We decided to allow escaping the parent pg via `PlacementGroupSchedulingStrategy(placement_group=None)` instead of using "DEFAULT". The docs already describe this, but the code still rejects it.
2022-08-02 12:49:41 -07:00
Ricky Xu
82a24f9319
[Doc][Core][State Observability] Adding Python SDK doc and docstring (#26997)
1. Add docs for the Python SDK and docstrings on the public SDK
2. Rename list -> ray_list and get -> ray_get for better naming
3. Fix some typos
4. Auto-translate the address to the API server URL

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2022-08-02 11:24:59 -05:00
xwjiang2010
36cf1baa82
[air doc] checkpoint_freq --> checkpoint_frequency (#27325)
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
2022-08-02 11:34:10 +01:00
Jules S. Damji
4045ba4841
[DOC Ray AIR] minor editorial tweaks for clarity and usage (#27128)
Co-authored-by: Jules Damji <jules@anyscale.com>
2022-08-01 21:09:04 -07:00
Dmitri Gekhtman
6efca71c35
[docs][kubernetes] XGBoost ML example (#27313)
Adds a guide on running an XGBoost-Ray workload using KubeRay.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-01 19:30:41 -07:00
shrekris-anyscale
324d8e4bca
[Serve] Serialize user_config with JSON instead of Pickle (#26235) 2022-08-01 17:53:43 -07:00
Eric Liang
f7ae8923f6
[docs] Reorganize the tensor data support docs; general editing (#26952)
Why are these changes needed?
Editing pass over the tensor support docs for clarity:

- Make heavy use of tabbed guides to condense the content
- Rewrite examples to be more organized around creating vs reading tensors
- Use doc_code for testing
2022-08-01 17:31:41 -07:00
clarng
fffcae1cb4
[docs] ray core dag docs: edit pass & move code into separate dir (#27318) 2022-08-01 17:05:36 -07:00
shrekris-anyscale
cc84953da3
[Serve] [Docs] Update "Getting Started" documentation (#26745) 2022-08-01 16:31:48 -07:00
matthewdeng
fedfaddb3f
[docs] add k8s docs to toc (#27310) 2022-07-30 15:26:30 -07:00
clarng
a61478fb73
import style (#25755) 2022-07-30 09:43:09 -07:00
Dmitri Gekhtman
059895ab5b
[docs][kubernetes] Shift docs into new structure (#27239)
This PR shifts KubeRay docs into the structure introduced in #27036.
There are no content changes.
2022-07-29 14:19:51 -07:00
Siyuan (Ryans) Zhuang
1bcd3e41d1
[Workflow] Cleanup workflow docs (#27197)
* cleanup workflow docs

Signed-off-by: Siyuan Zhuang <suquark@gmail.com>
2022-07-29 13:03:50 -07:00
Kai Fricke
1f097e9d12
[tune/docs] Update custom syncer example (#27252)
There is a small bug in the docs example for custom command-based syncers. This PR fixes it and adds a test covering the change.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-29 16:09:19 +01:00
xwjiang2010
d331489a9d
[ air ] clean up some more tune.run (#27117)
More replacements of tune.run() in examples/docstrings for Tuner.fit()

Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>

Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-29 10:43:45 +01:00
Jimmy Yao
749d313dcd
hot fix ray lightning (#27235)
2022-07-28 22:41:28 -07:00
Cade Daniel
0374637e53
Adding --keep-going flag to sphinx-build so all lint failures are listed in CI (#27068)
This PR adds --keep-going flag to the make html target for building the Ray docs. This means that when there is a lint failure in CI, the BuildKite log will show all lint failures instead of just the first one. Despite continuing past the first lint error, it will still fail the build.
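
The behavior described above (report every failure, but still fail the build) can be sketched in plain Python. This is only an illustration of the keep-going semantics, not how sphinx-build implements the flag; `run_checks`, `exit_code`, and `make_failing_check` are hypothetical names introduced for this example.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], None]]


def run_checks(checks: List[Check], keep_going: bool) -> List[str]:
    """Run each check and return a list of failure messages.

    With keep_going=False the loop stops at the first failure, so later
    problems stay hidden. With keep_going=True every failure is collected.
    """
    failures: List[str] = []
    for name, check in checks:
        try:
            check()
        except Exception as exc:
            failures.append(f"{name}: {exc}")
            if not keep_going:
                break
    return failures


def exit_code(failures: List[str]) -> int:
    """The build still fails whenever at least one failure was recorded."""
    return 1 if failures else 0


def make_failing_check(message: str) -> Callable[[], None]:
    """Hypothetical helper: build a check that always raises."""
    def check() -> None:
        raise ValueError(message)
    return check


checks = [
    ("page_a", make_failing_check("broken reference")),
    ("page_b", lambda: None),
    ("page_c", make_failing_check("unknown directive")),
]

first_only = run_checks(checks, keep_going=False)
all_failures = run_checks(checks, keep_going=True)
```

In both modes `exit_code` returns a non-zero value; the difference is only how many failures the log lists before the build stops.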

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-07-28 16:24:27 -07:00
Jimmy Yao
73e1632599
Hot fix again ray lightning docs (#27229) 2022-07-28 16:19:30 -07:00
Clark Zinzow
df124d0ad5
[AIR - Datasets] Hide tensor extension from UDFs. (#27019)
We previously added automatic tensor extension casting on Datasets transformation outputs to allow the user to not have to worry about tensor column casting; however, this current state creates several issues:

1. Not all tensors are supported, which means that we’ll need to have an opaque object dtype (i.e. ndarray of ndarray pointers) fallback for the Pandas-only case. Known unsupported tensor use cases:
a. Heterogeneous-shaped (i.e. ragged) tensors
b. Struct arrays
2. UDFs will expect a NumPy column and won’t know what to do with our TensorArray type. E.g., torchvision transforms don’t respect the array protocol (which they should), and instead only support Torch tensors and NumPy ndarrays; passing a TensorArray column or a TensorArrayElement (a single item in the TensorArray column) fails.
3. Implicit casting with object dtype fallback on UDF outputs can make the input type to downstream UDFs nondeterministic, where the user won’t know if they’ll get a TensorArray column or an object dtype column.
4. The tensor extension cast fallback warning spams the logs.

This PR:

1. Adds automatic casting of tensor extension columns to NumPy ndarray columns for Datasets UDF inputs, meaning the UDFs will never have to see tensor extensions and that the UDF input column types will be consistent and deterministic; this fixes both (2) and (3).
2. No longer implicitly falls back to an opaque object dtype when TensorArray casting fails (e.g. for ragged tensors), and instead raises an error; this fixes (4) but removes our support for (1).
3. Adds a global enable_tensor_extension_casting config flag, which is True by default, that controls whether we perform this automatic casting. Turning off the implicit casting provides a path for (1), where the tensor extension can be avoided if working with ragged tensors in Pandas land. Turning off this flag also allows the user to explicitly control their tensor extension casting, if they want to work with it in their UDFs in order to reap the benefits of fewer data copies, more efficient slicing, stronger column typing, etc.
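
As a rough illustration of points 2 and 3, the sketch below models a casting step that raises on ragged input instead of silently falling back, and a flag that switches the casting off. It uses nested Python lists as a stand-in for tensor columns and does not use the Ray Datasets API; `nested_shape` and `prepare_udf_output` are hypothetical names, and only the flag's meaning is taken from the description above.

```python
from typing import Any, List, Tuple


def nested_shape(value: Any) -> Tuple[int, ...]:
    """Return the shape of a nested list, or raise ValueError if it is ragged."""
    if not isinstance(value, list):
        return ()
    child_shapes = {nested_shape(item) for item in value}
    if len(child_shapes) > 1:
        raise ValueError(f"ragged tensor: mixed element shapes {sorted(child_shapes)}")
    inner = child_shapes.pop() if child_shapes else ()
    return (len(value),) + inner


def prepare_udf_output(column: List[Any], enable_casting: bool = True) -> Tuple[str, List[Any]]:
    """Model the casting decision for a column returned by a UDF.

    enable_casting=True: uniformly shaped data is tagged as a tensor column,
    and ragged data raises instead of falling back to an opaque type.
    enable_casting=False: the column is passed through untouched, which
    leaves a path open for ragged data.
    """
    if not enable_casting:
        return ("uncast", column)
    nested_shape(column)  # raises ValueError on ragged input
    return ("tensor", column)


uniform = [[1, 2], [3, 4]]
ragged = [[1, 2], [3]]
```

With casting enabled, `prepare_udf_output(ragged)` raises a `ValueError`; passing `enable_casting=False` returns the ragged column unchanged.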
2022-07-28 10:37:45 -07:00
shrekris-anyscale
510a0e038c
[Serve] Add host and port options to the Serve config file (#27026)
The Serve CLI and REST API always set the host to `0.0.0.0` and the port to Serve's default. This change adds `host` and `port` as top-level options in the Serve config file, so users can manually set the host and port of their Serve application to different values.

This change introduces a new Serve config file format:

```yaml
import_path: ...

runtime_env: ...

host: ...

port: ...

deployments: ...
    ...
```

`host` and `port` are optional and can be omitted. A running Serve application's `host` and `port` cannot be changed. If a user tries to `serve deploy` a config file with different `host` and `port` options than an already-running Serve application, `serve deploy` will fail without making any changes to the application. The user must `serve shutdown` their application and restart it with `serve deploy` to change their `host` and `port`.
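
The rule in the paragraph above can be sketched as a small check. This is a minimal model of the described behavior, not Serve's implementation: `HttpOptions`, `HostPortMismatchError`, and `check_deploy` are hypothetical names, and the default port value used here is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HttpOptions:
    """host and port are optional in the config; defaults apply when omitted."""
    host: str = "0.0.0.0"
    port: int = 8000  # assumed default, for illustration only


class HostPortMismatchError(RuntimeError):
    """Raised when a deploy requests a host/port that differs from the running app."""


def check_deploy(running: Optional[HttpOptions], requested: HttpOptions) -> HttpOptions:
    """Return the options the application runs with, or raise without changing anything.

    running is None when no Serve application is up yet; in that case the
    requested options are accepted as-is.
    """
    if running is None:
        return requested
    if running != requested:
        raise HostPortMismatchError(
            f"running on {running.host}:{running.port}, "
            f"config requests {requested.host}:{requested.port}; "
            "shut the application down and deploy again to change host or port"
        )
    return running


fresh = check_deploy(None, HttpOptions(port=9000))
unchanged = check_deploy(HttpOptions(), HttpOptions())
```

Calling `check_deploy(HttpOptions(), HttpOptions(port=9000))` raises `HostPortMismatchError`, mirroring the case where `serve deploy` fails and the user has to run `serve shutdown` first.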

**Follow-Up Items**
* The following CLI commands should **not** start Serve automatically. They should check whether Serve is running and perform some sort of no-op if it's not. That would alleviate the concern that the user starts Serve by accident through a `GET` request and needs to deal with default `host`/`port` options. Corresponding docs should also be updated.
    * `serve status`
    * `serve config`
    * `serve shutdown`
2022-07-28 11:26:46 -05:00
Jiao
0dbb18a87d
[AIR][Data] Fix nyc_taxi_basic_processing notebook (#26983) 2022-07-27 21:37:04 -07:00
Cade Daniel
db26c779a0
[Ray clusters] [docs] Copying all Ray Clusters doc content to new structure (#27062) 2022-07-27 14:22:44 -07:00
xwjiang2010
eb69c1ca28
[air] Add annotation for Tune module. (#27060)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 13:53:46 -07:00
Kai Fricke
3924a4b7cc
[air/train] Rename BaseWorkerMixin, only log info torch loop for rank 0 (#27098)
This PR

- only prints train_loop info strings (e.g. `train_loop_utils.py:298 -- Moving model to device: cpu`) for rank 0 workers for torch
- renames `BaseWorkerMixin` to `RayTrainWorker` as the name comes up often in output and is more meaningful

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-27 20:11:59 +01:00
matthewdeng
113c4d7fab
[air][data] move train_test_split to ray.data.Dataset (#27065) 2022-07-27 09:53:37 -07:00
Simon Mo
e5a8b1dd55
[Serve] Add API Annotations And Move to _private (#27058) 2022-07-27 09:08:26 -07:00
Amog Kamsetty
862d10c162
[AIR] Remove ML code from ray.util (#27005)
Removes all ML related code from `ray.util`

Removes:
- `ray.util.xgboost`
- `ray.util.lightgbm`
- `ray.util.horovod`
- `ray.util.ray_lightning`

Moves `ray.util.ml_utils` to other locations

Closes #23900

Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-07-27 14:24:19 +01:00
Cade Daniel
7a817ad364
Moving Ray Clusters restructuring section to be subpage under existing Ray Clusters. (#27036)
This PR puts the Ray Clusters (under construction) docs section (see #26754) under Ray Clusters as a subpage.

- This makes the master branch docs clean and presentable for users
- Ray Clusters doc writers can use existing CI to iterate on the docs, without having a massive PR once we're done.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-07-26 15:52:06 -07:00
Balaji Veeramani
89f7f2a567
[Datasets] Add size parameter to ImageFolderDatasource (#26975)
If you read a folder with differently-sized images, `ImageFolderDatasource` errors. This PR fixes the issue by resizing images to a user-specified size.
2022-07-26 14:57:38 -07:00
Rohan Potdar
deccf33912
[RLlib]: Add Off-Policy Estimation docs (#26809)
Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2022-07-26 13:57:56 -07:00
Cade Daniel
0427add12b
Adding Ray Clusters (Under Construction) doc section with new structure (#26754)
This PR:

- Creates a new chapter in the docs titled "Ray Clusters (Under Construction)".
- The new chapter makes the Ray Clusters docs follow the same structure as the other docs (https://diataxis.fr/)
- The new chapter will eventually replace the old chapter.
- I want to merge this now so that @DmitriGekhtman can put his Kuberay docs into the new structure.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-07-26 12:00:20 -07:00
Siyuan (Ryans) Zhuang
e1db8fb382
[Workflow] Workflow client integration (#26702)
## Why are these changes needed?

This PR ensures that workflow can work properly with Ray client.

Regular workflow tests will (also) be running under client mode (as a pytest parameter). Some tests are moved and reorganized because the Ray client tests require starting the cluster, so some tests require isolation or related changes.

Tests that literally take down the cluster are not tested with Ray client, since Ray client would fail in this scenario.

Limitations of Ray Workflow under Ray client are noted in the doc.

## Related issue number

Closes #21595
2022-07-26 11:15:47 -07:00
Balaji Veeramani
8bc836d9fb
[AIR] Remove CustomStatefulPreprocessor (#26981) 2022-07-26 10:10:57 -07:00
Balaji Veeramani
55988992b9
[AIR] Rename limit parameter as max_categories (#26977) 2022-07-26 10:10:40 -07:00
SangBin Cho
39b9c44c8d
[State Observability] pre-alpha documentation (#26560)
Adds

- Documentation for state APIs
- API reference
2022-07-26 05:49:28 -07:00
Dmitri Gekhtman
a70ada7341
[kubernetes][docs] Implement landing page and getting started guide (#26912)
Implements a landing page for the new KubeRay-based deployment guide.
Implements a "Getting started" Jupyter notebook
2022-07-26 00:41:56 -07:00
Archit Kulkarni
084f06f49a
[Doc] [Job submission] [Dashboard] Add tip for long runtime_env installation and improve error (#26911)
# Why are these changes needed?
- The dashboard can display the message `<actor> cannot be created because the Ray cluster cannot satisfy its resource requirements` in the case where the runtime env setup is stalled. This PR updates this message to include the possibility of the runtime env setup failing.
- This PR adds a tip to the Job Submission doc saying that if a job is stalled in PENDING, the runtime env setup may have stalled. It adds a pointer to the log files, which should have more information.
- The runtime env cannot stall forever; it fails after 10 minutes. This is a new feature added after the Ray 1.13 branch cut. In Ray <= 1.13, the runtime env can still stall forever.

# Related issue number
Closes #26332
2022-07-25 23:32:27 -07:00
Sihan Wang
8ecd928c34
[Serve] Make the checkpoint and recover only from GCS (#26753) 2022-07-25 14:24:53 -07:00
Jules S. Damji
193e824bc1
[AIR DOC] minor tweaks to checkpoint user guide for clarity and consistency subheadings (#26937)
Co-authored-by: Jules Damji <jules@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-25 14:21:29 -07:00
Jiao
5315f1e643
[AIR] Enable other notebooks previously marked with # REGRESSION (#26896)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-07-25 13:40:21 -07:00
matthewdeng
df638b3f0f
[Datasets] Automatically cast tensor columns when building Pandas blocks. (#26924)
This PR just applies the changes from the following PRs:

- [Datasets] Automatically cast tensor columns when building Pandas blocks. #26684
  - reverted by Revert "[Datasets] Automatically cast tensor columns when building Pandas blocks." #26921
- [AIR - Datasets] Fix TensorDtype construction from string and fix example. #26904

This fixes the test failures introduced in the originally reverted PRs.
2022-07-25 12:12:10 -07:00