Commit graph

2397 commits

Author SHA1 Message Date
matthewdeng
88524d8b57
[air] add CustomStatefulPreprocessor (#25497) 2022-06-09 16:54:46 -07:00
Eric Liang
a058a98c5d
[docs] Try to clarify some advantages of bulk ingest in the AIR ingest docs (#25616) 2022-06-09 11:47:22 -07:00
Amog Kamsetty
1316a2d05e
[AIR/Train] Move ray.air.train to ray.train (#25570) 2022-06-08 21:34:18 -07:00
Dmitri Gekhtman
836b08597f
[kuberay][autoscaler] Use new autoscaling fields from the KubeRay operator (#25386)
This PR incorporates recent autoscaler config changes from KubeRay.
2022-06-08 20:09:43 -07:00
matthewdeng
ba0a2a022a
[datasets] add Dataset.randomize_block_order (#25568)
This exposes a low-cost way to perform a pseudo global shuffle.

For extremely large datasets that span multiple nodes, contiguous blocks will often be colocated on the same node. This leads to hot spots during iteration of the dataset in which single nodes (1) must send a lot of data over the network, and (2) perform lots of disk reads if the dataset is spilled to disk.

This allows the workload to be spread across the nodes on which the dataset blocks are on.
2022-06-08 18:39:15 -07:00
M Waleed Kadous
9e2e84bc1c
[docs] Add an example for simple highly parallelizable tasks. (#24885)
It's important to show how Ray can be used for easily parallelizable independent tasks. I put this together to demonstrate how to di this.
2022-06-08 18:10:37 -07:00
Antoni Baum
7616435ed0
[Docs] Capitalize Ray AIR (#25597) 2022-06-08 14:37:53 -07:00
Kai Fricke
aa142eb377
[RLlib; CI] Add team:rllib tag for Bazel. (#25589)
Currently, team:ml spans all ML (Tune, Train, AIR) tests and rllib tests. rllib tests are much more flaky and it would be good to split them up in the flaky test tracker. This PR changes Rllib-tests from team:ml to team:rllib to enable this separation.

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 22:25:59 +01:00
Jian Xiao
50c854b1ad
Fix hyperlink in rst doc (#25427)
Hyperlink not working

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
2022-06-08 13:46:23 -07:00
Clark Zinzow
9dc0bb3d5e
[Datasets] Unrevert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets. (#25031)" (#25531)
Unreverts #24812, skipping the memory releasing tests that are already flaky. We have a separate issue tracking the unskipping of these memory releasing tests, once we find a more reliable way to test them.

* Revert "Revert "Revert "Revert "[Datasets] [Tensor Story - 1/2] Automatically provide tensor views to UDFs and infer tensor blocks for pure-tensor datasets."" (#25031)" (#25057)"

This reverts commit fb2933a78f.

* Skip shuffle memory release test.
2022-06-08 10:33:25 -07:00
Amog Kamsetty
80ae651f25
[Train] Clean up ray.train package (#25566) 2022-06-08 10:22:36 -07:00
Archit Kulkarni
3296345557
Add warning about entrpoint command in quotes (#25519) 2022-06-08 09:38:55 -07:00
Kai Fricke
8affbc7be6
[tune/train] Consolidate checkpoint manager 3: Ray Tune (#24430)
**Update**: This PR is now part 3 of a three PR group to consolidate the checkpoints.

1. Part 1 adds the common checkpoint management class #24771 
2. Part 2 adds the integration for Ray Train #24772
3. This PR builds on #24772 and includes all changes. It moves the Ray Tune integration to use the new common checkpoint manager class.

Old PR description:

This PR consolidates the Ray Train and Tune checkpoint managers. These concepts previously did something very similar but in different modules. To simplify maintenance in the future, we've consolidated the common core.

- This PR keeps full compatibility with the previous interfaces and implementations. This means that for now, Train and Tune will have separate CheckpointManagers that both extend the common core
- This PR prepares Tune to move to a CheckpointStrategy object
- In follow-up PRs, we can further unify interfacing with the common core, possibly removing any train- or tune-specific adjustments (e.g. moving to setup on init rather on runtime for Ray Train)

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-08 12:05:34 +01:00
xwjiang2010
29a063afdf
[air] add feast example (#25417) 2022-06-07 14:55:42 -07:00
xwjiang2010
76b34d4a03
[air] add to_air_checkpoint method for inference only workload. (#25444)
Follow up on our last discussion for supporting piecemeal fashion air users.
Only did for tensorflow for now, want to collect some feedback on API naming, package structure etc and I will add others.
2022-06-07 14:50:39 -07:00
Simon Mo
7471b1fa41
[Serve] [AIR] ModelWrapper improvements and docs (#25003)
* batching collation code and tests

* wip notebook for np and dataframe

* finish content

* reset ray-more-libs changes

* add comments

* run through

* Apply suggestions from code review

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>

* rename package

* lint

* richard's comment

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
2022-06-07 08:53:10 -07:00
John B Nelson
e913352bdc
[Doc] Remove trailing period symbol install instruction (#25543) 2022-06-07 08:08:04 -07:00
Rohan Potdar
a9d8da0100
[RLlib]: Doubly Robust Off-Policy Evaluation. (#25056) 2022-06-07 12:52:19 +02:00
Zhe Zhang
6793426a9d
[Docs; RLlib] Remove $ from rllib pip install instructions (#25358) 2022-06-07 08:57:17 +02:00
Philipp Moritz
ec02e78b01
[docs] Use better method to mock ObjectRef (#25535)
Actually fix #25498
2022-06-06 23:50:52 -07:00
Eric Liang
c1afbcb6f4
[air] Enforce API stability annotations for AIR module (#25485) 2022-06-06 22:52:21 -07:00
Eric Liang
78688a0903
Enable streaming ingest in AIR (#25428)
This adds the following options to DatasetConfig, which can be used to enable streaming ingest.

```
    # Whether the dataset should be streamed into memory using pipelined reads.
    # When enabled, get_dataset_shard() returns DatasetPipeline instead of Dataset.
    # The amount of memory to use is controlled by `stream_window_size`.
    # False by default for all datasets.
    use_stream_api: Optional[bool] = None

    # Configure the streaming window size in bytes. A typical value is something like
    # 20% of object store memory. If set to -1, then an infinite window size will be
    # used (similar to bulk ingest). This only has an effect if use_stream_api is set.
    # Set to 1.0 GiB by default.
    stream_window_size: Optional[float] = None

    # Whether to enable global shuffle (per pipeline window in streaming mode). Note
    # that this is an expensive all-to-all operation, and most likely you want to use
    # local shuffle instead.
    # False by default for all datasets.
    global_shuffle: Optional[bool] = None
```
2022-06-06 17:42:15 -07:00
Richard Liaw
86837fa637
[docs/air] update order of documentation in toc (#25527)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-06-06 16:23:30 -07:00
Amog Kamsetty
365fc44754
[AIR] Update to new Predictor interface (#25425)
Updates the Predictor interface to have Pandas as a narrow waist.
2022-06-06 15:41:38 -07:00
G Goswami
7ddc23a8f5
Fixing example (#25524)
Remove quotes from K8s job submission example in docs.
2022-06-06 18:21:19 -04:00
Richard Liaw
36aee6a1c4
[air/docs] Update documentation structure (#25475)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-06-06 15:15:11 -07:00
Zhe Zhang
2d74ecc2ec
[Docs] [Clusters] Fix issues in the overview part of Cluster Deployment Guide, and fix a typo (#25473)
* Fix issues in the overview part, and fix a typo

* Addressing comment

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-06-06 14:11:41 -07:00
Philipp Moritz
8aff562c2f
[docs] Cleanup ray init docs (#25492) 2022-06-06 13:16:32 -07:00
Balaji Veeramani
5e06baa77e
[AIR] Remove /Users/balaji from Torch example (#25515) 2022-06-06 13:13:54 -07:00
kimikuri
60f59bd804
[Serve] Fix misspell in Serve Doc User Guides. (#25494) 2022-06-06 13:00:20 -07:00
Jiao
aa965ba0a9
[Deployment Graph] Add visualization cookbook (#25112) 2022-06-06 11:05:58 -07:00
Eric Liang
48acbf0d69
[hotfix] Revert "[runtime env] runtime env inheritance refactor (#24538)" (#25487)
This reverts commit eb2692c.

This is a temporary mitigation for #25484
2022-06-05 14:55:38 -07:00
Sven Mika
a559efb7e4
[CI; LinkCheck] 3 RLlib fixes. (#25476) 2022-06-04 11:54:56 +02:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
Zhe Zhang
4cc202585a
[Docs] Document Ray downscaling behavior (#25466) 2022-06-03 17:08:21 -07:00
Eric Liang
1f509ab331
[air] Add DatasetParallelTrainer.dataset_config for configuring dataset ingest (#25337)
This adds a per-dataset config object to DataParallelTrainer. These configs define how the Dataset should be read into the DataParallelTrainer. It configures the preprocessing, splitting, and ingest strategy per-dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users have the opportunity to selectively override these configs by passing the ``dataset_config`` argument. Trainers can also define user customizable values (e.g., XGBoostTrainer doesn't support streaming ingest).

This PR adds the minimal support for dataset configs. Future PRs will:
- Add support for streaming ingest
- Move this config from DataParallelTrainer to ml.Trainer
2022-06-03 16:32:53 -07:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
matthewdeng
2e05b62236
[AIR] Preprocessors feature guide (#25302) 2022-06-03 11:43:51 -07:00
Kai Fricke
313e8730a2
[tune/docs] Trial executor doc fix (#25440) 2022-06-03 16:25:38 +01:00
Kai Fricke
2e058380d7
[tune] Remove TrialExecutor base class (#25404)
The TrialExecutor base class was a stub and has been deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.
2022-06-03 10:16:47 +01:00
Kai Fricke
f0fa8e54f8
[tune] Remove DurableTrainable class (#25405)
The DurableTrainable is deprecated (every trainable is a durable trainable). This PR removes it from the Tune library and a related example.
2022-06-03 10:16:02 +01:00
Yi Cheng
fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration never fail before this PR.

In the CI of the reverted PR, it also fails (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128). So high likely it's because of this PR.

And test output failure seems related as well (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b)
2022-06-02 20:38:44 -07:00
Jian Xiao
6589a4f8cb
[Datasets][UX Assessment] Add a section on how to write UDFs in Datasets (#25338)
The Datasets UX assessment showed that users had difficulties in writing UDFs: what's input/output types, how to write the function etc.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
2022-06-02 20:00:50 -07:00
Siyuan (Ryans) Zhuang
b5e71fde23
[workflow] Remove workflow virtual actor (#25394)
* remove workflow virtual actor
2022-06-02 18:17:25 -07:00
Stephanie Wang
473a962d89
[Datasets] [Docs] Add docs about fault tolerance in Datasets (#25371)
Adds description of fault tolerance guarantees for Datasets.

Related issue number

Closes #24856.
2022-06-02 15:53:50 -07:00
Stephanie Wang
ab8785ca5c
Revert "Revert "[core] Support generators for tasks with multiple return values (#25247)" (#25380)" (#25383)
Duplicate for #25247.

Adds a fix for Dask-on-Ray. Previously, for tasks with multiple return values, we implicitly allowed returning a dict with the return index as the key. This was used by Dask-on-Ray, but this is not documented behavior, and we now require task returns to be iterable instead.
2022-06-02 10:50:11 -07:00
Sihan Wang
3c9bd66485
[Serve][Doc] Add http endpoint for dag pattern doc (#25390) 2022-06-02 09:01:37 -07:00
Sven Mika
e4ceae19ef
[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346) 2022-06-02 16:47:05 +02:00
Kai Fricke
6fe91885b0
[docs/lint] Fix reference to dataset_tune (#25402) 2022-06-02 11:40:26 +01:00
Eric Liang
51b295ad74
[docs] Improve Tune + Datasets documentation (#25389) 2022-06-01 21:52:32 -07:00