Commit graph

12859 commits

Author SHA1 Message Date
Richard Liaw
86837fa637
[docs/air] update order of documentation in toc (#25527)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2022-06-06 16:23:30 -07:00
Amog Kamsetty
365fc44754
[AIR] Update to new Predictor interface (#25425)
Updates the Predictor interface to have Pandas as a narrow waist.
2022-06-06 15:41:38 -07:00
G Goswami
7ddc23a8f5
Fixing example (#25524)
Remove quotes from K8s job submission example in docs.
2022-06-06 18:21:19 -04:00
Richard Liaw
36aee6a1c4
[air/docs] Update documentation structure (#25475)
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-06-06 15:15:11 -07:00
Philipp Moritz
406c2c5778
[docs] Fix mock objects in Ray Core docs (#25498)
Our API references are currently showing mock objects for some of our APIs -- this PR fixes them for the Ray Core API reference.
2022-06-06 15:09:01 -07:00
Kai Fricke
a0c8db1b5e
[release] Update download_wheels.sh to include Python 3.10 (#25508)
Currently, the download script does not include Python 3.10.
2022-06-06 22:42:50 +01:00
simonsays1980
2a5d322e70
[tune] Relative logdir paths in trials for ExperimentAnalysis in remote buckets (#25063)
When an experiment runs in the cloud, for example, and syncs to a bucket, the logdir path stored in the trials no longer matches once you work with the checkpoints in the bucket. There are some workarounds, but the easier solution is to also add a rel_logdir that holds the relative path to the trials/checkpoints, so that it can handle any change in the location of the experiment results.
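
A minimal sketch of the relative-path idea, not the code from this PR. The helper names and all paths below are hypothetical; only the `rel_logdir` concept comes from the message above.

```python
import posixpath


def to_rel_logdir(experiment_dir: str, logdir: str) -> str:
    """Return the trial directory relative to the experiment directory."""
    return posixpath.relpath(logdir, experiment_dir)


def resolve_logdir(new_experiment_dir: str, rel_logdir: str) -> str:
    """Rebuild the trial directory under wherever the results live now."""
    return posixpath.join(new_experiment_dir, rel_logdir)


# Hypothetical paths: results written locally, later read from a synced copy.
rel = to_rel_logdir(
    "/home/user/ray_results/exp", "/home/user/ray_results/exp/trial_0001"
)
restored = resolve_logdir("/mnt/bucket/exp", rel)
```

Because only the relative part is stored, the same trial can be located under any new experiment root.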

As discussed with @Yard1 and @krfricke

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-06-06 22:41:41 +01:00
Vince Jankovics
68444cd390
[tune] Custom resources per worker added to default_resource_request (#24463)
This resolves the `TODO(ekl): add custom resources here once tune supports them` item. 
Also, related to the discussion [here](https://discuss.ray.io/t/reserve-workers-on-gpu-node-for-trainer-workers-only/5972/5).

Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-06-06 22:41:02 +01:00
Florian Boucault
9a510f92cf
[Serve] Depend on uvicorn[standard] instead of uvicorn so that it pulls in uvloop (#25027) 2022-06-06 14:23:00 -07:00
Zhe Zhang
2d74ecc2ec
[Docs] [Clusters] Fix issues in the overview part of Cluster Deployment Guide, and fix a typo (#25473)
* Fix issues in the overview part, and fix a typo

* Addressing comment

Co-authored-by: Alex Wu <alex@anyscale.com>
2022-06-06 14:11:41 -07:00
Philipp Moritz
8aff562c2f
[docs] Cleanup ray init docs (#25492) 2022-06-06 13:16:32 -07:00
Balaji Veeramani
5e06baa77e
[AIR] Remove /Users/balaji from Torch example (#25515) 2022-06-06 13:13:54 -07:00
Sihan Wang
0441834021
[Serve] Fix flaky test_standalone (#25513) 2022-06-06 13:13:32 -07:00
kimikuri
60f59bd804
[Serve] Fix misspelling in Serve Doc User Guides. (#25494) 2022-06-06 13:00:20 -07:00
shrekris-anyscale
e433424796
[Serve] Checkpoint the DeploymentState's _deleting attribute (#25478) 2022-06-06 12:06:51 -07:00
Eric Liang
94dec83a60
[data] Rename data.impl to data._internal (#25486) 2022-06-06 11:39:53 -07:00
shrekris-anyscale
ce3faed897
[Serve] Avoid deserializing ReplicaConfig properties in the Serve controller (#25213) 2022-06-06 11:08:06 -07:00
Jiao
aa965ba0a9
[Deployment Graph] Add visualization cookbook (#25112) 2022-06-06 11:05:58 -07:00
mwtian
1ce0ab7b7c
[Core] Export additional metrics for workers and Raylet memory (#25418)
Add visibility into the following to help Ray users and developers debug performance and OOM issues:

- Raylet memory usage broken down by USS vs remaining RSS.
- Total workers' count, CPU percentage usage, and memory usage.
2022-06-06 10:58:14 -07:00
Andrew Li
3853186472
Exposed upscaling_speed and idle_timeout_minutes to values.yaml, #25312 (#25495)
Exposed upscaling_speed and idle_timeout_minutes to values.yaml.
2022-06-06 13:26:06 -04:00
Alex Wu
a9bf8d455f
[github] Update code owners for cluster docs (#25507)
In the same spirit as #25479, this adds me and @DmitriGekhtman as code owners of the autoscaler/cluster launcher docs, since we are also the code owners for the code.
2022-06-06 09:36:39 -07:00
Balaji Veeramani
c4898ed7df
[AIR] [Datasets] Add convert_pandas_to_tf_tensor (#25133)
Dataset.to_tf and TensorflowPredictor attempt to convert Pandas dataframes to NumPy arrays by calling DataFrame.values. However, DataFrame.values fails if the dataframe contains multidimensional arrays.

This PR solves this problem by introducing a function convert_pandas_to_tf_tensor. The implementation of the function is based on the implementation of convert_pandas_to_torch_tensor.
2022-06-06 08:29:51 -07:00
Artur Niederfahrenhorst
5133978adc
[RLlib] PG policy subclassing conversion. (#25288) 2022-06-06 13:07:47 +02:00
Artur Niederfahrenhorst
243038d00a
[RLlib] Issue 25401: Faulty usage of get_filter_config in ComplexInputNetworks (#25493) 2022-06-06 13:04:17 +02:00
kourosh hakhamaneshi
d49d0efbaf
[RLlib] Bug fix: when on GPU, sample_batch.to_device() only converts the device and does not convert float64 to float32. (#25460) 2022-06-06 12:43:11 +02:00
Artur Niederfahrenhorst
c4a0e9d0f2
[RLlib] Disambiguate timestep fragment storage unit in replay buffers. (#25242) 2022-06-06 11:35:49 +02:00
Sebastián Ramírez
298742d724
♻️ Refactor type annotations for .remote() to avoid incorrect autocompletion and checks (#25480)
With the current type annotations for the `.remote()` method generated in decorated functions, editors assume there are keyword arguments `arg0`, `arg1`, etc. These are incorrect, as the actual function will probably use different names for its arguments.

For example, this shouldn't autocomplete `arg0`, `arg1`, etc:

<img width="407" alt="Screenshot 2022-06-04 at 06 13 46" src="https://user-images.githubusercontent.com/1326112/171996654-12248369-cf10-4fce-9ea2-5deb4ca8e2bd.png">

If anything, it should autocomplete `x` and `y` (although that's currently [not perfectly doable](https://github.com/python/typing/discussions/1163)).

Updating the type annotations to use [arguments prefixed with double underscores](https://mypy.readthedocs.io/en/stable/protocols.html?highlight=double%20underscore#callback-protocols) at least tells tooling not to provide autocompletion for those args (which would be incorrect), while still providing inline errors for invalid types.
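
A rough sketch of the double-underscore convention described above, not the actual annotations from this PR. The class `RemoteFunction2` is a hypothetical stand-in, and its `remote()` calls the function directly instead of scheduling a task.

```python
from typing import Callable, Generic, TypeVar

R = TypeVar("R")
T0 = TypeVar("T0")
T1 = TypeVar("T1")


class RemoteFunction2(Generic[R, T0, T1]):
    """Hypothetical wrapper for a decorated two-argument function."""

    def __init__(self, function: Callable[[T0, T1], R]) -> None:
        self._function = function

    # Parameters whose names start with a double underscore are treated as
    # positional-only by type checkers, so editors do not offer them as
    # keyword arguments but still check the argument types.
    def remote(self, __arg0: T0, __arg1: T1) -> R:
        return self._function(__arg0, __arg1)


def add(x: int, y: int) -> int:
    return x + y


add_remote = RemoteFunction2(add)
result = add_remote.remote(1, 2)
```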

<img width="880" alt="Screenshot 2022-06-04 at 06 20 26" src="https://user-images.githubusercontent.com/1326112/171996806-560c0fa8-0ee3-477c-9906-71e880c84e56.png">
2022-06-05 16:21:53 -07:00
Eric Liang
48acbf0d69
[hotfix] Revert "[runtime env] runtime env inheritance refactor (#24538)" (#25487)
This reverts commit eb2692c.

This is a temporary mitigation for #25484
2022-06-05 14:55:38 -07:00
Sebastián Ramírez
6e1248fb37
🚚 Move worker types to the module to improve static analysis (#25439)
Currently, there are separate type annotations in `worker.pyi` that include the types for `func.remote()`, but they don't include types for the other things declared in `worker.py`. Because of that, editors can end up showing support only for the things in the `worker.pyi` file.

For example:

<img width="349" alt="Screenshot 2022-06-03 at 06 01 36" src="https://user-images.githubusercontent.com/1326112/171841977-ec7a0b9a-b4a5-4422-acd9-b73c1e263261.png">

After this change, the editor and other tools will be able to provide support for other things defined in the same file:

<img width="760" alt="Screenshot 2022-06-03 at 06 04 24" src="https://user-images.githubusercontent.com/1326112/171842204-1915dd2a-6cc6-41b7-8785-5124beec37e8.png">

And the typing support for `func.remote()` keeps working as before:

<img width="760" alt="Screenshot 2022-06-03 at 06 07 15" src="https://user-images.githubusercontent.com/1326112/171842528-f318753e-9f47-4236-b0a4-d86d00c0bb11.png">

This is the approach recommended by PyRight/Pylance/VS Code. I also recommend it, as it's a lot easier to maintain types in the same file while editing than to remember to go to an external, independent file to add those types. Also, to have proper support when using an external `.pyi` file, *all* the things declared in `worker.py` would have to be declared in the `worker.pyi` file.

Ref: https://github.com/microsoft/pyright/blob/main/docs/typed-libraries.md#inlined-type-annotations-and-type-stubs
2022-06-05 14:01:24 -07:00
Yi Cheng
acf210fcac
[flakey] Skip ray_syncer_test for ubsan. (#25477)
From the message:
```
[       OK ] SyncerTest.TestMToN (13132 ms)
[----------] 5 tests from SyncerTest (43175 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 2 test suites ran. (43176 ms total)
[  PASSED  ] 8 tests.
external/com_github_grpc_grpc/src/core/lib/iomgr/ev_posix.cc:314:19: runtime error: member access within null pointer of type 'const struct grpc_event_engine_vtable'
```

So far, this can only be reproduced by running with Bazel test. It does not reproduce with gdb. It looks like an issue with gRPC, possibly the reactor API.
The ASAN test, which is supposed to catch this kind of issue, runs well, and considerable time has been spent investigating this one without progress, so skip this test for now.
2022-06-04 23:06:57 -07:00
matthewdeng
7dafb2e278
[air] remove invalid wandb symlink (#25488) 2022-06-04 22:17:08 -07:00
Kai Fricke
f4d3daa3cc
[github] Codeowners for docs (#25479) 2022-06-04 22:09:00 -07:00
Jun Gong
644b80c0ef
[RLlib] mark learning and examples tests exclusive. (#25445) 2022-06-04 09:35:24 -07:00
SangBin Cho
00e3fd75f3
[State Observability] Ray log alpha API (#24964)
This PR implements ray log on the server side. It continues from #24068.

The PR supports two endpoints:

/api/v0/logs # List the logs of the node id, filtered by the given glob.
/api/v0/logs/{[file | stream]}?filename&pid&actor_id&task_id&interval&lines # Stream the requested file log. The filename can be inferred from pid/actor_id/task_id.
Some tests need to be rewritten; I will do it soon.

As a follow-up after this PR, there will be 2 PRs:

- PR to add the actual CLI
- PR to remove in-memory cached logs and do on-demand queries for actor/worker logs
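
As an illustration of the file/stream endpoint listed above, here is a small sketch that only builds the request URL. The helper `build_log_url`, the host and port, and the example file name are assumptions; the path and query parameter names are taken from the message.

```python
from urllib.parse import urlencode


def build_log_url(base: str, kind: str, **params: object) -> str:
    """Build a URL for the file/stream log endpoint; None values are dropped."""
    if kind not in ("file", "stream"):
        raise ValueError("kind must be 'file' or 'stream'")
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base}/api/v0/logs/{kind}"
    return f"{url}?{query}" if query else url


# The host, port, and file name are placeholders for illustration only.
url = build_log_url("http://127.0.0.1:8265", "file", filename="raylet.out", lines=100)
```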
2022-06-04 05:10:23 -07:00
Sven Mika
a559efb7e4
[CI; LinkCheck] 3 RLlib fixes. (#25476) 2022-06-04 11:54:56 +02:00
Yi Cheng
47c4f6f094
[flakey] Fix test_modin.py (#25469)
test_modin.py is flaky right now. It complains that some modules can't be imported. This seems to be an init issue where client mode and non-client mode are mixed. This change closes the cluster for each run. It slows the test down a little, but makes it more stable.
2022-06-04 08:34:37 +00:00
Max Pumperla
c5f4a82e3c
Add doc code owners (#24910) 2022-06-03 23:59:32 -07:00
Sven Mika
b5bc2b93c3
[RLlib] Move all remaining algos into algorithms directory. (#25366) 2022-06-04 07:35:24 +02:00
SangBin Cho
54496d7705
[State Observability API] Support Filtering (#25281)
This PR adds filtering support. Filtering is done on the API server side (not on the source side). Source-side filtering is harder to implement elegantly, and we will handle it in the future (no optimization for alpha APIs).

We will also support limited types of columns for each API.

The API is as follows:

ray list [resources] -- filter [key] [value] => keep only data where key==value.
In the future, we can also support more complex filtering, such as !=, And, Or, etc.
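
The key==value filter described above could look roughly like this when applied on the server side. This is a sketch under assumptions: `filter_rows` and the sample rows are hypothetical, and comparing values as strings is a guess at how a CLI-supplied value would be matched.

```python
from typing import Any, Dict, List


def filter_rows(rows: List[Dict[str, Any]], key: str, value: str) -> List[Dict[str, Any]]:
    """Keep rows whose `key` column equals `value` (compared as strings)."""
    return [row for row in rows if key in row and str(row[key]) == value]


# Hypothetical rows, as the API server might hold them after collecting state.
actors = [
    {"actor_id": "a1", "state": "ALIVE"},
    {"actor_id": "a2", "state": "DEAD"},
]
alive = filter_rows(actors, "state", "ALIVE")
```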
2022-06-03 17:17:30 -07:00
Zhe Zhang
4cc202585a
[Docs] Document Ray downscaling behavior (#25466) 2022-06-03 17:08:21 -07:00
Eric Liang
1f509ab331
[air] Add DataParallelTrainer.dataset_config for configuring dataset ingest (#25337)
This adds a per-dataset config object to DataParallelTrainer. These configs define how the Dataset should be read into the DataParallelTrainer. It configures the preprocessing, splitting, and ingest strategy per-dataset. DataParallelTrainers declare default DatasetConfigs for each dataset passed in the ``datasets`` argument. Users have the opportunity to selectively override these configs by passing the ``dataset_config`` argument. Trainers can also define user customizable values (e.g., XGBoostTrainer doesn't support streaming ingest).

This PR adds the minimal support for dataset configs. Future PRs will:
- Add support for streaming ingest
- Move this config from DataParallelTrainer to ml.Trainer
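
The default-plus-override behavior described above might be sketched as follows. This is not the PR's implementation: the `DatasetConfig` fields shown and the `merge_configs` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Optional


@dataclass
class DatasetConfig:
    # Hypothetical fields; None means "fall back to the trainer default".
    fit: Optional[bool] = None
    split: Optional[bool] = None


def merge_configs(
    defaults: Dict[str, DatasetConfig], overrides: Dict[str, DatasetConfig]
) -> Dict[str, DatasetConfig]:
    """Apply only the fields the user set, per dataset name."""
    merged = dict(defaults)
    for name, override in overrides.items():
        base = merged.get(name, DatasetConfig())
        updates = {
            f.name: getattr(override, f.name)
            for f in fields(override)
            if getattr(override, f.name) is not None
        }
        merged[name] = replace(base, **updates)
    return merged


trainer_defaults = {
    "train": DatasetConfig(fit=True, split=True),
    "valid": DatasetConfig(fit=False, split=False),
}
user_overrides = {"valid": DatasetConfig(split=True)}
merged = merge_configs(trainer_defaults, user_overrides)
```

Here only the `split` setting of the `valid` dataset changes; everything else keeps the trainer's default.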
2022-06-03 16:32:53 -07:00
Eric Liang
22aaf47fda
[tune] Better error message for Tune nested tasks / actors (#25241)
This PR uses a task/actor launch hook to generate better error messages for nested Tune tasks/actors when no extra resources are reserved for them. The idea is that the Tune trial runner actor can set a hook prior to executing the user code. If the user code launches a task, and the placement group for the trial cannot possibly fit the task, then we raise TuneError right away to warn the user.
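
A simplified sketch of the launch-hook idea, assuming a hypothetical module-level hook and submission function. None of these names are the real Ray or Tune internals; `TuneError` here is a local stand-in for the error type named above.

```python
from typing import Callable, Dict, List, Optional

Resources = Dict[str, float]


class TuneError(Exception):
    """Local stand-in for the error type named in the message."""


_launch_hook: Optional[Callable[[Resources], None]] = None


def set_launch_hook(hook: Optional[Callable[[Resources], None]]) -> None:
    """Install (or clear) the hook consulted before every task launch."""
    global _launch_hook
    _launch_hook = hook


def launch_task(resources: Resources) -> str:
    """Hypothetical task submission path that consults the hook first."""
    if _launch_hook is not None:
        _launch_hook(resources)
    return "submitted"


def make_trial_hook(bundles: List[Resources]) -> Callable[[Resources], None]:
    """Reject requests that no bundle of the trial's placement group could hold."""

    def hook(resources: Resources) -> None:
        fits = any(
            all(bundle.get(name, 0) >= amount for name, amount in resources.items())
            for bundle in bundles
        )
        if not fits:
            raise TuneError(
                f"No reserved bundle can fit {resources}; reserve extra resources."
            )

    return hook


# The trial reserved a single 1-CPU bundle (hypothetical placement group).
set_launch_hook(make_trial_hook([{"CPU": 1}]))
status = launch_task({"CPU": 1})
```

The check uses total bundle capacity rather than current usage, which matches the "cannot possibly fit" wording: a request is rejected only if it could never be scheduled.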
2022-06-03 14:53:40 -07:00
Sihan Wang
03ed27b9c1
[Serve] Fix the test_serve_start_different_http_checkpoint_options_warning flaky (#25452) 2022-06-03 14:45:00 -07:00
Kai Fricke
4b9a89ad90
[air] Move python/ray/ml to python/ray/air (#25449)
The package "ml" should be renamed to "air".

Main question: Keep a `ml.py` with `from ray.air import *` for some level of backwards compatibility?
I'd go for no to force people to use the new structure.
2022-06-03 21:53:44 +01:00
Yi Cheng
6b38b071e9
Revert "Revert "[core] Remove gcs addr updater in core worker. (#24747)" (#25375)" (#25391)
This reverts commit 49efcab4fe.
2022-06-03 12:26:27 -07:00
matthewdeng
2e05b62236
[AIR] Preprocessors feature guide (#25302) 2022-06-03 11:43:51 -07:00
Kai Fricke
313e8730a2
[tune/docs] Trial executor doc fix (#25440) 2022-06-03 16:25:38 +01:00
Kai Fricke
7186cd8b79
[tune] Remove various deprecated code paths (deprecation cycle) (#25407)
This PR removes various deprecated code paths in Ray Tune that raised errors on usage before.
2022-06-03 15:01:40 +01:00
Sven Mika
6c7f781d8e
[RLlib] Unflake some CI-tests. (#25313) 2022-06-03 14:51:50 +02:00
Kai Fricke
2e058380d7
[tune] Remove TrialExecutor base class (#25404)
The TrialExecutor base class was a stub and was deprecated long ago; direct inheritance was disabled. This PR removes the base class and moves the remaining functionality into the RayTrialExecutor.
2022-06-03 10:16:47 +01:00