13956 commits

Author SHA1 Message Date
Avnish Narayan
55209692ee
[RLlib] Deflake MARWIL and BC and remove memory leak from torch MARWIL policy (#27406) 2022-08-03 16:53:12 -07:00
Eric Liang
67a306f92f
[docs] Update colors and styling of ray diagrams (#27474) 2022-08-03 16:49:25 -07:00
Eric Liang
340f0960d6
[docs] Improve the AIR introductory page (#27347) 2022-08-03 16:04:04 -07:00
Ricky Xu
8498a56fe2
[Core][fix] Increasing timeout on non-windows for test_metrics (#27379)
The test was timing out.

A normal pass took ~17 seconds.
2022-08-03 15:22:00 -07:00
Alan Guo
2cf9ecf48e
Make it so pydantic is required before we launch dashboard api server (#27345)
* Make it so pydantic is required before we launch dashboard api server

Signed-off-by: Alan Guo <aguo@anyscale.com>
2022-08-03 14:24:51 -07:00
Balaji Veeramani
fd381927c1
[AIR] Add optional mode parameter and make size parameter optional (#27295)
1. If a user reads a folder with grayscale and color images, ImageFolderDatasource errors.
2. There's no way to retain image shapes.

Co-authored-by: Clark Zinzow <clarkzinzow@gmail.com>
2022-08-03 13:20:46 -07:00
Eric Liang
c7056309c4
[docs] Revamp README and Ray intro doc page (#27405)
This PR revamps and aligns the README and Ray intro doc page:

- New "What is Ray" diagram that introduces AIR vs Ray core (diagram still to be finalized; this is the working placeholder)
- Update the description of Ray
- Link out to the user guides for key libraries and key concepts
- Remove old / broken links, as well as the inline library descriptions from the README
2022-08-03 13:19:00 -07:00
Archit Kulkarni
9f0d8e364d
[Doc] Update Serve architecture doc for 2.0 (#26861)
- Move autoscaling architecture from autoscaling page to architecture page
- Update architecture page
    - Remove "Router" actor
    - Update description of ServeHandle
    - Update defaults about HTTPproxy (default one on each node -> default just one per cluster, on the head node)
- Add note about fault tolerance in different failure scenarios
- Assorted typos/usage nits

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2022-08-03 14:30:33 -05:00
clarng
a2eaa7a5a4
[docs][Core] rename 'more topics' to 'advanced topics' (#27385)
Ray 2.0 doc update: rename 'more topics' to 'advanced topics'. Also cleaned up misc topics to have consistent names.
2022-08-03 12:14:43 -07:00
Cade Daniel
99ad0667a5
[docs][Ray Clusters] Migrate Community Supported Cluster Launcher to new structure. (#27376)
This PR migrates the old Community Supported Cluster Launcher docs to the new Ray Clusters doc structure.

Signed-off-by: Cade Daniel <cade@anyscale.com>
2022-08-03 11:07:10 -07:00
zcin
286343601a
[Serve] Enable lightweight config update (#27000) 2022-08-03 11:49:41 -05:00
xwjiang2010
ff2b728e9a
[air] add tuner user guide (#26837)
Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2022-08-03 09:43:42 -07:00
Archit Kulkarni
a12c04a2fe
[Serve] [Doc] Update key concepts for 2.0, remove deprecated APIs (#26965)
Removes deprecated APIs:
- serve.start()
- get_handle()

Rewrites the ServeHandle doc snippet to use the recommended workflow for ServeHandles (only access them from other deployments, pass Deployments in as input args to `.bind()`, which get resolved to ServeHandles at runtime)

Co-authored-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
2022-08-03 11:27:23 -05:00
Jimmy Yao
1c1cca2736
[release/ray-lightning] adjust the release test of ray lightning master
First of all, sorry, I messed up the previous PR when syncing with master (#27374). This PR is a duplicate of the previous PR until we update the changes (change: adding the version check for ray_lightning for compatibility). Also, apologies for the massive review requests on the previous PR.
2022-08-03 16:01:32 +01:00
Kai Fricke
20119c7022
[tune] Fix test_actor_reuse.py::ActorReuseMultiTest test (#27427)
Increase time to allow for scheduling latency

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-03 13:54:11 +01:00
Kai Fricke
46ed3557ba
[tune] Fix test_resource_exhausted_info test (#27426)
#27213 broke this test

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-03 13:53:46 +01:00
Simon Mo
4e07019b88
[Serve] Fix Graph Repeated Invocation (#27417) 2022-08-03 01:40:19 -07:00
shrekris-anyscale
adc7c4dc87
[Serve] Make serve.run() and deployment.bind() beta APIs (#27401) 2022-08-02 23:11:23 -07:00
Simon Mo
8ac6d02502
[Serve][Nightly] Environment for Nightly K8s Tests (#27126) 2022-08-02 23:05:47 -07:00
Jiajun Yao
8b7e4ac701
[Doc] Test ray core doc code (#27334)
- Currently not all code under ray-core/doc_code is covered by CI.
- tf_example.py and torch_example.py are not used anywhere.

Signed-off-by: Jiajun Yao <jeromeyjj@gmail.com>
2022-08-02 20:51:47 -07:00
Simon Mo
6084eb6a9f
Revert "Revert "[Serve] ServeHandle detects ActorError and drop replicas from target group (#26685)" (#27283)" (#27348) 2022-08-02 20:04:03 -07:00
Rohan Potdar
5b6a58ed28
[RLlib] Add OPE Learning Tests (#27154) 2022-08-02 17:51:38 -07:00
Richard Liaw
6dc3dbdd37
[air] Update to beta (#27393)
Update API references to beta. Needed as we are going to beta in 2.0.

I left out RL/Scikit-Learn/HuggingFace.
2022-08-02 17:10:41 -07:00
Dmitri Gekhtman
4d87e8112a
[docs][kubernetes] GPU user guide (#27360)
Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

This PR:

- adds a page of guidance on GPU deployment with Ray/K8s. This page is a modified and slightly expanded version of the existing page https://docs.ray.io/en/latest/cluster/kubernetes-gpu.html
- moves managed K8s service intro links to their own page
2022-08-02 15:58:23 -07:00
Eric Liang
91a03026ef
[air] Fix BatchPredictor.predict_pipelined not working with GPU stage (#27232) 2022-08-02 15:36:40 -07:00
Alan Guo
c083ca5871
Add GPU info to new dashboard (#27074)
Support a GPU column for the new dashboard

Have first node be default expanded

Signed-off-by: Alan Guo <aguo@anyscale.com>

fixes #13889

Addresses comment from #26996
2022-08-02 15:32:55 -07:00
Clark Zinzow
291a294208
[AIR - Serve] [Hotfix] Check for tensor extension via dtype rather than a NumPy conversion (#26891)
Converting a Pandas DataFrame column to an ndarray (e.g. via df[col].values) can often result in a full copy of the column in order to construct the ndarray due to Pandas' 2D block management. This PR ports tensor extension type checking to checking the dtype, which is always an O(1) check.

Signed-off-by: Clark Zinzow <clarkzinzow@gmail.com>
2022-08-02 14:52:46 -07:00
Avnish Narayan
00f9438101
[RLlib] Training step docs. (#27344) 2022-08-02 23:41:45 +02:00
Ricky Xu
122eda2757
[Core] Move test_state_api test back to large test groups (#27377)
Why are these changes needed?
python/tests/test_state_api.py runs for 5 min in a normal run.
2022-08-02 14:21:34 -07:00
Eric Liang
6384734071
[docs] Adjust the set of global doc owners to those responsible for copy-editing
Signed-off-by: Eric Liang <ekhliang@gmail.com>
2022-08-02 14:09:21 -07:00
Archit Kulkarni
e02b072939
[Doc] [Serve] Edit grammar/usage/organization for HTTP adapters page (#26969)
Moves FastAPI into its own section instead of appearing in a duplicated note.

Co-authored-by: simon-mo <simon.mo@hey.com>
2022-08-02 15:08:05 -05:00
Simon Mo
a9d94f740c
[Serve] Remove the warning for async handles in 2.0 (#27346)
Signed-off-by: simon-mo <simon.mo@hey.com>
2022-08-02 15:07:41 -05:00
Richard Liaw
c8561071f3
[air/train/docs] gbdt trainer user guide (#27362)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2022-08-02 13:02:42 -07:00
clarng
84674fa868
[docs] ray core namespace docs: edit pass & move python code into doc_code dir (#27341) 2022-08-02 12:52:30 -07:00
clarng
34385b8136
[docs] ray core cross-lang docs: edit pass & move python code into doc_code dir (#27350)
Edit pass. Move code into doc_code dir. Code in doc_code is verified by CI
2022-08-02 12:50:05 -07:00
Jiajun Yao
cd2e590567
Support placement_group=None in PlacementGroupSchedulingStrategy (#27370)
We decided to allow escaping the parent pg via `PlacementGroupSchedulingStrategy(placement_group=None)` instead of using "DEFAULT". Our doc is updated with that but in the code it's still not allowed.
2022-08-02 12:49:41 -07:00
Eric Liang
a1cb735035
Raise the (runtime_env max size) gRPC max message size to 500MiB
Signed-off-by: Eric Liang <ekhliang@gmail.com>
2022-08-02 12:41:34 -07:00
Jun Gong
61add8ede6
[RLlib] Fix the last cartpole-crashing premerge test. (#27315) 2022-08-02 20:08:33 +02:00
Nikita Vemuri
9a0b9918e5
[dashboard] Add last_activity_at field to /api/component_activities (#27284)
Add optional last_activity_at field to /api/component_activities to record end time of most recently finished activity

Signed-off-by: Nikita Vemuri <nikitavemuri@gmail.com>
2022-08-02 11:02:15 -07:00
kourosh hakhamaneshi
bda5026428
[RLlib] Fix A2C release tests (#27314) 2022-08-02 10:44:52 -07:00
kourosh hakhamaneshi
8d848890f1
[RLlib] Fix default view_requirement in policy.py (#27255) 2022-08-02 10:44:07 -07:00
Ricky Xu
82a24f9319
[Doc][Core][State Observability] Adding Python SDK doc and docstring (#26997)
1. Add doc for python SDK and docstrings on public SDK
2. Rename list -> ray_list and get -> ray_get for better naming 
3. Fix some typos 
4. Auto translate address to api server url.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2022-08-02 11:24:59 -05:00
Kai Fricke
d527c7b335
[air/benchmarks] Drop OMP_NUM_THREADS in vanilla torch/tf training (#27256)
Ray automatically sets OMP_NUM_THREADS=1, potentially limiting multithreading in native pytorch/tensorflow. If this leads to performance differences, we should address this either in Ray Train or in Ray core.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-02 13:38:01 +01:00
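
The idea in the commit above (removing the OMP_NUM_THREADS cap before running plain PyTorch/TensorFlow training) can be sketched as follows. This is only a minimal illustration, not the code from the PR: the helper names `env_without_omp_limit` and `run_vanilla_training` are hypothetical, and the sketch assumes the training job is launched as a child process.

```python
import os
import subprocess


def env_without_omp_limit(base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS removed.

    Hypothetical helper for illustration; it is not part of Ray.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Dropping the variable lets the native library pick its own thread count.
    env.pop("OMP_NUM_THREADS", None)
    return env


def run_vanilla_training(cmd, base_env=None):
    """Run a training command (a list of arguments) without the thread cap."""
    return subprocess.run(cmd, env=env_without_omp_limit(base_env), check=True)
```

The original environment is copied rather than mutated, so the parent process keeps whatever value was set for it.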
xwjiang2010
36cf1baa82
[air doc] checkpoint_freq --> checkpoint_frequency (#27325)
Signed-off-by: xwjiang2010 <xwjiang2010@gmail.com>
2022-08-02 11:34:10 +01:00
Kai Fricke
149c031c4b
[tune/release] Do not use spot instances in k8s tests (#27250)
Spot instances are not being booted up, so let's go without them.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-08-02 11:30:41 +01:00
Yi Cheng
a9697722cf
[workflow] Change step to task in workflow. (#27330)
* change step to task

Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>

* fix comments

Signed-off-by: Yi Cheng <74173148+iycheng@users.noreply.github.com>
2022-08-01 22:27:41 -07:00
Jules S. Damji
4045ba4841
[DOC Ray AIR] minor editorial tweaks for clarity and usage (#27128)
Co-authored-by: Jules Damji <jules@anyscale.com>
2022-08-01 21:09:04 -07:00
Dmitri Gekhtman
6efca71c35
[docs][kubernetes] XGBoost ML example (#27313)
Adds a guide on running an XGBoost-Ray workload using KubeRay.

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2022-08-01 19:30:41 -07:00
Yi Cheng
00d22b6c7c
[core] Fix the test_failure_3.py in win (#27332)
Win tests were broken because when the child is killed, the parent is also killed. Change the signal sent and make it work.
2022-08-01 18:55:07 -07:00
shrekris-anyscale
324d8e4bca
[Serve] Serialize user_config with JSON instead of Pickle (#26235) 2022-08-01 17:53:43 -07:00
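
What serializing a config with JSON rather than Pickle implies can be sketched as below. This is a minimal illustration using only the standard library, not Ray Serve's implementation: the function names `serialize_user_config` and `deserialize_user_config` and the sample config are hypothetical.

```python
import json


def serialize_user_config(user_config):
    """Encode a config as JSON bytes.

    Raises TypeError if the config holds values JSON cannot represent.
    Hypothetical helper for illustration only.
    """
    return json.dumps(user_config, sort_keys=True).encode("utf-8")


def deserialize_user_config(payload):
    """Decode JSON bytes back into plain Python data."""
    return json.loads(payload.decode("utf-8"))


# A config made of plain dicts, lists, strings, and numbers round-trips.
config = {"threshold": 0.5, "labels": ["cat", "dog"]}
payload = serialize_user_config(config)
restored = deserialize_user_config(payload)
```

One consequence of this choice is that the config must consist of JSON-representable values; arbitrary Python objects that Pickle would accept are rejected at serialization time.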