Commit graph

1274 commits

Author SHA1 Message Date
niole
488f63efe3
[Dashboard] Make requests sent by the dashboard reverse proxy compatible (#14012) 2021-02-24 18:31:59 -08:00
SangBin Cho
be68a78b3f
[Object Spilling] Support multiple directories for spilling. (#14240)
* Finish the initial implementation.

* Improve the doc.

* Addressed comment.

* lint.

* f
2021-02-23 11:51:57 -08:00
Kai Fricke
757866ec01
[tune] enable placement groups by default (#13906)
* Refactor placement group factory object to accept placement_group arguments instead of callables

* Convert resources to pgf

* Enable placement groups by default

* Fix tests WIP

* Fix stop/resume with placement groups

* Fix progress reporter test

* Fix trial executor tests

* Check resource for trial, not resource object

* Move ENV vars into class

* Fix tests

* Sphinx

* Wait for trial start in PBT

* Revert merge errors

* Support trial reuse with placement groups

* Better check for just staged trials

* Fix trial queuing

* Wait for pg after trial termination

* Clean up PGs before tune run

* No PG settings in pbt scheduler

* Fix buffering tests

* Skip test if ray reports erroneous available resources

* Disable PG for cluster resource counting test

* Debug output for tests

* Output in-use resources for placement groups

* Don't start new trial on trial start failure

* Add docs

* Cleanup PGs once futures returned

* Fix placement group shutdown

* Use updated_queue flag

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs

* Reuse placement groups independently from actors

* Do not remove placement groups for paused trials

* Only continue enqueueing trials if it didn't fail the first time

* Rename parameter

* Fix pause trial

* Code review + try_recover

* Update python/ray/tune/utils/placement_groups.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Move placement group lifecycle management

* Move total used resources to pg manager

* Update FAQ example

* Requeue trial if start was unsuccessful

* Do not cleanup pgs at start of run

* Revert "Do not cleanup pgs at start of run"

This reverts commit 933d9c4c

* Delayed PG removal

* Fix trial requeue test

* Trigger pg cleanup on status update

* Fix tests

* Fix docs

* fix-test

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-23 18:46:02 +01:00
javi-redondo
0408fe6a69
Small improvements to the Ray Cluster docs (#14241)
* Small improvements to the Ray Cluster docs

* Update quickstart.rst

Changed title for quick start

Co-authored-by: Javier Redondo <javier@Anyscale-MacBook-Pro.local>
2021-02-23 13:44:28 +02:00
Simon Mo
f6a8a9be59
[Serve] Add RLlib tutorial (#14194) 2021-02-22 13:23:12 -08:00
Ryan Sander
8b5310a4e6
Fixed "multit-threaded" --> "multi-threaded" (#14236) 2021-02-21 19:25:51 -08:00
Dmitri Gekhtman
090970bdf5
[autoscaler] Max worker default infinity (#14201)
* random doc typo

* max-worker-default-inf

* fix

* -1 means infinity

* doc

* comment tweak

* fix random typo

* Cluster max-worker default

* fix

* typo

* test

* Git add the test

* doc-tweak

* rest of the test logistics

* periods in doc

* Address comments

* docstring
2021-02-22 05:14:00 +02:00
chaokunyang
f8a36eb350
[Java] Add java api overload doc and test (#14204) 2021-02-19 19:46:35 +08:00
Antoni Baum
58d7398246
[Tune] Add HEBOSearch Searcher (#13863)
* HEBO first pass

* Fix bad quotes

* Fixes

* Reproducibility

* Update python/ray/tune/suggest/hebo.py

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Add hebo_example.py to BUILD

* Nit

* Update to pypi package

* Alphabetical HEBO requirement

* Fix syntax error

* Fix wrong space in hebo example

* Move validate_warmstart to utils

* Space assertion in HEBO

* Comment

* Apply suggestions from code review

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Formatting

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-02-17 22:53:10 +01:00
Sumanth Ratna
c1d68d7dd0
[docs] Remove sphinx-gallery example runtimes (#14141)
e7f65d9b21/doc/conf.py (L340)
2021-02-17 11:07:16 -08:00
Alex Wu
753083c617
[docs][autoscaler] Update AWS node config link (#14125) 2021-02-17 10:44:10 -08:00
SangBin Cho
4d7ab3c886
[Doc] Ray logging document. (#14102)
* Initial draft done.

* Addressed code review.
2021-02-16 15:27:30 -08:00
Edward Oakes
019d84a9f3
[serve] [docs] High-level reorganization of the docs (#14120) 2021-02-16 14:07:56 -06:00
architkulkarni
0fb96a61fc
[Serve] Add support for variable routes (#13968) 2021-02-15 11:42:42 -06:00
architkulkarni
bcb51a27c6
[Serve] [Doc] Add version warning (#14001) 2021-02-15 11:16:01 -06:00
javi-redondo
b8b2d6410d
[docs] new Ray Cluster documentation (#13839)
Co-authored-by: Javier Redondo <javier@anyscale.com>
Co-authored-by: AmeerHajAli <ameerh@berkeley.edu>
2021-02-15 00:47:14 -08:00
Dmitri Gekhtman
6644a0fe50
[autoscaler][kubernetes][docs] Updated Kubernetes Documentation (#14016)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-11 23:00:25 -08:00
Amog Kamsetty
24e020b062
[Doc] Add PTL and RAG to community integrations (#14064) 2021-02-11 15:48:19 -08:00
Jeroen Boeye
2af1f0616d
Fix broken link to Flow docs (#14058) 2021-02-11 13:20:34 -08:00
SangBin Cho
cb8523a5e6
Fix the wrong spark on ray link. (#14057) 2021-02-11 12:31:18 -08:00
Clark Zinzow
c5574a33e4
[dask-on-ray] Add better Dask-on-Ray example, and detail custom shuffle optimization. (#13950)
* Add better Dask-on-Ray example, and detail custom shuffle optimization.

* Misc. updates and feedback.

* Update doc/source/dask-on-ray.rst

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>

* Set max_branch to infinity in shuffle optimization example.

* Feedback

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* 80 col width

Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-10 14:24:09 -08:00
Thomas J. Fan
75fbd48edd
[doc] Minor fix to indentation (#14040) 2021-02-10 12:31:47 -08:00
Alex Wu
68e985ddcd
[hotfix][docs] RayDP tensorflow != pytorch (#14044) 2021-02-10 11:23:02 -08:00
Alex Wu
ce80ef5aee
[Docs] RayDP Documentation (#14018)
* .

* done?

* Docs

* Docs

* Update raydp.rst

* Update raydp.rst

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-02-09 23:05:18 -08:00
SangBin Cho
0e07b5fa89
[Doc] Update actor resource information (#13909)
* in progress.

* Revert "in progress."

This reverts commit 21a91a47522797210bdc5db9477bd0b02ed9d926.

* done.

* done.
2021-02-08 10:23:57 -08:00
Devin Petersohn
1412f3c546
[docs] page for using Modin with Ray (#13937)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-06 00:28:04 -08:00
DK.Pino
fb89f9c2c8
[Placement Group] Support named placement group (#13755) 2021-02-05 11:04:51 +08:00
Kathryn Zhou
982c606b86
Add more user-friendly error message upon async def remote task (#13915) 2021-02-04 18:33:33 -08:00
Edward Oakes
7af0c999f3
[serve] Built-in support for imported backends (#13867) 2021-02-04 15:09:12 -06:00
Richard Liaw
6c77aeb98a
[docs] ray slack remove banners (#13898)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-04 01:14:34 -08:00
Dmitri Gekhtman
1187d1dd3e
[autoscaler][kubernetes][operator] Rudimentary error handling, make "MODIFIED" -> update event work. (#13756) 2021-02-03 20:07:11 -06:00
Haoyuan Ge
875ea3fe1d
[docs] Update actors.rst (#13873)
Add "ray.get" when calling the actor method.
2021-02-03 09:51:53 -08:00
architkulkarni
c8e1f07c52
remove starlette install instruction (#13869) 2021-02-02 14:37:55 -08:00
architkulkarni
32fc649f39
[serve] Add example code for custom status code response (#13868) 2021-02-02 16:30:45 -06:00
Kai Fricke
d29fcfb45c
[tune] catch SIGINT signal and trigger experiment checkpoint (#13767)
* [tune] catch SIGINT signal and trigger experiment checkpoint

* Apply suggestions from code review

* Fix user guide docs

* Update doc/source/tune/user-guide.rst
2021-02-02 14:52:09 +01:00
QuantumMecha
0c93bb77cb
[RLlib] Update Documentation for Curiosity's support of continuous actions (#13784)
Only (Multi)Discrete action spaces are supported so far according to https://github.com/ray-project/ray/blob/master/rllib/utils/exploration/curiosity.py
2021-02-02 13:10:09 +01:00
Sven Mika
52c94b7ee9
[RLlib] Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522) 2021-02-02 13:05:58 +01:00
Eric Liang
26beb3b67b
Revert "Revert "Enable Ray client server by default (#13350)" (#13429)" (#13442)
* Revert "Revert "Enable Ray client server by default (#13350)" (#13429)"

This reverts commit 560299972c.

* fix job id collision with ray client server
2021-02-02 00:17:29 -08:00
Eric Liang
d71eeac2d6
remove lru evict docs (#13849) 2021-02-02 00:07:47 -08:00
SongGuyang
6e53a71978
bug fix for doc (#13834) 2021-02-01 21:13:43 +08:00
SongGuyang
361e5f0bef
support dynamic library loading in C++ worker (#13734) 2021-02-01 19:24:33 +08:00
Siyuan (Ryans) Zhuang
0b598c0f05
[Serialization] API for deregistering serializers; code & doc cleanup (#13471)
* make methods private, remove confusing brackets and usages

* unregister serializer; fix doc

* Cleanup doc

* rename unregister -> deregister
2021-01-29 10:27:05 -08:00
architkulkarni
cb771f263d
[Serve] Add ServeHandle metrics (#13640) 2021-01-28 14:40:47 -06:00
architkulkarni
28cf5f91e3
[docs] change MLFlow to MLflow in docs (#13739) 2021-01-27 16:53:15 -08:00
Eric Liang
eba698d48e
Remove docs for install-nightly (#13744) 2021-01-27 13:10:45 -08:00
DK.Pino
7f6d326ad8
[Placement Group]Add detached support for placement group. (#13582) 2021-01-27 18:51:26 +08:00
Edward Oakes
5d882b062d
[Serve] fix k8s doc (#13713) 2021-01-26 10:09:13 -08:00
Edward Oakes
1c77cc7e23
[docs] Remove API warning from mp.Pool (#13683) 2021-01-25 09:59:46 -08:00
Ameer Haj Ali
b7dd7ddb52
deprecate useless fields in the cluster yaml. (#13637)
* prepare for head node

* move command runner interface outside _private

* remove space

* Eric

* flake

* min_workers in multi node type

* fixing edge cases

* eric not idle

* fix target_workers to consider min_workers of node types

* idle timeout

* minor

* minor fix

* test

* lint

* eric v2

* eric 3

* min_workers constraint before bin packing

* Update resource_demand_scheduler.py

* Revert "Update resource_demand_scheduler.py"

This reverts commit 818a63a2c86d8437b3ef21c5035d701c1d1127b5.

* reducing diff

* make get_nodes_to_launch return a dict

* merge

* weird merge fix

* auto fill instance types for AWS

* Alex/Eric

* Update doc/source/cluster/autoscaling.rst

* merge autofill and input from user

* logger.exception

* make the yaml use the default autofill

* docs Eric

* remove test_autoscaler_yaml from windows tests

* lets try changing the test a bit

* return test

* lets see

* edward

* Limit max launch concurrency

* commenting frac TODO

* move to resource demand scheduler

* use STATUS UP TO DATE

* Eric

* make logger of gc freed refs debug instead of info

* add cluster name to docker mount prefix directory

* grrR

* fix tests

* moving docker directory to sdk

* move the import to prevent circular dependency

* small fix

* ian

* fix max launch concurrency bug to assume failing nodes as pending and consider only load_metric's connected nodes as running

* small fix

* deflake test_joblib

* lint

* placement groups bypass

* remove space

* Eric

* first commit

* lint

* example

* documentation

* hmm

* file path fix

* fix test

* some format issue in docs

* modified docs

* joblib strikes again on windows

* add ability to not start autoscaler/monitor

* a

* remove worker_default

* Remove default pod type from operator

* Remove worker_default_node_type from rewrite_legacy_yaml_to_availble_node_types

* deprecate useless fields

Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
Co-authored-by: root <root@ip-172-31-56-188.us-west-2.compute.internal>
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-01-23 12:06:51 -08:00
Ameer Haj Ali
1fbb752f42
[autoscaler] remove worker_default_node_type that is useless. (#13588) 2021-01-21 17:04:38 -08:00