Commit graph

7168 commits

Author SHA1 Message Date
Hao Chen
e1a5e5bad4
Fix test_actor_restart (#13901) 2021-02-05 14:08:43 -08:00
Simon Mo
4a3dd6858d
Buildkite determine-to-run support (#13866) 2021-02-05 12:58:07 -08:00
Amog Kamsetty
f44f368eae
[Tune] Add try-except to FailureInjectorCallback (#13939) 2021-02-05 11:02:42 -08:00
Eric Liang
f782ed59a0
Ray client version check strict eq (#13926) 2021-02-05 00:06:10 -08:00
fyrestone
eee624cf5f
Revert "Fix passing env on windows (#13253)" (#13828) 2021-02-05 13:03:16 +08:00
fangfengbin
8a5999c12a
[GCS]Fix bug that gcs client does not set last_resource_usage_ (#13856) 2021-02-05 11:51:25 +08:00
DK.Pino
fb89f9c2c8
[Placement Group] Support named placement group (#13755) 2021-02-05 11:04:51 +08:00
Dmitri Gekhtman
40bad86c7a
[hotfix][test][windows] Exclude k8s operator mock test from build. (#13924) 2021-02-04 18:35:10 -08:00
Kathryn Zhou
982c606b86
Add more user-friendly error message upon async def remote task (#13915) 2021-02-04 18:33:33 -08:00
architkulkarni
e89bbcbd44
[Serve] Revert "Revert "[Serve] Fix ServeHandle serialization"" and disable failing Windows test (#13771) 2021-02-04 14:50:01 -08:00
Edward Oakes
7af0c999f3
[serve] Built-in support for imported backends (#13867) 2021-02-04 15:09:12 -06:00
Dmitri Gekhtman
db59736b1a
[autoscaler][kubernetes] Add ability to not copy cluster config to head node when calling create_or_update_head_node. (#13720)
* Add option to skip bootstrapping head node autoscaling config

* don't close remote config before copying

* Type

* Type hints etc.

* test

* Test CR to config conversion

* comment
2021-02-04 10:30:03 -08:00
Kai Fricke
1e113d2e6e
[tune/xgboost] Update release test docs (#13880)
* Update release test docs

* Update
2021-02-04 13:10:56 +01:00
Richard Liaw
6c77aeb98a
[docs] ray slack remove banners (#13898)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-02-04 01:14:34 -08:00
Richard Liaw
0fc81e2393
[tune] fix gpu check (#13825)
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-02-04 01:13:58 -08:00
Eric Liang
e79a380a7e
Check in shuffle code as experimental (#13899) 2021-02-04 00:24:16 -08:00
Clark Zinzow
243f678ffd
Fall back to random port instead of default port for non-primary Redis shards; attempt to cluster Redis shard ports close to each other. (#13847) 2021-02-03 22:00:15 -08:00
Alex Wu
a13208f113
Scalability envelope readme typo (#13874) 2021-02-03 21:43:45 -08:00
Tao Wang
44aa9c173f
Rename timeout to period with heartbeat interval (#13872) 2021-02-04 10:37:28 +08:00
Tao Wang
e0d9c8f0a8
Always replace DEL with UNLINK (#13832) 2021-02-04 10:30:00 +08:00
Dmitri Gekhtman
1187d1dd3e
[autoscaler][kubernetes][operator] Rudimentary error handling, make "MODIFIED" -> update event work. (#13756) 2021-02-03 20:07:11 -06:00
Eric Liang
e8fce9f1f3
Check Ray client protocol version (#13886)
* wip

* wip

* fix tests
2021-02-03 16:44:09 -08:00
Clark Zinzow
407302f93a
[Core] Ownership-based Object Directory - Changed infinite short-poll location subscription to long-poll. (#13841) 2021-02-03 14:16:42 -08:00
SangBin Cho
cb9fa90203
[Object Spilling] Add consumed bytes to detect thrashing. (#13853) 2021-02-03 14:16:26 -08:00
Barak Michener
77ee2c569f
[ray_client] convert things registered for ray into ray_client (#13639) 2021-02-03 13:30:05 -08:00
Alex Wu
f14171ced9
[Core] Put raylet ip's in resource usage report (#13871)
* .

* done?

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-02-03 11:28:56 -08:00
Gabriele Oliaro
79310452e7
Enabling the cancellation of non-actor tasks in a worker's queue 2 (#13244)
* wrote code to enable cancellation of queued non-actor tasks

* minor changes

* bug fixes

* added comments

* rev1

* linting

* making ActorSchedulingQueue::CancelTaskIfFound raise a fatal error

* bug fix

* added two unit tests

* linting

* iterating through pending_normal_tasks starting from end

* fixup! iterating through pending_normal_tasks starting from end

* fixup! fixup! iterating through pending_normal_tasks starting from end

* post merge fixes

* added debugging instructions, pulled Accept() out of guarded loop

* removed debugging instructions, linting

* first commit

* lint

* lint

* added hack to avoid race condition in test stress

* moved hack

* fix test cancel

* removed hack (hopefully no longer needed)

* Revert "removed hack (hopefully no longer needed)"

This reverts commit 99d0e7c91539f290700f50aaaed805dcde04a5ee.

* added sleep in mock_worker.cc

* sleep function fixup to work on windows

* sleep in test_fast both for force=true and force=false

* linting

Co-authored-by: Ian <ian.rodney@gmail.com>
2021-02-03 10:20:12 -08:00
Haoyuan Ge
875ea3fe1d
[docs] Update actors.rst (#13873)
Add "ray.get" when calling the actor method.
2021-02-03 09:51:53 -08:00
Edward Oakes
a695c651ee
[serve] Small cleanups for BackendState (#13870) 2021-02-03 11:46:25 -06:00
Ameer Haj Ali
2a903b904a
[joblib] Log once the context warning argument. (#13865)
Co-authored-by: Ameer Haj Ali <ameerhajali@ameers-mbp.lan>
Co-authored-by: Alex Wu <alex@anyscale.io>
Co-authored-by: Alex Wu <itswu.alex@gmail.com>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Ameer Haj Ali <ameerhajali@Ameers-MacBook-Pro.local>
2021-02-03 00:23:20 -08:00
Eric Liang
d335ce2aab
Move the tune driver into a remote task (#13778) 2021-02-02 18:41:45 -08:00
fangfengbin
b4684cf37a
Fix bug that otal_commands_queued_ is not initialized (#13852) 2021-02-03 10:00:15 +08:00
architkulkarni
c8e1f07c52
remove starlette install instruction (#13869) 2021-02-02 14:37:55 -08:00
architkulkarni
32fc649f39
[serve] Add example code for custom status code response (#13868) 2021-02-02 16:30:45 -06:00
Edward Oakes
fc956e084a
[Hotfix] Lint (#13864) 2021-02-02 12:56:50 -08:00
James
863c1b8282
Add podman support (#13633) 2021-02-02 11:09:43 -08:00
Sven Mika
9ac731558b
[RLlib] Unify fcnet initializers for the value output layer (std=1.0 in torch, but 0.01 in tf). (#13733) 2021-02-02 18:42:49 +01:00
Sven Mika
0a0d9183fe
[RLlib] Trajectory view API example script (enhancements and tf2 support). (#13786) 2021-02-02 18:42:18 +01:00
Edward Oakes
a6138ca31f
[serve] Support batches for ImportedBackends (#13843) 2021-02-02 09:44:01 -06:00
Kai Fricke
d29fcfb45c
[tune] catch SIGINT signal and trigger experiment checkpoint (#13767)
* [tune] catch SIGINT signal and trigger experiment checkpoint

* Apply suggestions from code review

* Fix user guide docs

* Update doc/source/tune/user-guide.rst
2021-02-02 14:52:09 +01:00
Stanislav Chekmenev
b9c15a2551
[RLlib] Issue #13761: Fix get action shape (#13764) 2021-02-02 13:13:43 +01:00
Raoul Khouri
714c367b9d
[RLlib] Trainer._validate_config idempotency correction (issue 13427) (#13556) 2021-02-02 13:11:57 +01:00
QuantumMecha
0c93bb77cb
[RLlib] Update Documentation for Curiosity's support of continuous actions (#13784)
Only (Multi)Discrete action spaces are supported so far according to https://github.com/ray-project/ray/blob/master/rllib/utils/exploration/curiosity.py
2021-02-02 13:10:09 +01:00
Sven Mika
52c94b7ee9
[RLlib] Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (#13522) 2021-02-02 13:05:58 +01:00
Eric Liang
fa4290090d
Add Ray client protocol version (#13846) 2021-02-02 00:19:08 -08:00
Eric Liang
26beb3b67b
Revert "Revert "Enable Ray client server by default (#13350)" (#13429)" (#13442)
* Revert "Revert "Enable Ray client server by default (#13350)" (#13429)"

This reverts commit 560299972c.

* fix job id collision with ray client server
2021-02-02 00:17:29 -08:00
Eric Liang
88ab887cc4
Unconditionally retry all RPC errors on client connect (#13845)
* wip

* Update python/ray/util/client/worker.py

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>

Co-authored-by: fangfengbin <869218239a@zju.edu.cn>
2021-02-02 00:10:35 -08:00
Eric Liang
d71eeac2d6
remove lru evict docs (#13849) 2021-02-02 00:07:47 -08:00
SangBin Cho
886217c333
[Object Spilling] Skip normal ray.get path when spilling objects. (#13831) 2021-02-01 16:03:34 -08:00
Eric Liang
e4d30430c0
Fix naming of ray_spilled_objects directory 2021-02-01 15:46:40 -08:00