7624 commits

Author SHA1 Message Date
Ian Rodney
857408874c
[Autoscaler][Azure] check if 'update' is available in (#14787) 2021-03-19 08:39:46 -07:00
Chris Bamford
cd89f0dc55
[RLLib] Episode media logging support (#14767) 2021-03-19 09:17:09 +01:00
Amog Kamsetty
47300d5a53
[SGD] Worker Startup Fault Tolerance (#14724) 2021-03-18 22:53:56 -07:00
Eric Liang
c30d5f445c
Nonblocking release for ray client to deflake tests (#14782)
* fix

* update

* fix
2021-03-18 21:49:36 -07:00
Ian Rodney
00aceaae37
[Client] Test Serialization in a platform independent way. (#14786) 2021-03-18 18:24:44 -07:00
Alex Wu
62214f1b80
Delete WIP in scalability envelope (#14791) 2021-03-18 17:53:53 -07:00
Amog Kamsetty
7ee2e4185b
[Tune] PTL Fractional GPUs (#14781)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 17:07:51 -07:00
Lixin Wei
9227a83b59
Print ERROR log of actor creation task (#14764) 2021-03-18 16:56:55 -07:00
Richard Liaw
ebc71339fe
[client] fix multi-threading bugs (#14701) 2021-03-18 16:25:55 -07:00
Dmitri Gekhtman
da56a863f9
[Kubernetes][autoscaler] Deep copy in K8s Node Provider to fix scaling issues (#14773) 2021-03-18 18:17:57 -05:00
Ian Rodney
0495d6af15
[autoscaler] fix azure config issues (#14750) 2021-03-18 16:00:25 -07:00
Yi Cheng
881a46e1d6
[core] RuntimeEnv GC in local node (#14594) 2021-03-18 14:55:11 -07:00
Ian Rodney
eb12033612
[Code Cleanup] Switch to use ray.util.get_node_ip_address() (#14741)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 13:10:57 -07:00
Sven Mika
c3a15ecc0f
[RLlib] Issue #13802: Enhance metrics for multiagent->count_steps_by=agent_steps setting. (#14033) 2021-03-18 20:27:41 +01:00
Richard Liaw
1d033fb552
[client] Fix serialization of RayTaskError (#14698) 2021-03-18 12:26:33 -07:00
Richard Liaw
8201e4ea11
[client] fix refcounting for named actors (#14753)
* max-workers

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 12:20:29 -07:00
Eric Liang
ef249c98b1
[flaky test] Fix test_cli by disabling config cache for dashboard test (#14755) 2021-03-18 12:02:25 -07:00
SangBin Cho
351540e17e
[Test] Fix flaky object spilling test linux (#14757)
* Fix.

* done.
2021-03-18 11:37:09 -07:00
Edward Oakes
90f5ebac72
[serve] Add backend_state tests for updating backend config (#14772) 2021-03-18 12:58:39 -05:00
Edward Oakes
de598149d1
[serve] Add tests for backend_state versioning (#14748) 2021-03-18 11:08:45 -05:00
Ian Rodney
971855a353
[Serve] Disable Final Standalone Test on Windows (#14761) 2021-03-18 09:26:55 -05:00
Tao Wang
5305dbb639
[large scale]Always disable sync/subscribe context in sharding context (#14706) 2021-03-18 19:31:36 +08:00
Edward Oakes
91308b9b52
[serve] Refactor to add basic unit tests for BackendState (#14740) 2021-03-17 22:35:28 -05:00
Tao Wang
44a7ce3d35
[large scale]Disable async/subscribe context in global state accessor (#14705) 2021-03-18 11:07:33 +08:00
Tao Wang
ea7c9171e9
[large scale]Disable async context in raylets' gcs client (#14704) 2021-03-18 10:50:09 +08:00
Ian Rodney
50e95ad6dd
[Serve] Disable More test::standalone on windows (#14751) 2021-03-17 16:51:02 -07:00
Clark Zinzow
6a28cf4add
[Core] Event loop instrumentation concurrency fixes. (#14719)
* Moved global stats member to a shared pointer explicitly captured by-value by handler lambdas, fixed handler stats copy outside of lock, ported to generalized lambda capture.

* Reenabled event loop instrumentation by default.

* Remove explicit inline specifier from non-member functions, move into anonymous namespace.

* Revert "Reenabled event loop instrumentation by default."

This reverts commit 949215269f79a1ab5ddc1ce0285c3ff4477ee6e0.
2021-03-17 16:49:25 -07:00
Michael Schock
42dcacd888
[k8s] Minor doc fix (#14732) 2021-03-17 16:15:38 -07:00
Edward Oakes
34b5781ae0
[serve] Add basic support for a declarative deploy() API call (#14720) 2021-03-17 16:00:23 -05:00
Edward Oakes
f2013a0586
[serve] Skip test_standalone::test_connect on windows (#14747) 2021-03-17 13:50:34 -07:00
Lixin Wei
72d87093b9
[Core] Make Actor DEAD and Save Exceptions in GCS When Error Happens in Constructor (#14211) 2021-03-17 12:50:28 -07:00
Alex Wu
534846a1d2
[Autoscaler] Track failed nodes (#14608) 2021-03-17 12:49:31 -07:00
Ian Rodney
99861f5302
[JAR Build] Prevent MacOS Jar Builds from Timing Out (#14738) 2021-03-17 12:05:37 -07:00
Siyuan (Ryans) Zhuang
6d346e74a6
cleanup python code (#14691)
* cleanup python code
2021-03-17 10:45:05 -07:00
Clark Zinzow
a86277a93c
[dask-on-ray] Fix Dask-on-Ray examples in docs (#14461) 2021-03-17 10:37:32 -07:00
Ian Rodney
10250d737f
[Autoscaler] Add tests around docker run options (#14713) 2021-03-17 10:13:51 -07:00
Edward Oakes
c781197755
[serve] Temporarily disable ray client test (#14733) 2021-03-17 08:48:05 -07:00
Edward Oakes
aab7ccc466
[serve] Deprecate client-based API in favor of process-wide singleton (#14696) 2021-03-17 09:39:54 -05:00
Sven Mika
69202c6a7d
[RLlib] Obsolete usage tracking dict via sample batch. (#13065) 2021-03-17 08:18:15 +01:00
Akash Patel
6e326cc239
upgrade setproctitle dep (#14538) 2021-03-16 21:58:36 -07:00
Ian Rodney
8a936ad64d
[Autoscaler Docs] Use worker_run_options (#14721)
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
2021-03-16 18:04:27 -07:00
Siyuan (Ryans) Zhuang
f30ac73640
update cloudpickle to commit 6e0f571 (#14693) 2021-03-16 12:36:43 -07:00
Ian Rodney
bd641a5e71
Revert "[Core] Added event loop metrics for posts. (#14546)" (#14692) 2021-03-16 10:38:45 -07:00
Edward Oakes
5a45e3351f
add Serve service by default (#14711) 2021-03-16 10:34:30 -07:00
Eric Liang
b240f5f0c9
Incremental refactor of runtime_env for consistency (#14632) 2021-03-16 10:11:50 -07:00
Sven Mika
78a134efa2
[RLlib] Add HowTo set env seed to our custom env example script. (#14471) 2021-03-16 08:12:27 +01:00
Tao Wang
897b84b300
[large scale]Add option for disable/enable context connection and disable asynchro… (#14596) 2021-03-16 15:09:13 +08:00
Edward Oakes
ae2c20c1ac
[serve] Include required and available resources in slow startup message (#14695) 2021-03-15 21:32:07 -05:00
Kathryn Zhou
01dda99b8c
Export cluster statistics to Prometheus (#14612) 2021-03-15 19:28:13 -07:00
Ian Rodney
d251bb676d
[Autoscaler] Get_Head_Node should return an up-to-date node (#14579) 2021-03-15 17:48:18 -07:00