Commit graph

7973 commits

Author SHA1 Message Date
Simon Mo
5f0be94989
[Buildkite] Use the build link for Travis Tracker (#15317) 2021-04-14 18:58:23 -07:00
SangBin Cho
d0e83c43ca
[Release Test] Modify parameter to reduce stress (#15048)
* Fix.

* Fix.
2021-04-14 18:27:20 -07:00
SangBin Cho
e0bbfaf87e
[Log] Fix log monitor issue. (#15302) 2021-04-14 18:11:24 -07:00
Yi Cheng
0caf96be94
Take care of failed killing request (#15313) 2021-04-14 18:07:10 -07:00
Charles
82e730078f
[autoscaler] Converting assert False into useful exceptions. (#15306) 2021-04-14 16:16:37 -07:00
Simon Mo
c4b1985a5b
[Serialization] Pydantic -> serialization_addons.py and Ray Client support. (#15181) 2021-04-14 15:21:13 -07:00
Simon Mo
5289690d1c
[Buildkite] Fix Bazel Logs Upload (#15285) 2021-04-14 12:47:31 -07:00
SangBin Cho
775deca5ad
Revert "[runtime_env] Add support of exclusion (#15241)" (#15303)
This reverts commit 359b5ce06b.
2021-04-14 11:58:53 -07:00
Richard Liaw
59bf3a7b22
ray[cluster] -> ray[default] (#15251) 2021-04-14 09:37:04 -07:00
Antoni Baum
b93bd9bef4
[tune] Set correct Optuna TrialState on trial complete (#15283) 2021-04-14 15:59:23 +01:00
Sven Mika
bbfa8ffec9
[RLlib] Minor release 1.3 warnings cleanups. (#15272) 2021-04-14 14:03:15 +02:00
Sven Mika
ef0f163d16
[RLlib] Discussion 1709: IMPALA (tf and torch) reports sum of entropy (over batch) in stats. Should report mean instead. (#15290) 2021-04-14 11:44:25 +02:00
Kai Fricke
aaa14d63a7
[tune] deflake test_convergence, add seed parameter to OptunaSearch (#15248)
* De-flake optuna convergence test

* Even higher threshold

* Add `seed` parameter to OptunaSearch
2021-04-14 01:06:49 -07:00
wanxing
0ad0839265
Optimize lambda copy to improve direct call performance. (#15036) 2021-04-14 11:02:49 +08:00
Edward Oakes
4ed7a14e23
[serve] Support normal args and kwargs for deployments (#15172) 2021-04-13 16:20:50 -05:00
Richard Liaw
f4b2dd94b2
[tune] Cache MNIST and restore MNIST tests (#15260) 2021-04-13 14:20:26 -07:00
Simon Mo
7c734c207a
[Buildkite] Upload Bazel Logs to Bucket (#15259) 2021-04-13 14:16:42 -07:00
Yi Cheng
359b5ce06b
[runtime_env] Add support of exclusion (#15241) 2021-04-13 15:55:12 -05:00
Ian Rodney
d145ad94e4
[Client] Add metadata to Terminate Calls to make ray.kill() and ray.cancel() work (#15221) 2021-04-13 23:24:54 +03:00
Ian Rodney
ec3d5f2ef1
[client] Handle ray.put failures (#15229) 2021-04-13 11:23:16 -07:00
Edward Oakes
0f9d1bb223
Serve failure release test fix (#15276)
This test is currently not tested in CI
2021-04-13 17:49:29 +01:00
Sven Mika
5254d2fb36
[RLlib] Support parallelizing evaluation and training (optional). (#15040) 2021-04-13 09:53:35 +02:00
Clark Zinzow
05d99c9432
[dask-on-ray] Don't leak a global enabling of client mode in Dask callback test. (#15257)
* Don't leak a global enabling of client mode in Dask callback test.

* Enable and disable client_mode_enabled, not _client_hook_enabled.
2021-04-12 22:00:30 -07:00
SangBin Cho
9197552802
Temporarily disable flaky tests. (#15250) 2021-04-12 14:09:07 -07:00
Simon Mo
8bf4b37877
[Hotfix] Pin dm-tree package version (#15249)
`dm-tree` released a new version https://pypi.org/project/dm-tree/#history
and it depends on `bazel` to build from source, and it conflicts with
our current bazel setup (this conflict is non-trivial to fix).
2021-04-12 13:50:49 -07:00
Clark Zinzow
95659987a4
[Core] Event loop instrumentation - manual instrumentation hooks, instrumentation for deadline timer and local stream socket. (#15144)
* Added manual hooks in event loop instrumentation.

* Added instrumentation of the deadline timer in the periodical runner.

* Added instrumentation of the local stream socket in the ClientConnection.

* Addressed feedback except for opaque handle.

* Switch to opaque stats handle API.

* Add opaque stats handle destructor check to ensure that RecordExecution is called.

* Revert "Add opaque stats handle destructor check to ensure that RecordExecution is called."

This reverts commit 62cf8fca670d78c1160f0a9526b6cbe6e3a25725.

* Apply suggestions from code review

Co-authored-by: SangBin Cho <rkooo567@gmail.com>

* Other feedback, fixes for code suggestions.

* Prevent handler stats from leaking queueing stats when handler execution is never recorded.

* Enable event loop instrumentation.

* Revert "Enable event loop instrumentation."

This reverts commit df90c504e45e1963dc2ef6c3197dc5c965bc19e7.

* Reorg GCS client and IO context member fields to prevent use-after-free.

Co-authored-by: SangBin Cho <rkooo567@gmail.com>
2021-04-12 13:39:00 -07:00
Ameer Haj Ali
580b2bb9cc
[metrics.Histogram] improve error message (#15240) 2021-04-12 21:28:21 +03:00
Tao Wang
4c9eee609c
Revert "Revert "[GCS]Increase heartbeat interval to reduce pressure o… (#15207)
* Revert "Revert "[GCS]Increase heartbeat interval to reduce pressure on gcs server (#14203)" (#15194)"

This reverts commit a9ac4ad890.

* optimize wait condition to avoid flakey test

* remove unnecessary sleep
2021-04-12 10:45:42 -07:00
Richard Liaw
56c95075d1
Revert "[tune] enable mnist test v3 (#15198)" (#15242)
This reverts commit d913f32126.
2021-04-12 09:27:55 -07:00
Hao Chen
10ff2f3b4a
Fix duplicate destruction of CoreWorkerProcess instance (#15245) 2021-04-12 21:01:21 +08:00
qicosmos
e54dfd8cc5
[C++ worker] Ray actor task for RAY_REMOTE (#15039) 2021-04-12 15:40:35 +08:00
Sven Mika
9c5a0cfd7a
[RLlib] Issue 14385: Policy.compute_actions_from_input_dict does not properly track accessed fields for Policy's view requirements. (#14386) 2021-04-11 18:20:04 +02:00
Sven Mika
dfc116ea27
[RLlib] Discussion 681: Metrics prepends newest episodes instead of appending. (#15236) 2021-04-11 15:31:43 +02:00
Sven Mika
1c9701e9cb
[RLlib] Discussion 1513: on_episode_step() callback called after very first reset (should not). (#15218) 2021-04-11 13:16:17 +02:00
Sven Mika
b267f1f1ba
[RLlib] Add support for Int-Box action spaces. (#15012) 2021-04-11 13:16:01 +02:00
Clark Zinzow
1b62e9f844
[dask-on-ray][client] Support ClientObjectRefs in the Dask-on-Ray scheduler. (#15237)
* Support ClientObjectRefs in the Dask-on-Ray scheduler.
2021-04-11 10:44:38 +03:00
Qing Wang
0f444b1a59
Fix unexpected error when handling the process that has exited in memory monitor. (#14932)
Co-authored-by: Qing Wang <jovany.wq@antgroup.com>
2021-04-11 00:57:10 +08:00
Richard Liaw
0136ae10f8
[tune] run new test (#15119)
* add-runtest

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* ok

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-04-10 01:12:51 -07:00
chenk008
6709560ef6
fix setproctitle break /proc/PID/environ (#15056)
* fix setproctitle break /proc/PID/environ

* bugfix

* add ut

* fix ut

* fix ut

* fix ut

* improve comment

* improve comment

* fix ut lint

* fix ut lint

* revert init.py

Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-04-09 15:45:19 -07:00
Sam O
be62444bc5
[Log monitor] Resolves the stacktrace (#15199) 2021-04-09 11:32:04 -07:00
Jayce Li
b2f9c48647
doc: Fix ray serve in Deploying on Kubernetes (#15208) 2021-04-09 09:08:05 -07:00
Siyuan (Ryans) Zhuang
af9e38fd1c
Cloudpickle workaround for false positive cases (#15202)
* Cloudpickle workaround for false positive cases in '_is_parametrized_type_hint'.

* update comments
2021-04-09 02:22:46 -07:00
Richard Liaw
d913f32126
[tune] enable mnist test v3 (#15198) 2021-04-09 00:10:12 -07:00
Dmitri Gekhtman
58fbb419ea
[client][rllib] Add client_mode_hook for ray.get_gpu_ids (#15185) 2021-04-08 23:36:11 -07:00
Eric Liang
268409b6ad
update warning (#15200) 2021-04-08 17:56:52 -07:00
Stephanie Wang
94e592004e
Prioritize worker requests for objects over queued task arguments (#15157) 2021-04-08 14:51:21 -07:00
Dmitri Gekhtman
4289fa8d43
[kubernetes][autoscaler][test] Kubernetes scale tests (#15133) 2021-04-08 11:42:53 -07:00
Eric Liang
982558a4d3
Update ray client protocol version (#15184) 2021-04-08 11:38:48 -07:00
SangBin Cho
a88d20729a
[Test] Skip TestConvergenceOptuna temporarily (#15197)
* Skip flaky tune test temporarily.

* Lint.
2021-04-08 11:36:10 -07:00
Edward Oakes
06f0c0b6a2
[serve] Remove test_api.py::test_shard_key (#15195) 2021-04-08 10:50:17 -07:00