9580 commits

Author SHA1 Message Date
Sven Mika
a96dbd885b
[RLlib] Reinstate trajectory view API tests. (#18809) 2021-09-23 08:31:51 +02:00
Guyang Song
237a2ade76
[wheel][cpp] recover cpp extra (#18597) 2021-09-23 12:10:03 +08:00
Amog Kamsetty
d354161528
[SGD] Link ray.sgd namespace to ray.util.sgd.v2 (#18732)
* wip

* add symlink

* update

* remove from init

* no require tune

* try fix

* change

* * import

* fix docs

* address comment
2021-09-22 18:49:41 -07:00
mwtian
e41109a5e7
[Client] Use async rpc for remote call and actor creation (#18298)
* Use async rpc for remote calls, task and actor creations.

* fix

* check placement

* check placement group. wait for id in destructor

* fix

* fix exception in destructor

* Add test

* revert change

* Fix comment

* fix
2021-09-22 18:30:50 -07:00
Yi Cheng
8dd3057644
Revert "[test] add unit test for PR #17634 (#18585)" (#18830)
This reverts commit 73c3cff18b.
2021-09-22 16:51:02 -07:00
Amog Kamsetty
00dd190df9
[SGD] Retry sgd.local_rank() (#18824)
* finish

* fix

* wip

* address comment

* update

* fix test

* fix failing test

* address comments

* fix test

* fix
2021-09-22 15:48:38 -07:00
Yi Cheng
73c3cff18b
[test] add unit test for PR #17634 (#18585) 2021-09-22 14:39:30 -07:00
gjoliver
e6511bcf56
Revert "Upgrade default bazel installation to ver 4.2.1 (#18714)" (#18825) 2021-09-22 13:54:48 -07:00
Amog Kamsetty
d9b166252b
Revert "[SGD] sgd.local_rank" (#18822) 2021-09-22 13:50:00 -07:00
Amog Kamsetty
42c925ca0a
[Docs] Fix ray[default] Wheel install instruction (#18819) 2021-09-22 12:53:08 -07:00
Sven Mika
93208bb087
[RLlib] Increase size of (very flakey) action_masking example script test. (#18816) 2021-09-22 21:48:01 +02:00
Clark Zinzow
a3f40236d0
[Repo Config] Allow blank issues. (#18800) 2021-09-22 11:39:59 -07:00
Chen Shen
9b1cd5d1ad
Disable spill test on macOS (#18801) 2021-09-22 09:57:53 -07:00
Amog Kamsetty
39bcbe03bc
[SGD] sgd.local_rank (#18686)
* finish

* fix

* wip

* address comment

* update

* fix test

* fix failing test

* address comments

* fix test
2021-09-22 08:10:49 -07:00
Kai Fricke
bbb207c36e
[sgd/v1] Add API annotations (#18790)
* [sgd/v1] Add API annotations

* Remove unnecessary annotations
2021-09-22 08:10:28 -07:00
Sven Mika
5611150b1a
Increase rllib stress tests timeout for smoke test (#18810) 2021-09-22 14:30:42 +01:00
Qing Wang
3ad1553b34
[Java] Remove API setJvmOptions(String). (#18664) 2021-09-22 20:00:49 +08:00
Kai Fricke
2cbf326410
[ci/release] store buildkite artifacts on buildkite (#18712) 2021-09-22 11:35:59 +01:00
Kai Fricke
f86fc277d6
[tune/rllib] Only disable ipython in remote actors (#18789) 2021-09-22 11:05:06 +01:00
gjoliver
eb3620898c
Upgrade default bazel installation to ver 4.2.1 (#18714) 2021-09-22 00:24:41 -07:00
Eric Liang
cf0bd00cc2
Improve the error message for failed task/actor imports on workers (#18792) 2021-09-21 19:49:59 -07:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669) 2021-09-21 22:00:14 +02:00
Antoni Baum
3106fc5365
[tune] Depreciate max_concurrent in TuneBOHB (#18770)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-09-21 19:17:19 +01:00
architkulkarni
aa6625e62a
[Serve] gate __del__ call behind hasattr check (#18773) 2021-09-21 10:48:40 -07:00
Antoni Baum
f4666f3a6d
[tune] Add on_trial_result to ConcurrencyLimiter (#18766) 2021-09-21 15:30:02 +01:00
Antoni Baum
ca3fabc4cb
[tune] Ensure arguments passed to tune remote_run match (#18733) 2021-09-21 15:29:29 +01:00
Yi Cheng
fc6a739e4b
[nightly] Deflaky nightly test many_nodes_actor_test (#18582) 2021-09-20 22:43:48 -07:00
Clark Zinzow
0704b825ff
[Datasets] Add spread resource prefix for manual round-robin resource-based task load balancing. (#18776) 2021-09-20 22:41:11 -07:00
Eric Liang
361a13602c
Actor repr for log prefix should be computed after init, not before (#18749) 2021-09-20 21:34:53 -07:00
DK.Pino
d329101469
Revert Revert "[Placement Group] Support infeasible placement groups for Placement Group." (#18735)
* fix conflict

* cxx lint
2021-09-20 20:18:12 -07:00
Yi Cheng
07babd807c
Revert "Revert "[core] Async submitting actor registerring (#18009)" (#18719)" (#18722) 2021-09-20 19:17:00 -07:00
Ameer Haj Ali
9efbd80733
[core] avoid scheduling on gpu nodes by default (#18743)
* [core] avoid scheduling on gpu nodes by default

* Fix cluster_task_manager_test tests.

Made most tests in cluster_task_manager_test not use GPU on the head
node.

Also added another test to scheduling_policy_test.

Co-authored-by: Sasha Sobol <sasha@asobol.com>
2021-09-20 17:38:40 -07:00
Sasha Sobol
65c1c8bb9e
Add an integration test for scheduler_avoid_gpu_nodes (#18763) 2021-09-20 17:20:42 -07:00
Jiao
9bb4a87031
[runtime_env] Add experimental job yaml (#18768) 2021-09-20 18:00:25 -05:00
Stephanie Wang
eafe6d5c79
Fix ref counting assertion check (#18752)
* Fix assertion crash

* test, lint

* todo

* x
2021-09-20 15:16:19 -07:00
Kai Fricke
cee18152f1
[tune] Remove deprecated features, promote warnings to errors (#18595) 2021-09-20 22:54:28 +01:00
gjoliver
5b6d69d61a
Minor change to switch result checking order so there is no artificial delay. (#18555)
Co-authored-by: Jun Gong <jungong@mbpro.local>
2021-09-20 22:49:17 +01:00
Simon Mo
29f89d8af7
[Serve] Doc: Mock ray.serve.generated package for doc building (#18767) 2021-09-20 14:33:33 -07:00
Kai Fricke
2e99fb215f
[tune] Cache unstaged placement groups for potential re-use (#18706) 2021-09-20 20:23:35 +01:00
Sven Mika
e6aae61487
[RLlib; testing] Fix bug in stress tests not handling >1 trials per experiment (due to grid-search in IMPALA stress tests). (#18705) 2021-09-20 15:31:57 +02:00
Ian Rodney
8d6ddcee53
[GCP] Add conda to the path when possible. (#18653) 2021-09-19 23:06:48 -07:00
Eric Liang
85aaca8d45
Update the contribution guide / style guide (#18753) 2021-09-19 20:14:51 -07:00
Chen Shen
b321abc560
[Core] fix another thread safety issue in instrumented_io_context 2021-09-19 17:44:31 -07:00
Eric Liang
2fa9648ef0
Revert "add integration test for gpu scheduling/avoidance (#18729)" (#18754)
This reverts commit 57edc0c607.
2021-09-19 17:05:05 -07:00
Dmitri Gekhtman
ffe533b297
[autoscaler] Log ips and ids when terminating nodes, code structure (#18180)
* recovery failure uses same termination function

* More cleanup

* More cleanup

* ips

* wip

* wip

* wip

* Fix tests

* tweak
2021-09-19 18:44:38 -04:00
Chen Shen
35aa944ef4
Fix thread-safety in global state accessor (#18746) 2021-09-19 12:01:31 -07:00
xwjiang2010
5551cdac19
[Tune] Break from loop after warning msg is logged. (#18720) 2021-09-18 16:33:44 -07:00
mwtian
32f71765e9
[Client] Allow Client{Object,Actor}Ref to accept a future. (#18677)
* Allow Client{Object,Actor}Ref to accept a future. Check number of args and returns synchronously.

* rename callback, fix
2021-09-18 16:32:02 -07:00
Eric Liang
d6ff390858
Task failure should not log error (#18742) 2021-09-18 13:26:32 -07:00
Sasha Sobol
57edc0c607
add integration test for gpu scheduling/avoidance (#18729) 2021-09-18 01:32:18 -07:00