Commit graph

8203 commits

Author SHA1 Message Date
Sven Mika
c9d220bcda
[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. (#16080) 2021-06-01 17:39:18 +02:00
Chris Bamford
1e3721ef4a
[RLlib] Remove bad spinlocks to allow pytorch GPU scheduler to interrupt. (#16162) 2021-06-01 16:40:28 +02:00
Alex Wu
de0f856b68
[namespaces] Isolation for named placement groups (#16000) 2021-06-01 05:50:19 -07:00
SangBin Cho
bfa8ebcae9
[Test] Fix flaky global gc test (#16154)
* fast global gc to fix flaky test

* lint
2021-06-01 00:17:03 -07:00
Chris K. W
31364ed9b4
[autoscaler] Autoscaler metrics (#16066)
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-05-31 22:27:45 -07:00
Amog Kamsetty
da6f28d777
[Release] Add multi-node, multi-GPU SGD release test (#16046) 2021-05-31 16:23:04 -07:00
SangBin Cho
9fa3b9f6f3
[Nightly test] Test non streaming shuffle (#16150) 2021-05-31 15:28:02 -07:00
qicosmos
45d2331d5a
[C++ Woker] Remove ray core dependency completely (#16108) 2021-05-31 15:39:18 +08:00
Chong-Li
d5d0072635
Refactor RayletBasedActorScheduler (#16018) 2021-05-31 15:28:00 +08:00
SongGuyang
17b5f4dcaa
[C++ worker] support config from RayConfig and command line(gflag) (#16086) 2021-05-31 11:56:02 +08:00
zhuangzhuang131419
0429882bbf
[autoscaler] Implement node provider for aliyun (#15712)
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: zhuang <zhengchicheng.zcc@alibaba-inc.com>
Co-authored-by: chenk008 <kongchen28@gmail.com>
Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-05-29 00:56:32 -07:00
Amog Kamsetty
38b657cb65
[Tune] Place remote tune.run on node running the client server (#16034)
* force placement on persistent node

* address comments

* doc
2021-05-28 18:32:57 -07:00
Amog Kamsetty
cfa2997b86
[XGBoost] Add test with Ray Client (#16103) 2021-05-28 16:13:06 -07:00
Sven Mika
5fe34862ce
[RLlib] DDPG torch GPU bug. (#16133) 2021-05-28 22:09:25 +02:00
Ian Rodney
5ca1b297e4
[RayClient][Proxy] BugFixes (#16040) 2021-05-28 10:24:48 -07:00
Ian Rodney
ec46794767
[Client] Add ray.client().disconnect() (#16021) 2021-05-28 10:15:44 -07:00
Lixin Wei
3d37e3a315
[Refactor] Replace FractionalResourceQuantity with FixedPoint (#16052)
* refactor

* fix

* fix compilation

* fix

* fix cross-platform compilation

* lint

* fix test

* Revert "fix test"

This reverts commit 0ff23b125ce4159b91cc170dbc17b5ed70c9ab11.

* change rounding to truncating

* Update BUILD.bazel

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-05-28 09:32:51 -07:00
Sven Mika
33a69135cb
[RLlib] Issue 16117: DQN/APEX torch not working on GPU. (#16118) 2021-05-28 09:12:53 +02:00
Eric Liang
7f2e16fe8f
Make the spill test on unstable filesystem not so verbose (#16119)
* less logs

* update

* update
2021-05-27 20:32:48 -07:00
architkulkarni
cc7cc4fb9f
[Core] Allow specifying runtime_env conda and pip via filepath (#16073) 2021-05-27 17:58:47 -05:00
Clark Zinzow
a8ac383760
Decrease the number of nodes and actors started on each node in test_actor_multiple_gpus_from_multiple_tasks. (#16124) 2021-05-27 15:58:20 -07:00
Clark Zinzow
cd71d5e8ac
[Test] Ignore psutil.AccessDenied when gathering per-process memory info upon an OOM. (#16123) 2021-05-27 15:40:44 -07:00
Eric Liang
9c73591a4e
Revert "Fix tracing bug when actors are defined before connecting to … (#16120)
This reverts commit 6c1ea66611.
2021-05-27 11:50:36 -07:00
Amog Kamsetty
5d3cb295bd
[Tune] Add find_free_port Tune util (#16098) 2021-05-27 11:27:28 -07:00
Edward Oakes
90a76ad558
[Serve] use placement group by default (#16113) 2021-05-27 11:03:29 -07:00
SangBin Cho
d0dc9abdfc
[Plasma store] Improve the OOM logging message. (#16051) 2021-05-27 10:09:58 -07:00
Yi Cheng
5d0b302121
[core] Trigger global gc when plasma store is under pressure. (#15775) 2021-05-27 10:07:59 -07:00
Tao Wang
881e4913f1
Don't broadcast empty resources data (#16104) 2021-05-27 10:06:32 -07:00
Kathryn Zhou
6c1ea66611
Fix tracing bug when actors are defined before connecting to cluster (#16069) 2021-05-27 09:28:11 -07:00
architkulkarni
65eab8f376
Revert "Revert "[Core] Add "env_vars" field to runtime_env"" (#16107) 2021-05-27 10:16:33 -05:00
SangBin Cho
94dc06d852
[Nightly test] improve error detection (#16102)
* improve error detection

* improve gitignore

* fix
2021-05-27 00:33:21 -07:00
DK.Pino
ea0ee86063
[Placement Group]Fix actor scheduling with Placement Group bug. (#16006) 2021-05-26 22:16:38 -07:00
Ian Rodney
69d0e8e4fe
[Docs][ClientBuilder] Add ray.client() and ray.ClientBuilder to Experimental API docs (#16058) 2021-05-26 21:05:47 -07:00
SongGuyang
a4c108e5f6
[C++ worker] delete unuseful test (#16082) 2021-05-27 11:23:59 +08:00
architkulkarni
7cfe7f840c
Revert "[Core] Add "env_vars" field to runtime_env (#16075)" (#16099)
This reverts commit 1e245005c9.
2021-05-26 16:27:04 -07:00
Eric Liang
2f4628fdb4
Fix CHECK_FAIL when scheduling task with duplicate object requests (#16063) 2021-05-26 15:13:16 -07:00
Stephanie Wang
55bb1e93b4
[core] Wait for objects to be sealed before throwing OutOfMemory (#15955)
* Wait for objects to seal

* x

* comments

* error code
2021-05-26 14:18:32 -07:00
Eric Liang
3d1ba4a70e
Add feature flag for plasma overcommit (#16061) 2021-05-26 10:53:57 -07:00
architkulkarni
1e245005c9
[Core] Add "env_vars" field to runtime_env (#16075) 2021-05-26 12:11:19 -05:00
qicosmos
bbb61d0c00
[C++ Worker] remove core.h in api (#16079)
* remove core.h in api

* remove unused code and header

* remove core.h and some depencencies

* fix
2021-05-26 20:52:21 +08:00
qicosmos
498da13944
[C++ worker] Impove cpp worker (#15907) 2021-05-26 16:45:56 +08:00
qicosmos
d8f58e683f
[C++ worker] Add c++ worker log (#16015) 2021-05-26 16:13:02 +08:00
Kai Yang
853d650e29
Revert "Revert "[Object spilling] Avoid worker crash when an object is spille… (#15964)" (#16012)
This reverts commit 29aa336a4d.
2021-05-25 23:48:24 -07:00
Ian Rodney
3dbdd4eb46
[Client][Proxy] Track Num Clients in the proxy (#16038) 2021-05-25 22:17:43 -07:00
SongGuyang
7c3874b38e
remove id.h dependence for c++ worker headers (#16055) 2021-05-26 11:56:24 +08:00
Richard Liaw
08de5a36e1
[Horovod] Test with Ray Client (#15996)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-05-25 20:21:58 -07:00
Simon Mo
2aee4ac40d
[Serve] Cleanup examples and tests (#16042) 2021-05-25 15:32:36 -07:00
Xiang Xu
ec8b591f32
[docs] typo fix on the Doc for helm (#16036) 2021-05-25 12:59:39 -07:00
Sven Mika
e61922c4ac
[RLlib] Add one-liner to docs on internship/RL-engineer position. (#16050)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-05-25 12:58:54 -07:00
Ian Rodney
113fd6e765
[Client][Proxy] Refactor RayClient Proxy to not use additional Threads. (#16057) 2021-05-25 10:07:19 -07:00