Ian Rodney
4116c8c3f1
[ClientBuilder] Verify Module has ClientBuilder Class ( #16076 )
2021-06-02 09:19:44 -07:00
fyrestone
c53893cb13
[Dashboard] Reorganize dashboard modules - actor ( #16170 )
2021-06-02 06:58:30 -07:00
DK.Pino
9497a65a57
commit ( #16183 )
2021-06-02 06:50:04 -07:00
Ian Rodney
2e365f8797
[ClientBuilder] Code takes precedence over environment ( #16112 )
...
* no override address
* correct ordering
2021-06-02 13:10:15 +03:00
mwtian
f14f197d42
[Client] Make Client{ObjectRef,ActorRef} subclasses of their server-side counterparts ( #16110 )
...
* Implement ClientObjectRef and ClientActorID in cython
* fix doc
* Remove unnecessary declaration.
Add basic unit tests.
* Fix quotes.
* Skip tests on Windows
2021-06-01 23:45:41 +03:00
Amog Kamsetty
65f1d67e9c
[SGD] Ray Client Support and tests ( #16111 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 13:21:26 -07:00
Lixin Wei
113c7fdecc
[core] Fix ResourceMapToTaskRequest ( #16172 )
2021-06-01 12:20:03 -07:00
Travis Addair
050a076de9
[k8s] Refactored k8s operator to use kopf for controller logic ( #15787 )
...
Co-authored-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
2021-06-01 12:00:55 -07:00
Amog Kamsetty
ffef51b010
[Dependabot] Change PR creation time for rllib to PST ( #15995 )
2021-06-01 11:58:52 -07:00
Kai Fricke
153a8b8fec
[release] convert tune release tests ( #15913 )
2021-06-01 11:19:15 -07:00
matthewdeng
7637654557
[tune] populate internal configs when creating Trainable through DistributedTrainableCreator ( #16128 )
...
* [tune] populate internal configs when creating Trainable through DistributedTrainableCreator
* create DistributedTrainable class
* Fix tests and docs
* fix formatting
* Update python/ray/tune/trainable.py
* make call to DistributedTrainable explicit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 09:59:37 -07:00
Amog Kamsetty
04863d158a
[Tune] MLflow with Ray Client ( #16029 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-06-01 09:50:44 -07:00
Dmitri Gekhtman
27c2f570f1
[kubernetes] pin the K8s config yamls to ray:latest instead of ray1.3 ( #15988 )
2021-06-01 19:12:35 +03:00
Sven Mika
c9d220bcda
[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. ( #16080 )
2021-06-01 17:39:18 +02:00
Chris Bamford
1e3721ef4a
[RLlib] Remove bad spinlocks to allow pytorch GPU scheduler to interrupt. ( #16162 )
2021-06-01 16:40:28 +02:00
Alex Wu
de0f856b68
[namespaces] Isolation for named placement groups ( #16000 )
2021-06-01 05:50:19 -07:00
SangBin Cho
bfa8ebcae9
[Test] Fix flaky global gc test ( #16154 )
...
* fast global gc to fix flaky test
* lint
2021-06-01 00:17:03 -07:00
Chris K. W
31364ed9b4
[autoscaler] Autoscaler metrics ( #16066 )
...
Co-authored-by: Ian <ian.rodney@gmail.com>
2021-05-31 22:27:45 -07:00
Amog Kamsetty
da6f28d777
[Release] Add multi-node, multi-GPU SGD release test ( #16046 )
2021-05-31 16:23:04 -07:00
SangBin Cho
9fa3b9f6f3
[Nightly test] Test non streaming shuffle ( #16150 )
2021-05-31 15:28:02 -07:00
qicosmos
45d2331d5a
[C++ Worker] Remove ray core dependency completely ( #16108 )
2021-05-31 15:39:18 +08:00
Chong-Li
d5d0072635
Refactor RayletBasedActorScheduler ( #16018 )
2021-05-31 15:28:00 +08:00
SongGuyang
17b5f4dcaa
[C++ worker] support config from RayConfig and command line(gflag) ( #16086 )
2021-05-31 11:56:02 +08:00
zhuangzhuang131419
0429882bbf
[autoscaler] Implement node provider for aliyun ( #15712 )
...
Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: zhuang <zhengchicheng.zcc@alibaba-inc.com>
Co-authored-by: chenk008 <kongchen28@gmail.com>
Co-authored-by: wuhua.ck <wuhua.ck@alibaba-inc.com>
2021-05-29 00:56:32 -07:00
Amog Kamsetty
38b657cb65
[Tune] Place remote tune.run on node running the client server ( #16034 )
...
* force placement on persistent node
* address comments
* doc
2021-05-28 18:32:57 -07:00
Amog Kamsetty
cfa2997b86
[XGBoost] Add test with Ray Client ( #16103 )
2021-05-28 16:13:06 -07:00
Sven Mika
5fe34862ce
[RLlib] DDPG torch GPU bug. ( #16133 )
2021-05-28 22:09:25 +02:00
Ian Rodney
5ca1b297e4
[RayClient][Proxy] BugFixes ( #16040 )
2021-05-28 10:24:48 -07:00
Ian Rodney
ec46794767
[Client] Add ray.client().disconnect() ( #16021 )
2021-05-28 10:15:44 -07:00
Lixin Wei
3d37e3a315
[Refactor] Replace FractionalResourceQuantity with FixedPoint ( #16052 )
...
* refactor
* fix
* fix compilation
* fix
* fix cross-platform compilation
* lint
* fix test
* Revert "fix test"
This reverts commit 0ff23b125ce4159b91cc170dbc17b5ed70c9ab11.
* change rounding to truncating
* Update BUILD.bazel
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2021-05-28 09:32:51 -07:00
Sven Mika
33a69135cb
[RLlib] Issue 16117: DQN/APEX torch not working on GPU. ( #16118 )
2021-05-28 09:12:53 +02:00
Eric Liang
7f2e16fe8f
Make the spill test on unstable filesystem not so verbose ( #16119 )
...
* less logs
* update
* update
2021-05-27 20:32:48 -07:00
architkulkarni
cc7cc4fb9f
[Core] Allow specifying runtime_env conda and pip via filepath ( #16073 )
2021-05-27 17:58:47 -05:00
Clark Zinzow
a8ac383760
Decrease the number of nodes and actors started on each node in test_actor_multiple_gpus_from_multiple_tasks. ( #16124 )
2021-05-27 15:58:20 -07:00
Clark Zinzow
cd71d5e8ac
[Test] Ignore psutil.AccessDenied when gathering per-process memory info upon an OOM. ( #16123 )
2021-05-27 15:40:44 -07:00
Eric Liang
9c73591a4e
Revert "Fix tracing bug when actors are defined before connecting to … ( #16120 )
...
This reverts commit 6c1ea66611.
2021-05-27 11:50:36 -07:00
Amog Kamsetty
5d3cb295bd
[Tune] Add find_free_port Tune util ( #16098 )
2021-05-27 11:27:28 -07:00
Edward Oakes
90a76ad558
[Serve] use placement group by default ( #16113 )
2021-05-27 11:03:29 -07:00
SangBin Cho
d0dc9abdfc
[Plasma store] Improve the OOM logging message. ( #16051 )
2021-05-27 10:09:58 -07:00
Yi Cheng
5d0b302121
[core] Trigger global gc when plasma store is under pressure. ( #15775 )
2021-05-27 10:07:59 -07:00
Tao Wang
881e4913f1
Don't broadcast empty resources data ( #16104 )
2021-05-27 10:06:32 -07:00
Kathryn Zhou
6c1ea66611
Fix tracing bug when actors are defined before connecting to cluster ( #16069 )
2021-05-27 09:28:11 -07:00
architkulkarni
65eab8f376
Revert "Revert "[Core] Add "env_vars" field to runtime_env"" ( #16107 )
2021-05-27 10:16:33 -05:00
SangBin Cho
94dc06d852
[Nightly test] improve error detection ( #16102 )
...
* improve error detection
* improve gitignore
* fix
2021-05-27 00:33:21 -07:00
DK.Pino
ea0ee86063
[Placement Group] Fix actor scheduling with Placement Group bug. ( #16006 )
2021-05-26 22:16:38 -07:00
Ian Rodney
69d0e8e4fe
[Docs][ClientBuilder] Add ray.client() and ray.ClientBuilder to Experimental API docs ( #16058 )
2021-05-26 21:05:47 -07:00
SongGuyang
a4c108e5f6
[C++ worker] delete unuseful test ( #16082 )
2021-05-27 11:23:59 +08:00
architkulkarni
7cfe7f840c
Revert "[Core] Add "env_vars" field to runtime_env ( #16075 )" ( #16099 )
...
This reverts commit 1e245005c9.
2021-05-26 16:27:04 -07:00
Eric Liang
2f4628fdb4
Fix CHECK_FAIL when scheduling task with duplicate object requests ( #16063 )
2021-05-26 15:13:16 -07:00
Stephanie Wang
55bb1e93b4
[core] Wait for objects to be sealed before throwing OutOfMemory ( #15955 )
...
* Wait for objects to seal
* x
* comments
* error code
2021-05-26 14:18:32 -07:00