Sven Mika
d98235cc84
[RLlib] Deflake 2x remote & local inference tests (external env). ( #13459 )
2021-01-14 20:44:26 +01:00
Micah Yong
c89ebdd94a
[Core][CLI] ray status and ray memory no longer starts a new job ( #13391 )
...
* Access memory info in ray memory via GlobalStateAccessor rather than calling ray.init()
* Modify ray status cli so that it doesn't start a new job via ray.init()
* Remove local test file
* Make status and error args required in commands.py#debug.status
* Remove unnecessary imports
* Job 38482.1 should now pass
* Resolve merge conflict
2021-01-14 10:12:16 -08:00
Dmitri Gekhtman
2d772a5a6d
[kubernetes][minor] Operator garbage collection fix ( #13392 )
2021-01-14 10:40:15 -06:00
Barak Michener
9c6d892eec
[ray_client]: fix exceptions raised while executing on the server on behalf of the client ( #13424 )
2021-01-14 10:38:01 -06:00
Ameer Haj Ali
2f7ba25efb
[joblib] joblib strikes again but this time on windows ( #13212 )
2021-01-14 10:36:52 -06:00
fangfengbin
4a6c53da46
[Core]Fix raylet scheduling bug ( #13452 )
...
* [Core]Fix raylet scheduling bug
* fix lint error
* fix lint error
Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-01-14 14:50:32 +01:00
Sven Mika
56878221ed
[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). ( #13363 )
2021-01-14 14:44:33 +01:00
fangfengbin
33b092de28
[GCS]Add gcs resource scheduler ( #13072 )
2021-01-14 20:05:55 +08:00
Kai Fricke
b296642646
Fix linter error ( #13451 )
2021-01-14 10:28:44 +01:00
Amog Kamsetty
560299972c
Revert "Enable Ray client server by default ( #13350 )" ( #13429 )
...
This reverts commit 912d0cbbf9.
2021-01-13 21:28:54 -08:00
fyrestone
8697d67791
Fix raylet::MockWorker::GetProcess crashes ( #13440 )
...
Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-01-14 12:19:21 +08:00
dHannasch
ad015cb7df
Split out the part of get_node_ip_address for which the docstring is correct ( #12796 )
2021-01-14 11:32:56 +08:00
Amog Kamsetty
3f42e6bafe
[Tune] Pin Transitive Dependencies ( #13358 )
2021-01-13 19:10:21 -08:00
Tao Wang
062b7efc93
Remove unused handler methods ( #13394 )
2021-01-14 10:51:31 +08:00
Eric Liang
602c103eae
Make request_resources() use internal kv instead of redis pub sub ( #13410 )
2021-01-13 17:30:43 -08:00
Edward Oakes
9ef48b16b6
[serve] Pull out goal management logic into AsyncGoalManager class ( #13341 )
2021-01-13 18:35:25 -06:00
Edward Oakes
c6fc7124d1
[tune] Fix f-string in error message ( #13423 )
2021-01-13 18:34:21 -06:00
Simon Mo
b257cb7d98
Add bazel logs upload to GHA ( #13251 )
2021-01-13 15:17:11 -08:00
Simon Mo
15501a4151
Fix Serve release test ( #13385 )
2021-01-13 15:06:23 -08:00
Dmitri Gekhtman
1968b2f9d8
[autoscaler/k8s] [CI] Kubernetes test ray up, exec, down ( #12514 )
2021-01-13 15:03:56 -08:00
Simon Mo
44acbdd82a
[Serve] [Doc] Improve batching doc ( #13389 )
2021-01-13 14:39:42 -08:00
Eric Liang
6de5711690
Plumb retries update ( #13411 )
2021-01-13 13:49:57 -08:00
Barak Michener
8f48c64507
[ray_client]: Fix multiple attempts at checking connection ( #13422 )
2021-01-13 13:36:01 -08:00
fyrestone
4853aa96cb
[Dashboard] Fix missing actor pid ( #13229 )
2021-01-13 16:45:12 +08:00
Barak Michener
0b22341bc9
[ray_client]: Wait for ready and retry on ray.connect() ( #13376 )
...
* [ray_client]: wait until connection ready
Change-Id: Ie443be60c33ab7d6da406b3dcaa57fbb7ba57dd6
* lint
Change-Id: I30f8e870bbd5f8859a9f11ae244e210f077cedd0
* docs and retry minimum
Change-Id: I43f5378322029267ddd69f518ce8206876e2129d
2021-01-13 00:19:15 -08:00
Sven Mika
d49c3fae0b
[RLlib] Trajectory View API: Atari framestacking. ( #13315 )
2021-01-13 08:53:34 +01:00
Eric Liang
912d0cbbf9
Enable Ray client server by default ( #13350 )
...
* update
* fix
* fix test
* update
2021-01-12 21:31:01 -08:00
Simon Mo
8e0a2f669b
[Doc] Remove trailing whitespaces ( #13390 )
2021-01-12 20:35:38 -08:00
Tao Wang
f587b9a50c
Remove unimplemented GetAll method in actor info accessor ( #13362 )
2021-01-13 09:55:27 +08:00
SangBin Cho
0428537d0b
[Object Spilling] Long running object spilling test ( #13331 )
...
* done.
* formatting.
2021-01-12 16:53:13 -08:00
Amog Kamsetty
4d83003992
trigger doc build for serve updates ( #13373 )
2021-01-12 13:08:55 -08:00
Ian Rodney
2e70743077
[Serve] Backend state unit tests ( #13319 )
2021-01-12 14:54:04 -06:00
Maltimore
3a3e4aed86
[RLlib] Add __len__() method to SampleBatch ( #13371 )
2021-01-12 20:15:23 +01:00
architkulkarni
e560933f9c
[Serve] Add dependency management support for driver not running in a conda env ( #13269 )
2021-01-12 09:57:15 -08:00
Kai Fricke
518427627b
[tune] buffer trainable results ( #13236 )
...
* Working prototype
* Pass buffer length, fix tests
* Don't buffer per default
* Dispatch and process save in one go, added tests
* Fix tests
* Pass adaptive seconds to train_buffered, stop result processing after STOP decision
* Fix tests, add release test
* Update tests
* Added detailed logs for slow operations
* Update python/ray/tune/trial_runner.py
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Apply suggestions from code review
* Revert tests and go back to old tuning loop
* nit
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-12 18:52:47 +01:00
Amog Kamsetty
9eebd090cf
[Dependabot] [CI] Re-configure Dependabot and disable duplicate builds ( #13359 )
2021-01-12 09:28:58 -08:00
Kai Fricke
25f10a947a
Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. ( #13339 )" ( #13361 )
...
This reverts commit e2b2abb88b.
2021-01-12 12:33:57 +01:00
Dmitri Gekhtman
7166949194
[Kubernetes][Docs] GPU usage ( #13325 )
...
* gpu-note
* gpu-note
* More info
* lint?
* Update doc/source/cluster/kubernetes.rst
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* GKE->Kubernetes
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-11 21:36:31 -08:00
Edwin Goh
a5ddc27bab
Fix typo in Tune Docs (Checkpointing) ( #13348 )
...
See issue #13299
2021-01-11 20:27:18 -08:00
Eric Liang
470fda190a
Forgot overwrite parameter in Ray client internal kv
2021-01-11 17:50:06 -08:00
Amog Kamsetty
0452a3a435
[Tune] Rename MLFlow to MLflow ( #13301 )
2021-01-11 17:36:55 -08:00
Eric Liang
de5bc24c60
Implement internal kv in ray client ( #13344 )
...
* kv internal
* fix
2021-01-11 14:54:52 -08:00
Eric Liang
fbb9795374
[client] Report number of currently active clients on connect ( #13326 )
...
* wip
* update
* update
* reset worker
* fix conn
* fix
* disable pycodestyle
2021-01-11 14:53:12 -08:00
Sven Mika
e2b2abb88b
[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. ( #13339 )
2021-01-11 22:42:30 +01:00
architkulkarni
c43fa12e73
[Serve] Support Starlette streaming response ( #13328 )
2021-01-11 13:27:44 -08:00
ZhuSenlin
c39658f368
fix removal of task dependencies ( #13333 )
...
Co-authored-by: senlin.zsl <senlin.zsl@antfin.com>
2021-01-11 09:55:48 -08:00
Edward Oakes
62e1ad3973
[serve] Cleanup backend state, move checkpointing and async goal logic inside ( #13298 )
2021-01-11 11:45:43 -06:00
Sven Mika
5d50d37f45
[RLlib] Issue 13330: No TF installed causes crash in ModelCatalog.get_action_shape() ( #13332 )
2021-01-11 13:19:46 +01:00
Edward Oakes
93006c2ba5
Use wait_for_condition to reduce flakiness in test_queue.py::test_custom_resources ( #13210 )
2021-01-10 19:32:59 -06:00
Barak Michener
6f0083ed10
add the method annotation and a comment explaining what's happening ( #13306 )
...
Change-Id: I848cc2f0beaed95340d9de7cca19a50c78d9da9a
2021-01-10 15:54:10 -08:00