Commit graph

6951 commits

Author SHA1 Message Date
Ian Rodney
0ec9ddabc1
[docker/dashboard] Fix ray dashboard (#12899) 2021-01-15 10:03:01 -08:00
Simon Mo
dac8b3d58a
[CI] Enable Dashboard tests for master (#13425) 2021-01-15 09:43:34 -08:00
SangBin Cho
f6d9996874
[Object Spilling] Dedup restore objects (#13470)
* done.

* Addressed code review.
2021-01-14 23:51:11 -08:00
fangfengbin
ce1b208e41
[GCS]Remove unused class variable (#13454) 2021-01-15 14:48:18 +08:00
Barak Michener
84e110a949
[ray_client]: Support runtime_context as metadata (#13428) 2021-01-14 14:37:00 -08:00
Clark Zinzow
9a658b568f
[Core] Ownership-based Object Directory: Consolidate location table and reference table. (#13220)
* Added owned object reference before Plasma put on Create() + Seal() path.

* Consolidated location table and reference table in reference counter.

* Restore type in definition.

* Clean up owned reference on failed Seal().

* Added RemoveOwnedObject test for reference counter.

* Guard against ref going out of scope before location RPCs.

* Add 'owner must have ref in scope' precondition to documentation for object location methods.

* Move to separate Create() + Seal() methods for existing objects.

* Clearer distinction between Create() and Seal() methods.

* Make it clear that references will normally be cleaned up by reference counting.
2021-01-14 13:48:10 -08:00
Siyuan (Ryans) Zhuang
d1e9887be2
[Serialization] New custom serialization API (#13291)
* new serialization API with doc & test

* add more notes

* refine notes

* doc
2021-01-14 13:15:31 -08:00
Amog Kamsetty
07e97fe4c2
[xgb] re-enable xgboost_ray tests (#13416)
* re-enable

* fix

* update xgb_ray version
2021-01-14 22:14:44 +01:00
Edward Oakes
7ba87b8abe
Fix getting runtime context dict in driver (#13417) 2021-01-14 14:41:53 -06:00
Ian Rodney
411e37ce3f
[serve] Properly obey SERVE_LOG_DEBUG=0 (#13460) 2021-01-14 12:24:22 -08:00
Simon Mo
16e8c4a69f
[Release] Fix Serve release test (#13303)
The Docker image we were using now uses `ray` users so we have to call
sudo.
2021-01-14 12:23:53 -08:00
Simon Mo
321bbe1ffb
[Dashboard] Fix GPU resource rendering issue (#13388) 2021-01-14 12:23:21 -08:00
PENG Zhenghao
e63da54931
[docs] Add more guideline on using ray in slurm cluster (#12819)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: PENG Zhenghao <pengzh@ie.cuhk.edu.hk>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-14 12:17:53 -08:00
Sven Mika
d98235cc84
[RLlib] Deflake 2x remote & local inference tests (external env). (#13459) 2021-01-14 20:44:26 +01:00
Micah Yong
c89ebdd94a
[Core][CLI] ray status and ray memory no longer starts a new job (#13391)
* Access memory info in ray memory via GlobalStateAccessor rather than calling ray.init()

* Modify ray status cli so that it doesn't start a new job via ray.init()

* Remove local test file

* Make status and error args required in commands.py#debug.status

* Remove unnecessary imports

* Job 38482.1 should now pass

* Resolve merge conflict
2021-01-14 10:12:16 -08:00
Dmitri Gekhtman
2d772a5a6d
[kubernetes][minor] Operator garbage collection fix (#13392) 2021-01-14 10:40:15 -06:00
Barak Michener
9c6d892eec
[ray_client]: fix exceptions raised while executing on the server on behalf of the client (#13424) 2021-01-14 10:38:01 -06:00
Ameer Haj Ali
2f7ba25efb
[joblib] joblib strikes again but this time on windows (#13212) 2021-01-14 10:36:52 -06:00
fangfengbin
4a6c53da46
[Core]Fix raylet scheduling bug (#13452)
* [Core]Fix raylet scheduling bug

* fix lint error

Co-authored-by: 灵洵 <fengbin.ffb@antgroup.com>
2021-01-14 14:50:32 +01:00
Sven Mika
56878221ed
[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363) 2021-01-14 14:44:33 +01:00
fangfengbin
33b092de28
[GCS]Add gcs resource scheduler (#13072) 2021-01-14 20:05:55 +08:00
Kai Fricke
b296642646
Fix linter error (#13451) 2021-01-14 10:28:44 +01:00
Amog Kamsetty
560299972c
Revert "Enable Ray client server by default (#13350)" (#13429)
This reverts commit 912d0cbbf9.
2021-01-13 21:28:54 -08:00
fyrestone
8697d67791
Fix raylet::MockWorker::GetProcess crashes (#13440)
Co-authored-by: 刘宝 <po.lb@antfin.com>
2021-01-14 12:19:21 +08:00
dHannasch
ad015cb7df
Split out the part of get_node_ip_address for which the docstring is correct (#12796) 2021-01-14 11:32:56 +08:00
Amog Kamsetty
3f42e6bafe
[Tune] Pin Transitive Dependencies (#13358) 2021-01-13 19:10:21 -08:00
Tao Wang
062b7efc93
Remove unused handler methods (#13394) 2021-01-14 10:51:31 +08:00
Eric Liang
602c103eae
Make request_resources() use internal kv instead of redis pub sub (#13410) 2021-01-13 17:30:43 -08:00
Edward Oakes
9ef48b16b6
[serve] Pull out goal management logic into AsyncGoalManager class (#13341) 2021-01-13 18:35:25 -06:00
Edward Oakes
c6fc7124d1
[tune] Fix f-string in error message (#13423) 2021-01-13 18:34:21 -06:00
Simon Mo
b257cb7d98
Add bazel logs upload to GHA (#13251) 2021-01-13 15:17:11 -08:00
Simon Mo
15501a4151
Fix Serve release test (#13385) 2021-01-13 15:06:23 -08:00
Dmitri Gekhtman
1968b2f9d8
[autoscaler/k8s] [CI] Kubernetes test ray up, exec, down (#12514) 2021-01-13 15:03:56 -08:00
Simon Mo
44acbdd82a
[Serve] [Doc] Improve batching doc (#13389) 2021-01-13 14:39:42 -08:00
Eric Liang
6de5711690
Plumb retries update (#13411) 2021-01-13 13:49:57 -08:00
Barak Michener
8f48c64507
[ray_client]: Fix multiple attempts at checking connection (#13422) 2021-01-13 13:36:01 -08:00
fyrestone
4853aa96cb
[Dashboard] Fix missing actor pid (#13229) 2021-01-13 16:45:12 +08:00
Barak Michener
0b22341bc9
[ray_client]: Wait for ready and retry on ray.connect() (#13376)
* [ray_client]: wait until connection ready

Change-Id: Ie443be60c33ab7d6da406b3dcaa57fbb7ba57dd6

* lint

Change-Id: I30f8e870bbd5f8859a9f11ae244e210f077cedd0

* docs and retry minimum

Change-Id: I43f5378322029267ddd69f518ce8206876e2129d
2021-01-13 00:19:15 -08:00
Sven Mika
d49c3fae0b
[RLlib] Trajectory View API: Atari framestacking. (#13315) 2021-01-13 08:53:34 +01:00
Eric Liang
912d0cbbf9
Enable Ray client server by default (#13350)
* update

* fix

* fix test

* update
2021-01-12 21:31:01 -08:00
Simon Mo
8e0a2f669b
[Doc] Remove trailing whitespaces (#13390) 2021-01-12 20:35:38 -08:00
Tao Wang
f587b9a50c
Remove unimplemented GetAll method in actor info accessor (#13362) 2021-01-13 09:55:27 +08:00
SangBin Cho
0428537d0b
[Object Spilling] Long running object spilling test (#13331)
* done.

* formatting.
2021-01-12 16:53:13 -08:00
Amog Kamsetty
4d83003992
trigger doc build for serve updates (#13373) 2021-01-12 13:08:55 -08:00
Ian Rodney
2e70743077
[Serve] Backend state unit tests (#13319) 2021-01-12 14:54:04 -06:00
Maltimore
3a3e4aed86
[RLlib] Add __len__() method to SampleBatch (#13371) 2021-01-12 20:15:23 +01:00
architkulkarni
e560933f9c
[Serve] Add dependency management support for driver not running in a conda env (#13269) 2021-01-12 09:57:15 -08:00
Kai Fricke
518427627b
[tune] buffer trainable results (#13236)
* Working prototype

* Pass buffer length, fix tests

* Don't buffer per default

* Dispatch and process save in one go, added tests

* Fix tests

* Pass adaptive seconds to train_buffered, stop result processing after STOP decision

* Fix tests, add release test

* Update tests

* Added detailed logs for slow operations

* Update python/ray/tune/trial_runner.py

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Apply suggestions from code review

* Revert tests and go back to old tuning loop

* nit

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-12 18:52:47 +01:00
Amog Kamsetty
9eebd090cf
[Dependabot] [CI] Re-configure Dependabot and disable duplicate builds (#13359) 2021-01-12 09:28:58 -08:00
Kai Fricke
25f10a947a
Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. (#13339)" (#13361)
This reverts commit e2b2abb88b.
2021-01-12 12:33:57 +01:00