6254 commits

Author SHA1 Message Date
Stephanie Wang
427b5af0ae
[Object spilling] Refactor raylet to add a local object manager class (#11647)
* Fix pytest...

* Release objects that have been spilled

* GCS object table interface refactor

* Add spilled URL to object location info

* refactor to include spilled URL in notifications

* improve tests

* Add spilled URL to object directory results

* Remove force restore call

* Merge spilled URL and location

* fix

* tmp

* refactor

* unit test skeleton

* unit testing

* unit test fixes

* cleanup

* cleanup

* update

* Separate pinning from waiting for object free, fixes pytest

* Update src/ray/raylet/local_object_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Tyler Westenbroek <westenbroekt@berkeley.edu>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-10-28 10:38:42 -04:00
Richard Liaw
70ea1fbe30
[sgd] pin ptl to 1.0.3 (#11664)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-28 00:29:01 -07:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
fangfengbin
55a090fb16
[GCS]Optimize gcs client nodes get function (#11424)
* [GCS]Optimize gcs client nodes get function

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 21:13:19 -07:00
yncxcw
c3e246818a
[Core] Fix doc string for ray.init() (#11657) 2020-10-27 18:27:22 -07:00
Tao Wang
273a712786
[GCS]Decouple node failure detector with resoure related operations (#11465) 2020-10-27 15:52:42 -07:00
Ameer Haj Ali
1c40950877
[autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (#11600) 2020-10-27 15:33:11 -07:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes (#11613)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw
293483ed0b
[k8s][minor] fix error handling (#11653)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:24:07 -07:00
Ian Rodney
3ce852d345
[docker] Synchronize Torch for Tune & RLlib (#11637) 2020-10-27 18:37:25 +01:00
fangfengbin
ebe9a8865c
[GCS]Fix a bug that creates invalid connection (#11590)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 10:08:06 -07:00
Sven Mika
d9f1874e34
[RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00
Jack Parker-Holder
e7aafd7d24
[tune] PB2 (#11466)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 01:03:21 -07:00
Edward Oakes
349c3ec86b
Remove errant "self" argument to NodeProvider static method 2020-10-26 22:22:41 -07:00
Simon Mo
fe4a78b7c7
[Hotfix] Pin Pydantic Version (#11622) 2020-10-26 16:52:19 -07:00
Kai Fricke
1a1ff28d18
[tune] allow tune search spaces to be passed to search algorithms (#11503) 2020-10-26 12:33:13 -07:00
Richard Liaw
4ad8af9b0d
[tune] More PTL example cleanup (#11585) 2020-10-26 12:26:14 -07:00
Richard Liaw
b02e61f672
[minor] fix up docs (#11596)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-26 12:19:03 -07:00
Ian Rodney
2da6ad2176
[core] Better error message for named actor not found (#11604) 2020-10-26 09:46:02 -07:00
Tao Wang
0fbee4da0c
[GCS] Remove unused ReportBatchHeartbeat/SubscribeHeartbeat (#11567)
* Remove unused message ReportBatchHeartbeat

* add up
2020-10-25 21:06:28 -07:00
Sumanth Ratna
11f1bbf03c
[tune] use isinstance instead of type for TBXLogger (#11595) 2020-10-25 16:12:44 -07:00
Richard Liaw
1b357533b1
[tune] Try to enable PTL, SKlearn tests (#11542) 2020-10-24 01:08:46 -07:00
Eric Liang
d3ee83205b
Remove crashing assert in actor creation for old scheduler (#11577)
* remove assert

* warn log
2020-10-24 00:05:26 -07:00
Siyuan (Ryans) Zhuang
5ad5cb61ca
Remove outdated numpy serializer (#11587) 2020-10-23 22:58:05 -07:00
Raoul Khouri
c3c72db69b
[tune] fixed validation for search metrics (#11583)
* fixed validation for search metrics

* formatting

* made error report better

* if only one metric is missing extract it from list

* any can take a generator
2020-10-23 17:04:21 -07:00
Clark Zinzow
0979589c7c
[dask-on-ray] Convert tuple of object refs to list before ray.get() call. (#11582) 2020-10-23 16:39:22 -07:00
Ian Rodney
d3405e74da
[autoscaler] SDK fixes (#11517)
* [autoscaler] SDK Fxies

* add docs

* remove all_nodes
2020-10-23 14:09:47 -07:00
Ian Rodney
aef96d17bf
[yaml] HotFix for correct example full (#11584) 2020-10-23 15:55:07 -05:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510) 2020-10-23 15:52:14 -05:00
Kai Fricke
8ee4f7eca3
[tune] fix pbt ptl example (#11573)
* [tune] fix pbt ptl example

* wider smoke test
2020-10-23 12:42:13 -07:00
Ian Rodney
7a0184e081
[docker] Push to DockerHub in CI (#11442) 2020-10-23 12:02:15 -07:00
architkulkarni
1ce0c4965b
[Serve] Update front page of serve doc (#11421) 2020-10-23 12:01:04 -07:00
DK.Pino
9f804ade5f
[Placement Group]Add get all placement group api (#11460)
* add get all interface for placement group

* add get all interface for placement group

* make it work

* fix lint

* fix lint

* fix comment

* add cpp test

* fix python lint
2020-10-23 11:46:48 -07:00
Richard Liaw
e7aa6441b7
[tune] a tiny ptl example (#11497) 2020-10-22 18:50:34 -07:00
Barak Michener
4348ecf850
Clean up release tests (#11420) 2020-10-22 17:04:41 -07:00
Gekho457
2d1f52c21c
[autoscaler] Removed .cleanup() from NodeProvider and commands.py (#11543) 2020-10-22 14:46:49 -07:00
dHannasch
47531ac7e6
Resolve Issue #11556 by changing the docs to reference _temp_dir. (#11562) 2020-10-22 16:24:46 -05:00
Frank Gu
73fa94731f
[tune] Add HDFS as Cloud Sync Client (#11524) 2020-10-22 14:12:51 -07:00
Eric Liang
083737c63c
Deprecate rsync to all nodes (#11563) 2020-10-22 13:45:42 -07:00
Amog Kamsetty
d87c186721
[RaySGD] Docs for SGD+Tune usage (#11479) 2020-10-22 13:32:27 -07:00
Kingsley Kuan
d1dd5d578e
[RLlib] Fix PyTorch A3C / A2C loss function using mixed reduced sum / mean (#11449) 2020-10-22 12:39:34 -07:00
Allen
cf2ee94e0c
[Autoscaler] Allow users to set the names for security groups created by ray (#11405) 2020-10-22 12:28:59 -07:00
Simon Mo
7111a424af
[Serve] Add regression test for #11437 (#11539) 2020-10-22 10:45:18 -07:00
Alex Wu
d1182b827a
[Autoscaler] Do not count unmanaged nodes in load metrics (#11458)
* fixedd

* lint

* fixed other test case

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2020-10-21 22:14:21 -07:00
Max Fitton
44fb60b4dd
[hotfix] Pin node version (fix linux wheel build) (#11532)
Co-authored-by: Max Fitton <max@semprehealth.com>
2020-10-21 19:10:09 -07:00
Richard Liaw
af0fde4efd
[hotfix] disable sklearn again (#11541)
This reverts commit 9522918fa2.
2020-10-21 19:04:48 -07:00
Gekho457
155687e0c3
[autoscaler/AWS] Updated AWS Node Provider threading logic (#11422) 2020-10-21 18:42:38 -07:00
Philsik Chang
ede9347127
[rllib] Add torch_distributed_backend flag for DDPPO (#11362) (#11425) 2020-10-21 18:30:42 -07:00
Richard Liaw
a4b418d30c
[docs] update cloud docs (#11262)
* update-cloud-docs

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* Update doc/source/cluster/config.rst

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

* fix

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>

Co-authored-by: Ian Rodney <ian.rodney@gmail.com>
2020-10-21 16:37:26 -07:00
Alex Wu
e02f4c0157
[New scheduler] queue by shape (#11381) 2020-10-21 15:56:06 -07:00