Commit graph

6333 commits

Author SHA1 Message Date
Sven Mika
bfc4f95e01
[RLlib] Fix test_bc.py test case. (#11722)
* Fix large json test file.

* Fix large json test file.

* WIP.
2020-10-31 00:16:09 -07:00
Eric Liang
48dee789b3
Add random actor placement; fix cancellation callback; update test skips (#11684) 2020-10-30 18:36:35 -07:00
DK.Pino
b10871a1f5
[Core]Fix get worker table bug (#11516)
* fix get_worker_table bug

* fix lint

* fix comment

* remove actor table

* fix comment

* fix get alive worker

* remove unused python import
2020-10-30 14:48:29 -07:00
SangBin Cho
71c5089854
[Object Spilling] Initial Iteration of S3 adapter. (#11379)
* Finished the first iteration.

* Removed unnecessary code.

* Smartopen impl.

* Make sure tests passed.

* Addressed code review.

* Addressed code review.

* Fix issues.

* Fix issues.
2020-10-30 14:47:07 -07:00
Ameer Haj Ali
7aade469d0
[autoscaler] fix the autoscaling bug for continuously launching failed nodes (#11714) 2020-10-30 14:12:06 -07:00
Gekho457
8816d34541
Kubernetes rsync verbosity fixed (#11716) 2020-10-30 14:03:42 -07:00
Alan Guo
3c109b45aa
Disable validation of cluster config on the cluster to allow for cluster configs with new properties. (#11693) 2020-10-30 14:02:00 -07:00
Eric Liang
f9f372c327
[autoscaler] Clean up monitoring loop code (#11677) 2020-10-30 13:48:43 -07:00
SangBin Cho
6e2a1eac36
[Placement Group] Placement group automatic cleanup. (#11546)
* In progress. Done with all placement group manager code.

* It is working with job.

* Finished detached actor implementation.

* Fix minor issue.

* In progress.

* Addressed code review.

* Addressed code review.

* Addressed code review.

* Fix a build error.
2020-10-30 10:55:43 -07:00
Alex Wu
5a83d8918a
[release] Do not tag docker latest on release builds (#11694)
* fix

* Added comment

Co-authored-by: Alex Wu <alex@anyscale.com>
2020-10-29 23:13:25 -07:00
Max Fitton
b4df42b027
[Dashboard] Make Infeasible Actor UX Less Scary (#11654)
* Update infeasible actor UI so that it only shows infeasible for an ActorClassGroup if at least one actor in the class is infeasible

* lint
2020-10-29 23:12:43 -07:00
Max Fitton
d6628cdbfb
[Dashboard] Fix null gpu utilization (#11650)
* update dashboard to work if GPU utilization field is missing from GPU payload

* lint

* lint
2020-10-29 23:11:50 -07:00
Alex Wu
e022d12dc3
[New scheduler] Deflake test heartbeat (#11586)
* deflaked

* lint

* .

* Update cluster_task_manager_test.cc

Co-authored-by: Alex Wu <alex@anyscale.com>
2020-10-29 23:10:19 -07:00
architkulkarni
4175569d96
[Core] Add option to override environment variables for tasks and actors (#11619) 2020-10-29 14:22:44 -05:00
Simon Mo
e82ff08b0c
Fix asyncio plasma integration in cluster mode (#11665) 2020-10-29 11:53:10 -07:00
Lingxuan Zuo
0b7a3d9e02
[Log] new spdlog tool for ray (#10967)
* spdlog support

* fatal abort for spdlog

* print all logs in stderr if no logger given

* fix log test

* install signal handler for spdlog by reusing glog lib

* fix lint

* Avoid duplicated dump

* log rotation and fmt comments

* fix
2020-10-29 11:37:13 -07:00
Ian Rodney
87e971bff0
[docker] Include python k8s package in ray-deps (#11703) 2020-10-29 10:57:23 -07:00
Yutai Zhou
6999db93cb
Un-indent multiagent section (#11310)
* Un-indent multiagent section
MARL section used to be nested inside bandits, which we probably don't want. Maybe give it its own section instead?
2020-10-29 16:12:48 +01:00
Jiajie Xiao
0b07af374a
allow tuple action space (#11429)
Co-authored-by: Jiajie Xiao <jj@Jiajies-MBP-2.attlocal.net>
2020-10-29 16:05:38 +01:00
Barak Michener
91fa7e0b4e
[releng]: Quiet Docker Push (and explain why) (#11623) 2020-10-29 00:18:51 -07:00
Simon Mo
46afec5660
Mute asyncio warning for Serve (#11682) 2020-10-28 17:05:42 -07:00
huyz-git
64e3c9741a
Update rllib-algorithms.rst (#11642) 2020-10-28 15:07:10 -07:00
mvindiola1
9e68b77796
[RLLIB] Wait for remote_workers to finish closing environments before terminating (#11476) 2020-10-28 14:23:06 -07:00
Edward Oakes
fcaf4d80e3
[serve] Make fractional resource usage more obvious in docs (#11580) 2020-10-28 13:54:36 -07:00
Kai Fricke
ba63ded311
[tune] better error when metric or mode unset in search algorithms (#11646) 2020-10-28 13:17:59 -07:00
Richard Liaw
58891551d3
[tune] make tests faster + fix flaky test (#10264) 2020-10-28 13:14:54 -07:00
Gekho457
9e63f7ccc3
[autoscaler/k8s] ray up 409 error fix (#11660) 2020-10-28 14:19:57 -05:00
Tao Wang
1d5694ddea
[GCS]Use direct getting instead of pub-sub to update load metrics in monitor.py (#11339) 2020-10-28 11:23:18 -07:00
Eric Liang
c933477915
[new scheduler] Pass test_basic and add CI builds with flag on (#11635) 2020-10-28 11:02:43 -07:00
Stephanie Wang
427b5af0ae
[Object spilling] Refactor raylet to add a local object manager class (#11647)
* Fix pytest...

* Release objects that have been spilled

* GCS object table interface refactor

* Add spilled URL to object location info

* refactor to include spilled URL in notifications

* improve tests

* Add spilled URL to object directory results

* Remove force restore call

* Merge spilled URL and location

* fix

* tmp

* refactor

* unit test skeleton

* unit testing

* unit test fixes

* cleanup

* cleanup

* update

* Separate pinning from waiting for object free, fixes pytest

* Update src/ray/raylet/local_object_manager.h

Co-authored-by: Eric Liang <ekhliang@gmail.com>

Co-authored-by: Tyler Westenbroek <westenbroekt@berkeley.edu>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-10-28 10:38:42 -04:00
Richard Liaw
70ea1fbe30
[sgd] pin ptl to 1.0.3 (#11664)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-28 00:29:01 -07:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter (#11391)
* Optimize dashboard datacenter

* Fix tests

* Fix tests

* Fix

* Fix CI

* python/build-wheel-macos.sh

Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
fangfengbin
55a090fb16
[GCS]Optimize gcs client nodes get function (#11424)
* [GCS]Optimize gcs client nodes get function

* fix review comment

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 21:13:19 -07:00
yncxcw
c3e246818a
[Core] Fix doc string for ray.init() (#11657) 2020-10-27 18:27:22 -07:00
Tao Wang
273a712786
[GCS]Decouple node failure detector with resource related operations (#11465) 2020-10-27 15:52:42 -07:00
Ameer Haj Ali
1c40950877
[autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique (#11600) 2020-10-27 15:33:11 -07:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes (#11613)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw
293483ed0b
[k8s][minor] fix error handling (#11653)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:24:07 -07:00
Ian Rodney
3ce852d345
[docker] Synchronize Torch for Tune & RLlib (#11637) 2020-10-27 18:37:25 +01:00
fangfengbin
ebe9a8865c
[GCS]Fix a bug that creates invalid connection (#11590)
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-10-27 10:08:06 -07:00
Sven Mika
d9f1874e34
[RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00
Jack Parker-Holder
e7aafd7d24
[tune] PB2 (#11466)
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 01:03:21 -07:00
Edward Oakes
349c3ec86b
Remove errant "self" argument to NodeProvider static method 2020-10-26 22:22:41 -07:00
Simon Mo
fe4a78b7c7
[Hotfix] Pin Pydantic Version (#11622) 2020-10-26 16:52:19 -07:00
Kai Fricke
1a1ff28d18
[tune] allow tune search spaces to be passed to search algorithms (#11503) 2020-10-26 12:33:13 -07:00
Richard Liaw
4ad8af9b0d
[tune] More PTL example cleanup (#11585) 2020-10-26 12:26:14 -07:00
Richard Liaw
b02e61f672
[minor] fix up docs (#11596)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-26 12:19:03 -07:00
Ian Rodney
2da6ad2176
[core] Better error message for named actor not found (#11604) 2020-10-26 09:46:02 -07:00
Tao Wang
0fbee4da0c
[GCS] Remove unused ReportBatchHeartbeat/SubscribeHeartbeat (#11567)
* Remove unused message ReportBatchHeartbeat

* add up
2020-10-25 21:06:28 -07:00
Sumanth Ratna
11f1bbf03c
[tune] use isinstance instead of type for TBXLogger (#11595) 2020-10-25 16:12:44 -07:00