Ameer Haj Ali
08e0e8311a
[autoscaler] Fixing AWS instance types autofill ( #11758 )
2020-11-03 09:34:14 -08:00
Kai Fricke
f7b19c41e3
[tune] logger refactor part 1: move classes and utilities to own files ( #11746 )
* [tune] logger refactor part 1: move classes and utilities to own files
* Fix circular dependency
* Remove unneeded pretty print copy
* Apply suggestions from code review
2020-11-03 07:48:09 -08:00
Maksim Smolin
0a6d24a727
[cli] Remove the deprecated old_style logging calls ( #10776 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-11-02 23:40:18 -08:00
Stephanie Wang
0ba777af99
[Object spilling] Add policy to automatically spill objects on OutOfMemory ( #11673 )
2020-11-02 12:42:02 -08:00
Ameer Haj Ali
8d74a04a42
[autoscaler] Flag flip for resource_demand_scheduler should take into account queue ( #11615 )
2020-11-02 12:41:22 -08:00
Ian Rodney
171e02c684
[serve] re-enable serve-controller-crash test ( #11579 )
2020-11-02 11:22:09 -08:00
Eric Liang
48dee789b3
Add random actor placement; fix cancellation callback; update test skips ( #11684 )
2020-10-30 18:36:35 -07:00
DK.Pino
b10871a1f5
[Core] Fix get worker table bug ( #11516 )
* fix get_worker_table bug
* fix lint
* fix comment
* remove actor table
* fix comment
* fix get alive worker
* remove unused python import
2020-10-30 14:48:29 -07:00
SangBin Cho
71c5089854
[Object Spilling] Initial Iteration of S3 adapter. ( #11379 )
* Finished the first iteration.
* Removed unnecessary code.
* Smartopen impl.
* Make sure tests passed.
* Addressed code review.
* Addressed code review.
* Fix issues.
* Fix issues.
2020-10-30 14:47:07 -07:00
Ameer Haj Ali
7aade469d0
[autoscaler] fix the autoscaling bug for continuously launching failed nodes ( #11714 )
2020-10-30 14:12:06 -07:00
Gekho457
8816d34541
Kubernetes rsync verbosity fixed ( #11716 )
2020-10-30 14:03:42 -07:00
Alan Guo
3c109b45aa
Disable validation of cluster config on the cluster to allow for cluster configs with new properties. ( #11693 )
2020-10-30 14:02:00 -07:00
Eric Liang
f9f372c327
[autoscaler] Clean up monitoring loop code ( #11677 )
2020-10-30 13:48:43 -07:00
SangBin Cho
6e2a1eac36
[Placement Group] Placement group automatic cleanup. ( #11546 )
* In progress. Done with all placement group manager code.
* It is working with job.
* Finished detached actor implementation.
* Fix minor issue.
* In progress.
* Addressed code review.
* Addressed code review.
* Addressed code review.
* Fix a build error.
2020-10-30 10:55:43 -07:00
architkulkarni
4175569d96
[Core] Add option to override environment variables for tasks and actors ( #11619 )
2020-10-29 14:22:44 -05:00
Simon Mo
e82ff08b0c
Fix asyncio plasma integration in cluster mode ( #11665 )
2020-10-29 11:53:10 -07:00
Simon Mo
46afec5660
Mute asyncio warning for Serve ( #11682 )
2020-10-28 17:05:42 -07:00
Kai Fricke
ba63ded311
[tune] better error when metric or mode unset in search algorithms ( #11646 )
2020-10-28 13:17:59 -07:00
Richard Liaw
58891551d3
[tune] make tests faster + fix flaky test ( #10264 )
2020-10-28 13:14:54 -07:00
Gekho457
9e63f7ccc3
[autoscaler/k8s] ray up 409 error fix ( #11660 )
2020-10-28 14:19:57 -05:00
Tao Wang
1d5694ddea
[GCS] Use direct getting instead of pub-sub to update load metrics in monitor.py ( #11339 )
2020-10-28 11:23:18 -07:00
Eric Liang
c933477915
[new scheduler] Pass test_basic and add CI builds with flag on ( #11635 )
2020-10-28 11:02:43 -07:00
Richard Liaw
70ea1fbe30
[sgd] pin ptl to 1.0.3 ( #11664 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-28 00:29:01 -07:00
fyrestone
05ad4c7499
[Dashboard] Optimize dashboard datacenter ( #11391 )
* Optimize dashboard datacenter
* Fix tests
* Fix tests
* Fix
* Fix CI
* python/build-wheel-macos.sh
Co-authored-by: 刘宝 <po.lb@antfin.com>
Co-authored-by: Max Fitton <maxfitton@anyscale.com>
2020-10-27 23:49:31 -07:00
yncxcw
c3e246818a
[Core] Fix doc string for ray.init() ( #11657 )
2020-10-27 18:27:22 -07:00
Ameer Haj Ali
1c40950877
[autoscaler] Add the cluster_name to docker file mounts directory prefix to make it more unique ( #11600 )
2020-10-27 15:33:11 -07:00
Scott Graham
c4ae94d60b
[autoscaler] Azure deployment fixes ( #11613 )
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:27:18 -07:00
Richard Liaw
293483ed0b
[k8s][minor] fix error handling ( #11653 )
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 15:24:07 -07:00
Ian Rodney
3ce852d345
[docker] Synchronize Torch for Tune & RLlib ( #11637 )
2020-10-27 18:37:25 +01:00
Sven Mika
d9f1874e34
[RLlib] Minor fixes (torch GPU bugs + some cleanup). ( #11609 )
2020-10-27 10:00:24 +01:00
Jack Parker-Holder
e7aafd7d24
[tune] PB2 ( #11466 )
Co-authored-by: Sumanth Ratna <sumanthratna@gmail.com>
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-10-27 01:03:21 -07:00
Edward Oakes
349c3ec86b
Remove errant "self" argument to NodeProvider static method
2020-10-26 22:22:41 -07:00
Simon Mo
fe4a78b7c7
[Hotfix] Pin Pydantic Version ( #11622 )
2020-10-26 16:52:19 -07:00
Kai Fricke
1a1ff28d18
[tune] allow tune search spaces to be passed to search algorithms ( #11503 )
2020-10-26 12:33:13 -07:00
Richard Liaw
4ad8af9b0d
[tune] More PTL example cleanup ( #11585 )
2020-10-26 12:26:14 -07:00
Sumanth Ratna
11f1bbf03c
[tune] use isinstance instead of type for TBXLogger ( #11595 )
2020-10-25 16:12:44 -07:00
Richard Liaw
1b357533b1
[tune] Try to enable PTL, SKlearn tests ( #11542 )
2020-10-24 01:08:46 -07:00
Siyuan (Ryans) Zhuang
5ad5cb61ca
Remove outdated numpy serializer ( #11587 )
2020-10-23 22:58:05 -07:00
Raoul Khouri
c3c72db69b
[tune] fixed validation for search metrics ( #11583 )
* fixed validation for search metrics
* formatting
* made error report better
* if only one metric is missing extract it from list
* any can take a generator
2020-10-23 17:04:21 -07:00
Clark Zinzow
0979589c7c
[dask-on-ray] Convert tuple of object refs to list before ray.get() call. ( #11582 )
2020-10-23 16:39:22 -07:00
Ian Rodney
d3405e74da
[autoscaler] SDK fixes ( #11517 )
* [autoscaler] SDK Fixes
* add docs
* remove all_nodes
2020-10-23 14:09:47 -07:00
Ian Rodney
aef96d17bf
[yaml] HotFix for correct example full ( #11584 )
2020-10-23 15:55:07 -05:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 ( #11510 )
2020-10-23 15:52:14 -05:00
Kai Fricke
8ee4f7eca3
[tune] fix pbt ptl example ( #11573 )
* [tune] fix pbt ptl example
* wider smoke test
2020-10-23 12:42:13 -07:00
architkulkarni
1ce0c4965b
[Serve] Update front page of serve doc ( #11421 )
2020-10-23 12:01:04 -07:00
DK.Pino
9f804ade5f
[Placement Group] Add get all placement group API ( #11460 )
* add get all interface for placement group
* add get all interface for placement group
* make it work
* fix lint
* fix lint
* fix comment
* add cpp test
* fix python lint
2020-10-23 11:46:48 -07:00
Richard Liaw
e7aa6441b7
[tune] a tiny ptl example ( #11497 )
2020-10-22 18:50:34 -07:00
Gekho457
2d1f52c21c
[autoscaler] Removed .cleanup() from NodeProvider and commands.py ( #11543 )
2020-10-22 14:46:49 -07:00
Frank Gu
73fa94731f
[tune] Add HDFS as Cloud Sync Client ( #11524 )
2020-10-22 14:12:51 -07:00
Eric Liang
083737c63c
Deprecate rsync to all nodes ( #11563 )
2020-10-22 13:45:42 -07:00