Commit graph

5467 commits

Author SHA1 Message Date
krfricke
8f0f7371a0
[tune] Added Kubernetes syncer and sync client (#10097)
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-16 14:09:28 -07:00
Julius Frost
dc659ae89a
make action probabilities a numpy array (#10122) 2020-08-16 11:25:12 -07:00
Philipp Moritz
c7adb464e4
[autoscaler] Fix run_env='host' for initialization commands (#10137) 2020-08-15 15:25:54 -07:00
Olli Huotari
9ff599cbb8
torch policy now includes model.metrics (#10121)
* torch policy now includes model.metrics

* Fixed tests to work with custom metrics

* Forgot to run format.sh
2020-08-15 10:43:11 -07:00
Sven Mika
aeb5be7733
[RLlib] Trajectory View API (part 2.5): Actual implementations (not used yet) of a SampleCollector. (#10112) 2020-08-15 15:09:00 +02:00
Sven Mika
2256047876
[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. (#10114) 2020-08-15 13:24:22 +02:00
Chua Cheow Huan
ea51e94729
[rllib] Learning rate schedule for DDPPO. (#10006)
* Get shared metrics, increment counter & set global vars for remote workers.

* Add unit test to test lr_schedule for DDPPO.

* Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.
2020-08-15 00:51:45 -07:00
Olli Huotari
ed6d1d7a7c
[Misc] Include info about flake8_quotes in format.sh (#10123) 2020-08-15 00:09:02 -07:00
Philipp Moritz
e95f0afe4c
[autoscaler] Expand key path for hashing with expanduser (#10125) 2020-08-14 18:50:27 -07:00
Amog Kamsetty
f87a4aa45d
[Tune] Pbt Function API (#9958)
* adding function convnet example

* add unit test

* update test

* update example

* wip

* move error from experiment to tune

* wip

* Fix checkpoint deletion

* updating code

* adding smoke test

* updating pbt guide

* formatting

* fix build

* add best checkpoint analysis util

* update test

* add comments

* remove class api

* fix example

* add setup and teardown to tests

* formatting

* Update python/ray/tune/tests/test_trial_scheduler_pbt.py

Co-authored-by: Kai Fricke <kai@anyscale.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-14 17:52:30 -07:00
Tao Wang
fba5906ce3
[GCS] Re-report heartbeat when gcs server restarts (#10040)
* Retry sending failed heartbeat when light heartbeat is enabled

* Re-report heartbeat when gcs server restarts

* remove is_pubsub_server_restarted

* add lock per comment

* minor change, name related
2020-08-14 17:37:20 -07:00
Siyuan (Ryans) Zhuang
17ca1d8ff4
[Core] Object spilling prototype (#9818) 2020-08-14 15:39:10 -07:00
Robert Nishihara
36e626e95d
Revert "[Dashboard] Start the new dashboard (#9860)" (#10116)
This reverts commit 739933e5b8.
2020-08-14 14:06:57 -07:00
chaokunyang
7ffb37f711
[Java] add maven repo (#10109) 2020-08-14 11:31:01 -07:00
Simon Mo
d0c2e90577
[Build] Make sure local format.sh check protobuf (#10118)
* Ignore protobuf files for clangformat check

* Revert "Ignore protobuf files for clangformat check"

This reverts commit ccd84d4e1517220eb4e946918174150ce2265467.

* Make sure protobuf is checked locally
2020-08-14 11:22:55 -07:00
Philipp Moritz
6b53df9599
Hash contents of SSH key instead of key path (#10103) 2020-08-14 00:10:31 -07:00
fangfengbin
3a6fa7d622
[Placement Group] Optimize placement group strict pack strategy (#9924)
* add part code

* add code

* add part code

* rm used import

* add part code

* add part code

* add part code

* add part code

* add part code

* add part code

* fix review comment

* add testcase

* use ResourceSet

* fix review comment

* fix ut bug

Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-08-13 23:58:52 -07:00
Lixin Wei
0fe5722744
[Core] Add cached memory to unused memory in Linux/BSD (#10084)
* add cached memory to available memory

* format

* bug fixed

* bug fixed

* fixed

* lint
2020-08-13 23:47:52 -07:00
Ian Rodney
a252aa29da
[docker] Wrap more internal items with run_env="host" (#10078) 2020-08-13 20:35:06 -07:00
SangBin Cho
16f6ee4914
Small Update (#10107) 2020-08-13 19:39:55 -07:00
SangBin Cho
55fe7f65a5
[Tests] Make test_output debugging easy (#10091)
* Fix.

* Fix.
2020-08-13 18:45:26 -07:00
Eric Liang
c9f13b0833
[Placement Groups] Support CUDA_VISIBLE_DEVICES (#10053) 2020-08-13 18:00:04 -07:00
Simon Mo
01f38bc5d1
CoreWorker correctly push metrics to agent (#10031) 2020-08-13 16:44:53 -07:00
Ícaro Aragão
b77d6bf87d
[GCS] Improve fallback for getting local valid IP for GCS server (#10004) 2020-08-13 16:29:47 -05:00
Amog Kamsetty
5898248645
[Tune] Update PBT Transformer Test (#10081) 2020-08-13 12:23:03 -07:00
Tanay Wakhare
1826b29757
[RLlib] Curiosity (intrinsic motivation) Exploration module. (#9912) 2020-08-13 20:14:16 +02:00
SangBin Cho
8b689224a5
[Tests] Make test_multi_driver light. (#10086) 2020-08-13 10:00:42 -07:00
architkulkarni
fe5fcb6b9c
[serve] backend and endpoint validation (#9954) 2020-08-13 11:56:50 -05:00
Sven Mika
66d204e078
[RLlib] Model documentation enhancements. (#10011) 2020-08-13 13:36:40 +02:00
Sven Mika
0effcda3e4
Add missing int-casts for all shape calculating code (using np.product([some shape])). (#10092) 2020-08-13 12:04:22 +02:00
Julius Frost
6d9d2b320a
[RLlib] Support windows drives other than C drive for the offline json API (#9909) 2020-08-13 11:57:54 +02:00
Richard Liaw
7a56c3b71a
[cli] create_or_update_cluster fix (#10085) 2020-08-13 00:54:45 -07:00
SangBin Cho
86b1db3f11
[Stats] Make metrics report time configurable (#10036)
* Done.

* Lint.

* Address code review.

* Address code review.

* Remove wrong commit.

* Fix a test error.
2020-08-13 00:30:24 -07:00
fyrestone
739933e5b8
[Dashboard] Start the new dashboard (#9860) 2020-08-13 11:01:46 +08:00
krfricke
16486a8df3
[tune] Add OptunaSearcher wrapper around Optuna samplers (#10044)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-12 16:13:22 -07:00
Richard Liaw
7a8b922841
[tune] hotfix log_once (#10069) 2020-08-12 12:40:22 -07:00
Simon Mo
bb9ef511da
Use the right bucket for travis logs (#10073) 2020-08-12 11:46:49 -07:00
SangBin Cho
2cb79632e4
Revert "[Core] Add cached memory to available memory (#10020)" (#10064)
This reverts commit 71d2bde458.
2020-08-12 11:24:16 -05:00
Eric Liang
7e3e4cd321
[rllib] Execution plan API documentation (#10000)
* wip

* update

* comments
2020-08-11 23:58:41 -07:00
Richard Liaw
5560272556
[cli] install nightly wheels via ray install-nightly (#10054) 2020-08-11 20:08:22 -07:00
fangfengbin
701e26e0af
[GCS] Add node realtime resource view (#10043) 2020-08-12 10:52:17 +08:00
Simon Mo
f1ede1099f
[Hotfix] Pin opencv-python-headless==4.3.0.36 (#10049) 2020-08-11 15:58:18 -07:00
Ameer Haj Ali
82cdcff898
Removing kwargs & SSHOptions args from command runners (#10014) 2020-08-11 15:09:49 -07:00
Lixin Wei
71d2bde458
[Core] Add cached memory to available memory (#10020)
* add cached memory to available memory

* format

* bug fixed
2020-08-11 15:07:00 -07:00
Zhuohan Li
a6fed4820e
[Core] Preliminary implementation of ownership-based object directory (#9735) 2020-08-11 15:04:13 -07:00
krfricke
221fdc0774
[tune] fix flaky PBT replay test (#10047)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2020-08-11 14:17:31 -07:00
SangBin Cho
946ae74817
[GCS Actor Management] Race condition around creating -> created phase. (#10035)
* Fix the issue.

* Address a code review.
2020-08-11 12:31:27 -07:00
yncxcw
32cd94b750
[Core] Do not convert gpu id to int (#9744)
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-11 12:09:46 -07:00
Maksim Smolin
d6226b80bb
[cli] CliLogger typing (#10027) 2020-08-11 12:00:57 -07:00
Simon Mo
5b4a10368f
Upload Travis in after_script phase (#10046)
The deploy phase is skipped when script/tests fail. This prevents us
from uploading failed results to S3.

This PR changes it to the after_script phase, and the secret is injected
via a Travis env var.

https://docs.travis-ci.com/user/job-lifecycle/
2020-08-11 11:46:00 -07:00