Commit graph

3093 commits

Author SHA1 Message Date
Richard Liaw
b0c0de49a2 [tune] Fixup exception messages (#5238) 2019-07-20 22:36:27 -07:00
Eric Liang
d58b986858 [rllib] MultiCategorical shouldn't return array for kl or entropy (#5215)
* wip

* fix
2019-07-19 12:12:04 -07:00
Jones Wong
da7676c925 Removed the implicit sync barrier at the end of each training iteration (#5217)
*  removed sync barrier at the end of each training iteration

*  formatted

*  modify the comment according to current semantics

*  lint check

* Update trainer.py
2019-07-18 22:59:52 -07:00
Eric Liang
28e5c5555d [rllib] Move some inline defs to avoid deserialization errors (#5228)
* fix bug

* move metrics too
2019-07-18 21:01:16 -07:00
Zhijun Fu
aa42328874 [direct call] add local plasma provider (#5184) 2019-07-19 11:29:12 +08:00
micafan
b5b8c1d361 [GCS] introduce new gcs client and refactor actor table (#5058) 2019-07-19 11:28:34 +08:00
Jones Wong
0af07bd493 Enable seeding actors for reproducible experiments (#5197)
*  enable graph-level worker-specific seed

*  lint checked

*  revised according to eric's suggestions

*  revised accordingly and added a test case

*  formatted

* Update test_reproducibility.py

* Update trainer.py

* Update rollout_worker.py

* Update run_rllib_tests.sh

* Update worker_set.py
2019-07-17 23:31:34 -07:00
Qingqing Mao
63f49f95dd Improve memory check (#5216)
* Improve MemoryMonitor

- Add an env var to control the threshold.
- Use cgroup memory limit and usage for container environment.

* linting

* white space

* add comment
2019-07-17 23:30:02 -07:00
Jones Wong
81d297f87e Remove redundant scaler of l2 reg (#5172)
*  remove redundant scaler of l2 reg

*  lint formatted

* Update ddpg_policy.py
2019-07-17 15:11:27 -07:00
Jones Wong
ae03c42dd6 Fixed inconsistent action placeholder (#5213) 2019-07-17 10:55:14 -07:00
Sam Toyer
214f09d969 [rllib] Make RLLib handle zero-length observation arrays (#5208)
* [rllib] Make _summarize handle zero-len arrays

Fixes #5207

* [rllib] Make aligned_array() handle empty arrays

* [rllib] Conform with old yapf
2019-07-16 22:37:57 -07:00
Richard Liaw
3e0ad11ae0 Add heartbeat test + Fix monitor.py (#5191) 2019-07-16 21:59:48 -07:00
Eric Liang
4fa2a6006c [rllib] Remove nested import (#5204)
* remove nested import

* Update metrics.py
2019-07-16 10:52:56 -07:00
Eric Liang
047f4ccd61 [rllib] Fix rollout.py with tuple action space (#5201)
* fix it

* update doc too

* fix rollout
2019-07-16 10:52:35 -07:00
Kai Yang
806524384b [Java worker] Refactor object store and worker context on top of core worker (#5079) 2019-07-16 20:58:02 +08:00
Edward Oakes
e5be5fd46d Remove dependencies from TaskExecutionSpecification (#5166) 2019-07-15 18:15:21 -07:00
Simon Mo
fd71ffde2f Improve release process 0.7.2 (#5187) 2019-07-15 14:46:54 -07:00
Hao Chen
ea6aa6409a Reconstruct failed actors without sending tasks. (#5161)
* fast reconstruct dead actors

* add test

* fix typos

* remove debug print

* small fix

* fix typos

* Update test_actor.py
2019-07-15 10:25:09 -07:00
Hao Chen
7342117710 Fix a multithreading bug in grpc ClientCall (#5196) 2019-07-15 14:49:53 +08:00
Jones Wong
5b13a7eb90 Keep parameter space noise consistent with action space noise (Fix 5173) (#5193)
*  make parameter space noise consistent with action space noise

*  modified according to lint check

*  indent
2019-07-14 12:20:35 -07:00
Philipp Moritz
322b5166ad Update arrow to include user defined status for plasma (#5156) 2019-07-12 22:51:14 -07:00
Hao Chen
f5a87b88a3 Fix: ServerCallFactory's destructor not marked as virtual (#5185) 2019-07-13 09:38:47 +08:00
Richard Liaw
b6509f46b0 Update wheels to 0.8.0dev2 (#5186) 2019-07-12 17:27:03 -07:00
Richard Liaw
1530389822 [tune] Fast Node Recovery (#5053) 2019-07-12 13:47:30 -07:00
Hao Chen
0ec3a16bbd Fix Java MultithreadingTest (#5182) 2019-07-12 19:00:13 +08:00
Stephanie Wang
f46c555e9e Only get actor ID if actor task (#5180) 2019-07-12 14:31:21 +08:00
vipulharsh
3b42d5ccb1 Track newly created actor's parent actor (#5098)
* Track parent actor of actor

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* fixing a comment

* Fixing typo in a comment

* capturing task_spec instead of actor_data

* adding const for some local variables

* changing an if else to else

* Linted version

* use updated method to create task from task_data

Change-Id: I9c1a65134dc23a2d175047e96b86ab9d9cf61971

* fixing linter issues

Change-Id: I1def06218130b399d2527b999258aecf9abb98dd
2019-07-11 14:52:04 -07:00
Kristian Hartikainen
3456afdea7 [autoscaler] Fix missing body argument in GCP getIamPolicy #5169 2019-07-11 13:03:51 -07:00
Philipp Moritz
ccee77aafd fix node_failures.py (#5167) 2019-07-11 11:40:13 -07:00
Zhijun Fu
1649f1370e [direct call] changes raylet to push tasks to worker (#5140)
* refactor grpc server

* format

* change GetTask() to PushTask()

* change PushTask to AssignTask

* format

* add resource_ids

* move done_callback to server call

* remove SetTaskHandler and initialize it in task receiver's constructor

* format

* resolve comments

* update

* update

* Update src/ray/core_worker/core_worker.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* resolve comments

* format

* Update src/ray/core_worker/transport/raylet_transport.cc

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* resolve comments

* resolve comments

* fix build

* format

* fix

* format

* noop
2019-07-11 11:01:32 -07:00
Hao Chen
fd835d107e Move task to common module and add checks in getter methods (#5147) 2019-07-11 17:07:04 +08:00
Kai Yang
d8b50a5018 Fix GcsClient resource map (#5171) 2019-07-11 16:05:12 +08:00
Qing Wang
f2293243cc [ID Refactor] Shorten the length of JobID to 4 bytes (#5110)
* WIP

* Fix

* Add jobid test

* Fix

* Add python part

* Fix

* Fix test

* Remove TODOs

* Fix C++ tests

* Lint

* Fix

* Fix exporting functions in multiple ray.init

* Fix java test

* Fix lint

* Fix linting

* Address comments.

* Fix

* Address and fix linting

* Refine and fix

* Fix

* address

* Address comments.

* Fix linting

* Fix

* Address

* Address comments.

* Address

* Address

* Fix

* Fix

* Fix

* Fix lint

* Fix

* Fix linting

* Address comments.

* Fix linting

* Address comments.

* Fix linting

* address comments.

* Fix
2019-07-11 14:25:16 +08:00
Hao Chen
88365d4112 Fix Java MultithreadingTest (#5170) 2019-07-11 13:40:40 +08:00
Kai Yang
43b6513d19 [GCS] Move node resource info from client table to resource table (#5050) 2019-07-11 13:17:19 +08:00
Richard Liaw
691c9733f9 [tune] Document trainable attributes and enable user-checkpoint… (#4868) 2019-07-10 18:51:11 -07:00
Philipp Moritz
e6a81d40a5 [stability] Make task result for RemoveTask optional (#5146)
* make task result for RemoveTask optional

* lint

* update

* update

* update

* rename

* lint
2019-07-10 13:33:41 -07:00
Hao Chen
0c34749779 Use bazel disk cache for all CI jobs (#5144) 2019-07-10 22:03:45 +08:00
Richard Liaw
0b540ab492 [tune] Test example checkpointing (#4728) 2019-07-10 01:58:26 -07:00
Joey Jiang
e55c8ca165 Fix crash because of the reference to deleted variable in grpc server call (#5158) 2019-07-10 14:06:21 +08:00
Edward Oakes
2b7b7c7547 Add linting pre-push hook (#5154) 2019-07-09 21:49:12 -07:00
Eric Liang
5ab5017c67 [rllib] Fix impala stress test (#5101)
* add copy

* upgrade to tf 1.14

* update

* reduce count to workaround https://github.com/ray-project/ray/issues/5125

* Update impala.py

* placeholder

* comments

* update
2019-07-09 20:22:30 -07:00
Joey Jiang
5733690aa6 Add success and fail callback of grpc sending reply (#5141) 2019-07-09 17:03:57 +08:00
Eric Liang
5aec750107 Add warning/error if object store memory exceeds available memory (#4893)
* exclude

* format

* add warning

* hatch

* reduce mem usage

* reduce object store mem

* set obj mem
2019-07-08 21:37:08 -07:00
Stefan Pantic
dfc94ce7bc [rllib] Add entropy coeff decay (#5043) 2019-07-08 18:30:32 -07:00
Daniel Edgecumbe
eeb67db861 [autoscaler] Log AWS NodeProvider create_instances (#4998)
* autoscaler: Log on AWS NodeProvider create_instances

* logging
2019-07-08 13:22:26 -07:00
Hao Chen
8a30b93e42 Define common data structures with protobuf. (#5121) 2019-07-08 22:41:37 +08:00
Joey Jiang
b4e51c8aa1 Support clang-format whose version is not 7.0 (#5139) 2019-07-08 17:15:09 +08:00
Sam Toyer
7ad854d4c6 [tune] Use traceback.format_tb() (fixes #5135) (#5136) 2019-07-08 01:13:06 -07:00
Joey Jiang
274233962f Remove unused connection file in object manager (#5123) 2019-07-08 10:59:36 +08:00