Eric Liang
690b374581
[rllib] Add Keras LSTM example with ModelV2 ( #5258 )
2019-07-24 13:09:41 -07:00
Eric Liang
5b76238bce
Fix two types of eviction hangs ( #5225 )
2019-07-23 21:20:17 -07:00
Eric Liang
97c43284a6
[rllib] Fix trainer state restore ( #5257 )
2019-07-23 21:18:58 -07:00
Stephanie Wang
9c651f47bb
Add regression test for actor load balancing ( #5224 )
...
* Add regression test for actor load balancing
* Increase timeout
* Reduce number of nodes?
2019-07-23 15:11:55 -07:00
Stephanie Wang
15959b0f0d
Leave ray.wait calls open until the task or actor exits ( #5234 )
...
* Regression test
* Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies
- Some initial implementation
* unit test
* Improve unit tests for TaskDependencyManager
* Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing
* Add ray.wait python test for drivers that exit early
* Add WorkerID to Worker
* Update test to use two nodes
* Regression test for ray.wait passes
* Extend regression test to include ray.wait from an actor
* Fix ClientID and WorkerIDs
* lint
* lint
* Remove unnecessary ray_get argument
* fix build
2019-07-23 11:55:28 -07:00
Qing Wang
a3d4f9f16d
Fix the issue when passing multiple options in one string ( #5241 )
...
* Fix
* Fix linting
* Fix linting
* Address
* Fix test
2019-07-23 12:28:54 +08:00
Peter Schafhalter
fc589050c9
[sgd] Deprecate old distributed SGD implementation ( #5160 )
...
* Deprecate old distributed SGD implementation
* Update README
2019-07-22 15:47:10 -07:00
Vince Jankovics
80b976efcb
Ray namespace added for k8s ( #4111 )
...
* Ray namespace added for k8s
* Submit.yaml update with k8s namespace
* K8s deployment doc update with namespace
2019-07-22 15:45:05 -07:00
Richard Liaw
7fc15dbf7f
[autoscaler] Clean up error messages on setup failure ( #5210 )
2019-07-22 11:27:51 -07:00
Richard Liaw
53fb876a5f
Improved KeyboardInterrupt Exception Handling ( #5237 )
2019-07-22 02:29:56 -07:00
Eric Liang
f9043cc49a
[rllib] Remove experimental eager support
2019-07-21 12:27:17 -07:00
Richard Liaw
b0c0de49a2
[tune] Fixup exception messages ( #5238 )
2019-07-20 22:36:27 -07:00
Eric Liang
d58b986858
[rllib] MultiCategorical shouldn't return array for kl or entropy ( #5215 )
...
* wip
* fix
2019-07-19 12:12:04 -07:00
Jones Wong
da7676c925
Removed the implicit sync barrier at the end of each training iteration ( #5217 )
...
* removed sync barrier at the end of each training iteration
* formatted
* modify the comment according to current semantics
* lint check
* Update trainer.py
2019-07-18 22:59:52 -07:00
Eric Liang
28e5c5555d
[rllib] Move some inline defs to avoid deserialization errors ( #5228 )
...
* fix bug
* move metrics too
2019-07-18 21:01:16 -07:00
Zhijun Fu
aa42328874
[direct call] add local plasma provider ( #5184 )
2019-07-19 11:29:12 +08:00
micafan
b5b8c1d361
[GCS] introduce new gcs client and refactor actor table ( #5058 )
2019-07-19 11:28:34 +08:00
Jones Wong
0af07bd493
Enable seeding actors for reproducible experiments ( #5197 )
...
* enable graph-level worker-specific seed
* lint checked
* revised according to eric's suggestions
* revised accordingly and added a test case
* formatted
* Update test_reproducibility.py
* Update trainer.py
* Update rollout_worker.py
* Update run_rllib_tests.sh
* Update worker_set.py
2019-07-17 23:31:34 -07:00
Qingqing Mao
63f49f95dd
Improve memory check ( #5216 )
...
* Improve MemoryMonitor
- Add an env var to control the threshold.
- Use cgroup memory limit and usage for container environment.
* linting
* white space
* add comment
2019-07-17 23:30:02 -07:00
Jones Wong
81d297f87e
Remove redundant scaler of l2 reg ( #5172 )
...
* remove redundant scaler of l2 reg
* lint formatted
* Update ddpg_policy.py
2019-07-17 15:11:27 -07:00
Jones Wong
ae03c42dd6
Fixed inconsistent action placeholder ( #5213 )
2019-07-17 10:55:14 -07:00
Sam Toyer
214f09d969
[rllib] Make RLLib handle zero-length observation arrays ( #5208 )
...
* [rllib] Make _summarize handle zero-len arrays
Fixes #5207
* [rllib] Make aligned_array() handle empty arrays
* [rllib] Conform with old yapf
2019-07-16 22:37:57 -07:00
Richard Liaw
3e0ad11ae0
Add heartbeat test + Fix monitor.py ( #5191 )
2019-07-16 21:59:48 -07:00
Eric Liang
4fa2a6006c
[rllib] Remove nested import ( #5204 )
...
* remove nested import
* Update metrics.py
2019-07-16 10:52:56 -07:00
Eric Liang
047f4ccd61
[rllib] Fix rollout.py with tuple action space ( #5201 )
...
* fix it
* update doc too
* fix rollout
2019-07-16 10:52:35 -07:00
Kai Yang
806524384b
[Java worker] Refactor object store and worker context on top of core worker ( #5079 )
2019-07-16 20:58:02 +08:00
Edward Oakes
e5be5fd46d
Remove dependencies from TaskExecutionSpecification ( #5166 )
2019-07-15 18:15:21 -07:00
Simon Mo
fd71ffde2f
Improve release process 0.7.2 ( #5187 )
2019-07-15 14:46:54 -07:00
Hao Chen
ea6aa6409a
Reconstruct failed actors without sending tasks. ( #5161 )
...
* fast reconstruct dead actors
* add test
* fix typos
* remove debug print
* small fix
* fix typos
* Update test_actor.py
2019-07-15 10:25:09 -07:00
Hao Chen
7342117710
Fix a multithreading bug in grpc ClientCall ( #5196 )
2019-07-15 14:49:53 +08:00
Jones Wong
5b13a7eb90
Keep parameter space noise consistent with action space noise (Fix 5173) ( #5193 )
...
* make parameter space noise consistent with action space noise
* modified according to lint check
* indent
2019-07-14 12:20:35 -07:00
Philipp Moritz
322b5166ad
Update arrow to include user defined status for plasma ( #5156 )
2019-07-12 22:51:14 -07:00
Hao Chen
f5a87b88a3
Fix: ServerCallFactory's destructor not marked as virtual ( #5185 )
2019-07-13 09:38:47 +08:00
Richard Liaw
b6509f46b0
Update wheels to 0.8.0dev2 ( #5186 )
2019-07-12 17:27:03 -07:00
Richard Liaw
1530389822
[tune] Fast Node Recovery ( #5053 )
2019-07-12 13:47:30 -07:00
Hao Chen
0ec3a16bbd
Fix Java MultithreadingTest ( #5182 )
2019-07-12 19:00:13 +08:00
Stephanie Wang
f46c555e9e
Only get actor ID if actor task ( #5180 )
2019-07-12 14:31:21 +08:00
vipulharsh
3b42d5ccb1
Track newly created actor's parent actor ( #5098 )
...
* Track parent actor of actor
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* Update src/ray/raylet/node_manager.cc
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* fixing a comment
* Fixing typo in a comment
* capturing task_spec instead of actor_data
* adding const for some local variables
* changing an if else to else
* Linted version
* use updated method to create task from task_data
Change-Id: I9c1a65134dc23a2d175047e96b86ab9d9cf61971
* fixing linter issues
Change-Id: I1def06218130b399d2527b999258aecf9abb98dd
2019-07-11 14:52:04 -07:00
Kristian Hartikainen
3456afdea7
[autoscaler] Fix missing body argument in GCP getIamPolicy #5169
2019-07-11 13:03:51 -07:00
Philipp Moritz
ccee77aafd
fix node_failures.py ( #5167 )
2019-07-11 11:40:13 -07:00
Zhijun Fu
1649f1370e
[direct call] changes raylet to push tasks to worker ( #5140 )
...
* refactor grpc server
* format
* change GetTask() to PushTask()
* change PushTask to AssignTask
* format
* add resource_ids
* move done_callback to server call
* remove SetTaskHandler and initialize it in task receiver's constructor
* format
* resolve comments
* update
* update
* Update src/ray/core_worker/core_worker.cc
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* resolve comments
* format
* Update src/ray/core_worker/transport/raylet_transport.cc
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
* resolve comments
* resolve comments
* fix build
* format
* fix
* format
* noop
2019-07-11 11:01:32 -07:00
Hao Chen
fd835d107e
Move task to common module and add checks in getter methods ( #5147 )
2019-07-11 17:07:04 +08:00
Kai Yang
d8b50a5018
Fix GcsClient resource map ( #5171 )
2019-07-11 16:05:12 +08:00
Qing Wang
f2293243cc
[ID Refactor] Shorten the length of JobID to 4 bytes ( #5110 )
...
* WIP
* Fix
* Add jobid test
* Fix
* Add python part
* Fix
* Fix tes
* Remove TODOs
* Fix C++ tests
* Lint
* Fix
* Fix exporting functions in multiple ray.init
* Fix java test
* Fix lint
* Fix linting
* Address comments.
* Fix
* Address and fix linting
* Refine and fix
* Fix
* address
* Address comments.
* Fix linting
* Fix
* Address
* Address comments.
* Address
* Address
* Fix
* Fix
* Fix
* Fix lint
* Fix
* Fix linting
* Address comments.
* Fix linting
* Address comments.
* Fix linting
* address comments.
* Fix
2019-07-11 14:25:16 +08:00
Hao Chen
88365d4112
Fix Java MultithreadingTest ( #5170 )
2019-07-11 13:40:40 +08:00
Kai Yang
43b6513d19
[GCS] Move node resource info from client table to resource table ( #5050 )
2019-07-11 13:17:19 +08:00
Richard Liaw
691c9733f9
[tune] Document trainable attributes and enable user-checkpoint… ( #4868 )
2019-07-10 18:51:11 -07:00
Philipp Moritz
e6a81d40a5
[stability] Make task result for RemoveTask optional ( #5146 )
...
* make task result for RemoveTask optional
* lint
* update
* update
* update
* rename
* lint
2019-07-10 13:33:41 -07:00
Hao Chen
0c34749779
Use bazel disk cache for all CI jobs ( #5144 )
2019-07-10 22:03:45 +08:00
Richard Liaw
0b540ab492
[tune] Test example checkpointing ( #4728 )
2019-07-10 01:58:26 -07:00