mirror of https://github.com/vale981/ray synced 2025-03-18 17:16:39 -04:00
Commit graph

10614 commits

Author SHA1 Message Date
Richard Liaw
7e715520e5
[sgd] Example for Training () 2019-07-27 01:10:25 -07:00
Daniel Edgecumbe
06fec63c87 [autoscaler] Add a 'request_cores' function for manual autoscaling () 2019-07-26 17:14:45 -07:00
lanlin
d9e81da3b8 [tune] configurable maximum length of trial identifier () 2019-07-26 17:09:54 -07:00
Hao Chen
6f737e6a50
Add CODEOWNERS file () 2019-07-26 12:40:07 +08:00
Antoine Galataud
827618254a [rllib] Configure learner queue timeout ()
* configure learner queue timeout

* lint

* use config

* fix method args order, add unit test

* fix wrong param name
2019-07-25 21:18:05 -07:00
micafan
6f682db99d avoid copying ActorTableData when NodeManager updates an actor to GCS () 2019-07-26 11:17:24 +08:00
Stephanie Wang
3321555975
Increase timeout for ray.wait test ()
* Increase test timeout for ray.wait

* make sure the actor is scheduled
2019-07-25 14:23:46 -07:00
Eric Liang
bf9199ad77
[rllib] ModelV2 support for pytorch () 2019-07-25 11:02:53 -07:00
Joey Jiang
40395acadf [gRPC] Migrate raylet client implementation to grpc () 2019-07-25 14:48:56 +08:00
Eric Liang
60f59639c1
[rllib] Port DDPG to the build_tf_policy pattern () 2019-07-24 13:55:55 -07:00
Eric Liang
690b374581
[rllib] Add Keras LSTM example with ModelV2 () 2019-07-24 13:09:41 -07:00
Eric Liang
5b76238bce
Fix two types of eviction hangs () 2019-07-23 21:20:17 -07:00
Eric Liang
97c43284a6
[rllib] Fix trainer state restore () 2019-07-23 21:18:58 -07:00
Stephanie Wang
9c651f47bb
Add regression test for actor load balancing ()
* Add regression test for actor load balancing

* Increase timeout

* Reduce number of nodes?
2019-07-23 15:11:55 -07:00
Stephanie Wang
15959b0f0d
Leave ray.wait calls open until the task or actor exits ()
* Regression test

* Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies
- Some initial implementation

* unit test

* Improve unit tests for TaskDependencyManager

* Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing

* Add ray.wait python test for drivers that exit early

* Add WorkerID to Worker

* Update test to use two nodes

* Regression test for ray.wait passes

* Extend regression test to include ray.wait from an actor

* Fix ClientID and WorkerIDs

* lint

* lint

* Remove unnecessary ray_get argument

* fix build
2019-07-23 11:55:28 -07:00
Qing Wang
a3d4f9f16d
Fix the issue when passing multiple options in one string ()
* Fix

* Fix linting

* Fix linting

* Address

* Fix test
2019-07-23 12:28:54 +08:00
Peter Schafhalter
fc589050c9 [sgd] Deprecate old distributed SGD implementation ()
* Deprecate old distributed SGD implementation

* Update README
2019-07-22 15:47:10 -07:00
Vince Jankovics
80b976efcb Ray namespace added for k8s ()
* Ray namespace added for k8s

* Submit.yaml update with k8s namespace

* K8s deployment doc update with namespace
2019-07-22 15:45:05 -07:00
Richard Liaw
7fc15dbf7f
[autoscaler] Clean up error messages on setup failure () 2019-07-22 11:27:51 -07:00
Richard Liaw
53fb876a5f
Improved KeyboardInterrupt Exception Handling () 2019-07-22 02:29:56 -07:00
Eric Liang
f9043cc49a
[rllib] Remove experimental eager support 2019-07-21 12:27:17 -07:00
Richard Liaw
b0c0de49a2
[tune] Fixup exception messages () 2019-07-20 22:36:27 -07:00
Eric Liang
d58b986858
[rllib] MultiCategorical shouldn't return array for kl or entropy ()
* wip

* fix
2019-07-19 12:12:04 -07:00
Jones Wong
da7676c925 Removed the implicit sync barrier at the end of each training iteration ()
* removed sync barrier at the end of each training iteration

* formatted

* modify the comment according to current semantics

* lint check

* Update trainer.py
2019-07-18 22:59:52 -07:00
Eric Liang
28e5c5555d
[rllib] Move some inline defs to avoid deserialization errors ()
* fix bug

* move metrics too
2019-07-18 21:01:16 -07:00
Zhijun Fu
aa42328874 [direct call] add local plasma provider () 2019-07-19 11:29:12 +08:00
micafan
b5b8c1d361 [GCS] introduce new gcs client and refactor actor table () 2019-07-19 11:28:34 +08:00
Jones Wong
0af07bd493 Enable seeding actors for reproducible experiments ()
* enable graph-level worker-specific seed

* lint checked

* revised according to Eric's suggestions

* revised accordingly and added a test case

* formatted

* Update test_reproducibility.py

* Update trainer.py

* Update rollout_worker.py

* Update run_rllib_tests.sh

* Update worker_set.py
2019-07-17 23:31:34 -07:00
Qingqing Mao
63f49f95dd Improve memory check ()
* Improve MemoryMonitor

- Add an env var to control the threshold.
- Use cgroup memory limit and usage for container environment.

* linting

* white space

* add comment
2019-07-17 23:30:02 -07:00
Jones Wong
81d297f87e Remove redundant scaler of l2 reg ()
* remove redundant scaler of l2 reg

* lint formatted

* Update ddpg_policy.py
2019-07-17 15:11:27 -07:00
Jones Wong
ae03c42dd6 Fixed inconsistent action placeholder () 2019-07-17 10:55:14 -07:00
Sam Toyer
214f09d969 [rllib] Make RLlib handle zero-length observation arrays ()
* [rllib] Make _summarize handle zero-len arrays

Fixes 

* [rllib] Make aligned_array() handle empty arrays

* [rllib] Conform with old yapf
2019-07-16 22:37:57 -07:00
Richard Liaw
3e0ad11ae0
Add heartbeat test + Fix monitor.py () 2019-07-16 21:59:48 -07:00
Eric Liang
4fa2a6006c
[rllib] Remove nested import ()
* remove nested import

* Update metrics.py
2019-07-16 10:52:56 -07:00
Eric Liang
047f4ccd61
[rllib] Fix rollout.py with tuple action space ()
* fix it

* update doc too

* fix rollout
2019-07-16 10:52:35 -07:00
Kai Yang
806524384b [Java worker] Refactor object store and worker context on top of core worker () 2019-07-16 20:58:02 +08:00
Edward Oakes
e5be5fd46d Remove dependencies from TaskExecutionSpecification () 2019-07-15 18:15:21 -07:00
Simon Mo
fd71ffde2f Improve release process 0.7.2 () 2019-07-15 14:46:54 -07:00
Hao Chen
ea6aa6409a Reconstruct failed actors without sending tasks. ()
* fast reconstruct dead actors

* add test

* fix typos

* remove debug print

* small fix

* fix typos

* Update test_actor.py
2019-07-15 10:25:09 -07:00
Hao Chen
7342117710
Fix a multithreading bug in grpc ClientCall () 2019-07-15 14:49:53 +08:00
Jones Wong
5b13a7eb90 Keep parameter space noise consistent with action space noise (Fix 5173) ()
* make parameter space noise consistent with action space noise

* modified according to lint check

* indent
2019-07-14 12:20:35 -07:00
Philipp Moritz
322b5166ad Update arrow to include user defined status for plasma () 2019-07-12 22:51:14 -07:00
Hao Chen
f5a87b88a3 Fix: ServerCallFactory's destructor not marked as virtual () 2019-07-13 09:38:47 +08:00
Richard Liaw
b6509f46b0
Update wheels to 0.8.0dev2 () 2019-07-12 17:27:03 -07:00
Richard Liaw
1530389822
[tune] Fast Node Recovery () 2019-07-12 13:47:30 -07:00
Hao Chen
0ec3a16bbd
Fix Java MultithreadingTest () 2019-07-12 19:00:13 +08:00
Stephanie Wang
f46c555e9e Only get actor ID if actor task () 2019-07-12 14:31:21 +08:00
vipulharsh
3b42d5ccb1 Track newly created actor's parent actor ()
* Track parent actor of actor

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* Update src/ray/raylet/node_manager.cc

Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>

* fixing a comment

* Fixing typo in a comment

* capturing task_spec instead of actor_data

* adding const for some local variables

* changing an if else to else

* Linted version

* use updated method to create task from task_data

Change-Id: I9c1a65134dc23a2d175047e96b86ab9d9cf61971

* fixing linter issues

Change-Id: I1def06218130b399d2527b999258aecf9abb98dd
2019-07-11 14:52:04 -07:00
Kristian Hartikainen
3456afdea7 [autoscaler] Fix missing body argument in GCP getIamPolicy 2019-07-11 13:03:51 -07:00
Philipp Moritz
ccee77aafd fix node_failures.py () 2019-07-11 11:40:13 -07:00