Zhijun Fu
eb307f93f8
Support direct actor call ( #5183 )
2019-07-30 17:47:17 +08:00
Simon Mo
196495a4de
Fix Redis Test ( #5302 )
2019-07-30 00:22:16 -07:00
micafan
b3bcf59148
Rename ClientTableData to GcsNodeInfo ( #5251 )
2019-07-30 11:22:47 +08:00
Simon Mo
3ba8680963
Bump version to 0.8.0.dev3 ( #5308 )
2019-07-29 18:28:38 -07:00
Michael Luo
1337c98f02
[rllib] Importance Sampling and KL Loss for APPO ( #5051 )
2019-07-29 15:02:32 -07:00
Simon Mo
3b00144e7d
Bump version to 0.7.3 ( #5301 )
2019-07-29 10:25:32 -07:00
Eric Liang
3bdd114282
[rllib] Better example rnn envs ( #5300 )
2019-07-28 14:07:18 -07:00
Qing Wang
1465a30ea9
Fix releasing CPUs incorrectly when actor creation task blocked. ( #5271 )
...
* Fix
* Remove useless log
* Address
* Fix typo
* sleep
2019-07-28 15:46:17 +08:00
Richard Liaw
5ea859dc73
[sgd] hotfix example failure ( #5297 )
...
* hotfix
* Update train_example.py
2019-07-27 18:13:22 -07:00
Eric Liang
6f2c5b2819
Revert "[autoscaler] Clean up error messages on setup failure ( #5210 )" ( #5299 )
...
This reverts commit 7fc15dbf7f.
2019-07-27 16:53:47 -07:00
lanlin
341dbf6c45
[tune] support nested dictionaries for CSVLogger ( #5295 )
2019-07-27 14:44:34 -07:00
Richard Liaw
b4823d63c6
[autoscaler] Local YAML readability ( #5290 )
2019-07-27 12:51:50 -07:00
LorenzoCevolani
10cbcced7e
Correctly setting the input to Train ( #3853 )
...
In the ResNetTrainActor class, the data are now built correctly using the Train flag for the cifar_input script.
2019-07-27 11:08:35 -07:00
Eric Liang
a62c5f40f6
[rllib] Document ModelV2 and clean up the models/ directory ( #5277 )
2019-07-27 02:08:16 -07:00
Richard Liaw
9c00616cdc
Retry and exception for hang on memory store full ( #5143 )
2019-07-27 01:20:13 -07:00
Richard Liaw
5e15b36d6e
[tune] experiment_analysis split to Analysis ( #5115 )
2019-07-27 01:10:52 -07:00
Richard Liaw
7e715520e5
[sgd] Example for Training ( #5292 )
2019-07-27 01:10:25 -07:00
Daniel Edgecumbe
06fec63c87
[autoscaler] Add a 'request_cores' function for manual autoscaling ( #4754 )
2019-07-26 17:14:45 -07:00
lanlin
d9e81da3b8
[tune] configurable maximum length of trial identifier ( #5287 )
2019-07-26 17:09:54 -07:00
Hao Chen
6f737e6a50
Add CODEOWNERS file ( #5259 )
2019-07-26 12:40:07 +08:00
Antoine Galataud
827618254a
[rllib] Configure learner queue timeout ( #5270 )
...
* configure learner queue timeout
* lint
* use config
* fix method args order, add unit test
* fix wrong param name
2019-07-25 21:18:05 -07:00
micafan
6f682db99d
avoid copying ActorTableData when NodeManager updates an actor to GCS ( #5244 )
2019-07-26 11:17:24 +08:00
Stephanie Wang
3321555975
Increase timeout for ray.wait test ( #5273 )
...
* Increase test timeout for ray.wait
* make sure the actor is scheduled
2019-07-25 14:23:46 -07:00
Eric Liang
bf9199ad77
[rllib] ModelV2 support for pytorch ( #5249 )
2019-07-25 11:02:53 -07:00
Joey Jiang
40395acadf
[gRPC] Migrate raylet client implementation to grpc ( #5120 )
2019-07-25 14:48:56 +08:00
Eric Liang
60f59639c1
[rllib] Port DDPG to the build_tf_policy pattern ( #5242 )
2019-07-24 13:55:55 -07:00
Eric Liang
690b374581
[rllib] Add Keras LSTM example with ModelV2 ( #5258 )
2019-07-24 13:09:41 -07:00
Eric Liang
5b76238bce
Fix two types of eviction hangs ( #5225 )
2019-07-23 21:20:17 -07:00
Eric Liang
97c43284a6
[rllib] Fix trainer state restore ( #5257 )
2019-07-23 21:18:58 -07:00
Stephanie Wang
9c651f47bb
Add regression test for actor load balancing ( #5224 )
...
* Add regression test for actor load balancing
* Increase timeout
* Reduce number of nodes?
2019-07-23 15:11:55 -07:00
Stephanie Wang
15959b0f0d
Leave ray.wait calls open until the task or actor exits ( #5234 )
...
* Regression test
* Split TaskDependencyManager::SubscribeDependencies into ray.get and ray.wait dependencies
- Some initial implementation
* unit test
* Improve unit tests for TaskDependencyManager
* Implement SubscribeWaitDependencies and UnsubscribeWaitDependencies, unit tests passing
* Add ray.wait python test for drivers that exit early
* Add WorkerID to Worker
* Update test to use two nodes
* Regression test for ray.wait passes
* Extend regression test to include ray.wait from an actor
* Fix ClientID and WorkerIDs
* lint
* lint
* Remove unnecessary ray_get argument
* fix build
2019-07-23 11:55:28 -07:00
Qing Wang
a3d4f9f16d
Fix the issue when passing multiple options in one string ( #5241 )
...
* Fix
* Fix linting
* Fix linting
* Address
* Fix test
2019-07-23 12:28:54 +08:00
Peter Schafhalter
fc589050c9
[sgd] Deprecate old distributed SGD implementation ( #5160 )
...
* Deprecate old distributed SGD implementation
* Update README
2019-07-22 15:47:10 -07:00
Vince Jankovics
80b976efcb
Ray namespace added for k8s ( #4111 )
...
* Ray namespace added for k8s
* Submit.yaml update with k8s namespace
* K8s deployment doc update with namespace
2019-07-22 15:45:05 -07:00
Richard Liaw
7fc15dbf7f
[autoscaler] Clean up error messages on setup failure ( #5210 )
2019-07-22 11:27:51 -07:00
Richard Liaw
53fb876a5f
Improved KeyboardInterrupt Exception Handling ( #5237 )
2019-07-22 02:29:56 -07:00
Eric Liang
f9043cc49a
[rllib] Remove experimental eager support
2019-07-21 12:27:17 -07:00
Richard Liaw
b0c0de49a2
[tune] Fixup exception messages ( #5238 )
2019-07-20 22:36:27 -07:00
Eric Liang
d58b986858
[rllib] MultiCategorical shouldn't return array for kl or entropy ( #5215 )
...
* wip
* fix
2019-07-19 12:12:04 -07:00
Jones Wong
da7676c925
Removed the implicit sync barrier at the end of each training iteration ( #5217 )
...
* removed sync barrier at the end of each training iteration
* formatted
* modify the comment according to current semantics
* lint check
* Update trainer.py
2019-07-18 22:59:52 -07:00
Eric Liang
28e5c5555d
[rllib] Move some inline defs to avoid deserialization errors ( #5228 )
...
* fix bug
* move metrics too
2019-07-18 21:01:16 -07:00
Zhijun Fu
aa42328874
[direct call] add local plasma provider ( #5184 )
2019-07-19 11:29:12 +08:00
micafan
b5b8c1d361
[GCS] introduce new gcs client and refactor actor table ( #5058 )
2019-07-19 11:28:34 +08:00
Jones Wong
0af07bd493
Enable seeding actors for reproducible experiments ( #5197 )
...
* enable graph-level worker-specific seed
* lint checked
* revised according to eric's suggestions
* revised accordingly and added a test case
* formatted
* Update test_reproducibility.py
* Update trainer.py
* Update rollout_worker.py
* Update run_rllib_tests.sh
* Update worker_set.py
2019-07-17 23:31:34 -07:00
Qingqing Mao
63f49f95dd
Improve memory check ( #5216 )
...
* Improve MemoryMonitor
- Add an env var to control the threshold.
- Use cgroup memory limit and usage for container environment.
* linting
* white space
* add comment
2019-07-17 23:30:02 -07:00
Jones Wong
81d297f87e
Remove redundant scaler of l2 reg ( #5172 )
...
* remove redundant scaler of l2 reg
* lint formatted
* Update ddpg_policy.py
2019-07-17 15:11:27 -07:00
Jones Wong
ae03c42dd6
Fixed inconsistent action placeholder ( #5213 )
2019-07-17 10:55:14 -07:00
Sam Toyer
214f09d969
[rllib] Make RLLib handle zero-length observation arrays ( #5208 )
...
* [rllib] Make _summarize handle zero-len arrays
Fixes #5207
* [rllib] Make aligned_array() handle empty arrays
* [rllib] Conform with old yapf
2019-07-16 22:37:57 -07:00
Richard Liaw
3e0ad11ae0
Add heartbeat test + Fix monitor.py ( #5191 )
2019-07-16 21:59:48 -07:00
Eric Liang
4fa2a6006c
[rllib] Remove nested import ( #5204 )
...
* remove nested import
* Update metrics.py
2019-07-16 10:52:56 -07:00