Edward Oakes
b84fe56bed
Split test_basic to avoid timeouts in CI ( #8405 )
2020-05-12 10:18:21 -05:00
Hao Chen
a593fde606
Fix core dumps in ExitActor ( #8382 )
2020-05-12 20:06:04 +08:00
Sven Mika
57544b1ff9
[RLlib] Examples folder restructuring (Model examples; final part). ( #8278 )
...
- This PR completes any previously missing PyTorch Model counterparts to TFModels in examples/models.
- It also makes sure that all example scripts in the rllib/examples folder are tested for both frameworks and learn the given task (this is often not checked currently), using an --as-test flag in combination with --stop-reward.
2020-05-12 08:23:10 +02:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala ( #8321 )
2020-05-11 20:24:43 -07:00
Sven Mika
c7cb2f5416
[RLlib] IMPALA PyTorch GPU fixes ( #8397 )
2020-05-11 22:03:27 +02:00
Edward Oakes
fdf0e5ceb1
Update README to say that python 2 is deprecated ( #8404 )
2020-05-11 14:49:49 -05:00
Jason McGhee
24ced808cd
Fix config key in docs for using PyTorch ( #8300 )
...
The docs incorrectly suggest using "torch" when the actual flag is called "use_pytorch".
2020-05-11 12:41:21 -07:00
Stephanie Wang
f97f466cec
Fix test ( #8391 )
2020-05-11 10:15:53 -07:00
mehrdadn
66b3edccb9
Prefer built-in system compilers over Clang download ( #8355 )
...
Co-authored-by: Mehrdad <noreply@github.com>
2020-05-11 11:53:35 -05:00
fangfengbin
515afa6809
Fix AsyncGetAll miss override bug ( #8402 )
...
Co-authored-by: 灵洵 <fengbin.ffb@antfin.com>
2020-05-11 11:08:16 -05:00
fangfengbin
8d0c1b5e06
GCS adapts to actor table pub sub ( #8347 )
2020-05-11 13:53:53 +08:00
Simon Mo
501b936114
[Serve] Improve error message when result is not a list ( #8378 )
2020-05-10 17:18:06 -07:00
Stephanie Wang
3a25f5f5b4
Clean up actor state from the GCS ( #8261 )
...
* parametrize test
* Regression test and logging
* Test no restart after actor deletion
* Unit tests
* Refactor to subscribe to and lookup from worker failure table
* Refactor ActorManager to remove dependencies
* Revert "Regression test and logging"
This reverts commit 835e1a9091b51ca8efb00392d4cc4a665145de24.
* Revert "parametrize test"
This reverts commit f31272082831ba1a494816dd5511d87b24eca4c9.
* Revert "Test no restart after actor deletion"
This reverts commit 114a83de14329aa6ab787c80cd5757cf074a9072.
* doc
* merge
* Revert "Refactor to subscribe to and lookup from worker failure table"
This reverts commit 6aa13a05178d0b9aa1db9dee5c978c911b74fa3a.
* Revert "Revert "Test no restart after actor deletion""
This reverts commit 1bd92d09172aa8ab42632551cf9c56463f9598fe.
* Revert "Revert "parametrize test""
This reverts commit 639ba4d3b02167fb2b05e9878f9aa600bcec95b3.
* Revert "Revert "Regression test and logging""
This reverts commit f18b5f0db699a23cbccde32789e3639425e99ca4.
* Clean up actors that have gone out of scope
* Use actor ID instead of shared_ptr
* Clean up actors owned by dead workers
* Use actor ID instead of shared_ptr
* TODO and lint
* Fix unit tests
* Add unit tests for supervision and docs
* xx
* Fix tests
* Fix tests
* fix build
2020-05-09 18:43:49 -07:00
Thomas Lecat
4421f3a000
[tune] Close loggers after updating trial ( #8307 ) ( #8366 )
2020-05-09 13:26:59 -07:00
Edward Oakes
2677b71003
Implement named actors using the GCS service ( #8328 )
2020-05-09 08:58:10 -05:00
Hao Chen
93138e617a
Fix a bad usage of std::move ( #8364 )
2020-05-09 14:24:24 +08:00
Eric Liang
1126fe4d23
[tune] Add UUID back to trial names ( #8377 )
2020-05-08 20:20:36 -07:00
fangfengbin
7fec602f2e
GCS adapts to node resource table pub sub ( #8305 )
2020-05-09 10:31:35 +08:00
A Kharitonov
304e31b7e5
Fixed: contrib/MADDPG MADDPGTFPolicy missing self.config assignment ( #8343 )
2020-05-08 12:05:06 -07:00
Sven Mika
754290daad
[RLlib] Add light-weight Trainer.compute_action() tests for all Algos. ( #8356 )
2020-05-08 16:31:31 +02:00
Sven Mika
d946f58fd0
LINT fixes. ( #8370 )
2020-05-08 16:24:20 +02:00
gehring
7f14fb577d
[RLlib] Added TransformerXL and "stabilized for RL" variant, GTrXL ( #6470 )
2020-05-08 14:10:23 +02:00
Eric Liang
2c599dbf05
[rllib] Port QMIX, MADDPG to new execution API ( #8344 )
2020-05-07 23:41:10 -07:00
Eric Liang
9f04a65922
[rllib] Add PPO+DQN two trainer multiagent workflow example ( #8334 )
2020-05-07 23:40:29 -07:00
Sven Mika
d7eaacb5fe
[RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. ( #8324 )
2020-05-08 08:26:32 +02:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 ( #8353 )
2020-05-08 08:20:18 +02:00
Eric Liang
413db0902d
Trigger global GC when resources may be occupied by deleted actors
2020-05-07 14:57:21 -07:00
Edward Oakes
f2f118df9e
[serve] Clear serve cluster state between tests. ( #8357 )
2020-05-07 16:45:20 -05:00
Eric Liang
30db920787
[rllib] Fix centralized critic example to use right policy ( #8341 )
...
* update
* update
2020-05-07 10:47:55 -07:00
Philipp Moritz
325aec81bd
Hide aliased autoscaler commands ( #8348 )
2020-05-07 10:17:59 -07:00
Sven Mika
2b0817cbd3
[RLlib] Retry pip installs (after waiting n seconds) in install-dependencies.sh ( #8354 )
2020-05-07 17:39:35 +02:00
fangfengbin
dd3c050168
GCS adapts to batch heartbeat table pub sub ( #8346 )
2020-05-07 20:33:36 +08:00
fangfengbin
620ea94873
Fix node manager miss object info bug ( #8337 )
2020-05-07 20:16:42 +08:00
Eric Liang
bc8b606ad7
[rllib] All test suites show up as RLLIB_TESTING=1 only.
2020-05-06 23:11:13 -07:00
Simon Mo
c5a5a5de89
[Serve] Refactor Metric System: Counter + Measure Support ( #8114 )
2020-05-06 17:44:02 -07:00
Eric Liang
1f312debbe
Document all ray commands. ( #8340 )
2020-05-06 16:49:37 -07:00
SangBin Cho
e631827a9f
[Core] Show_webui segfault fix. ( #8323 )
2020-05-06 11:45:07 -05:00
Alex Wu
04813c2ef5
[Parallel Iterator] Foreach concur ( #8140 )
2020-05-06 10:00:01 -05:00
Thomas Desrosiers
ec9357b486
[autoscaler] Fix filesystem permission race conditions ( #8327 )
2020-05-05 17:22:03 -07:00
Eric Liang
b14cc16616
[rllib] Enable functional execution workflow API by default ( #8221 )
2020-05-05 12:36:42 -07:00
mehrdadn
4bdef78e2e
Various CI fixes and cleanup ( #8289 )
2020-05-05 10:47:49 -07:00
fangfengbin
97430b2d0f
GCS adapts to node table pub sub ( #8209 )
2020-05-05 18:34:41 +08:00
Eric Liang
ee0eb44a32
Rename async_queue_depth -> num_async ( #8207 )
...
* rename
* lint
2020-05-05 01:38:10 -07:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent ( #8236 )
2020-05-04 22:13:49 -07:00
Simon Mo
1480bf4295
[Serve] Improve batch size inconsistency error ( #8315 )
2020-05-04 20:32:12 -07:00
Simon Mo
ca929671b6
[Serve] Simplify Validation ( #8316 )
2020-05-04 20:31:23 -07:00
Rüdiger Busche
e93ec3134a
Use kubectl delete pod in example ( #8295 )
...
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:39:30 -05:00
Rüdiger Busche
5dd9dbf74f
Add ipython as dependency for autoscaler container ( #8297 )
...
Co-authored-by: rbusche <rbusche@inserve.de>
2020-05-04 21:22:38 -05:00
fangfengbin
14d03a0869
GCS adapts to task lease table pub sub ( #8299 )
2020-05-05 10:16:56 +08:00
ijrsvt
cc7bd6650a
[core] Enabling Remote Task Cancelation ( #8225 )
2020-05-04 15:24:22 -07:00