306 commits

Author SHA1 Message Date
Eric Liang
831b2fe51d
[rllib] Set framework to tf by default and remove import checks; "Auto" option (#8748)
* tf by default

* Update rllib/agents/trainer.py

Co-authored-by: Sven Mika <sven@anyscale.io>

* remove it

* fix

* remove

* fix

* lint

Co-authored-by: Sven Mika <sven@anyscale.io>
2020-06-08 23:04:50 -07:00
Sven Mika
25c0974543
[RLlib] Issue 8412 (Adam vars not stored in ModelV2). (#8480) 2020-06-05 21:07:02 +02:00
Sven Mika
c74dc58f8b
[RLlib] Fix use_lstm flag for ModelV2 (w/o ModelV1 wrapping) and add it for PyTorch. (#8734) 2020-06-05 15:40:30 +02:00
Sven Mika
97d524c075
[RLlib] Issue 8769 broken OOM tests_dir cases (R & S). (#8770) 2020-06-05 08:34:21 +02:00
Eric Liang
1e4a1360fd
[rllib] Add type annotations to Trainer class (#8642)
* type trainer

* type it

* fix
2020-06-03 12:47:35 -07:00
Sven Mika
b37a162076
[RLlib] Make envs specifiable in configs by their class path. (#8750) 2020-06-03 08:14:29 +02:00
Sven Mika
d8a081a185
[RLlib] Unity3D integration (n Unity3D clients vs learning server). (#8590) 2020-05-30 22:48:34 +02:00
Sven Mika
d483ed28ba
[RLlib] Fix broken tune tests in master due to framework=auto errors. (#8672) 2020-05-29 11:55:47 +02:00
Tomasz Wrona
f266318a01
[rllib] Do not store torch tensors when using grad clipping (#8509) 2020-05-28 12:06:27 -07:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Sven Mika
c7a2e3f309
[RLlib] Removed config["sample_async"] restriction for A3C-torch. (#8617) 2020-05-27 10:22:49 +02:00
Sven Mika
6d196197bc
[RLlib] utils/spaces ... (#8608) 2020-05-27 10:21:30 +02:00
Sven Mika
baa053496a
[RLlib] Benchmark and regression test yaml cleanup and restructuring. (#8414) 2020-05-26 11:10:27 +02:00
Jan Blumenkamp
d6f78f58dc
Fix missing learning rate and entropy coeff schedule for torch PPO (#8572) 2020-05-23 10:54:18 -07:00
Sven Mika
8870270164
[RLlib] Add QMIX support for complex obs spaces (Issue 8523). (#8533) 2020-05-22 10:17:51 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Sven Mika
d76578700d
[RLlib] Policy.compute_single_action() broken for nested actions (Issue 8411). (#8514) 2020-05-20 22:29:08 +02:00
mehrdadn
ebf060d484
Make more tests run on Windows (#8446)
* Remove worker Wait() call due to SIGCHLD being ignored

* Port _pid_alive to Windows

* Show PID as well as TID in glog

* Update TensorFlow version for Python 3.8 on Windows

* Handle missing Pillow on Windows

* Work around dm-tree PermissionError on Windows

* Fix some lint errors on Windows with Python 3.8

* Simplify torch requirements

* Quiet git clean

* Handle finalizer issues

* Exit with the signal number

* Get rid of wget

* Fix some Windows compatibility issues with tests

Co-authored-by: Mehrdad <noreply@github.com>
2020-05-20 12:25:04 -07:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex (#8396) 2020-05-20 11:22:30 -07:00
Sven Mika
796a834c48
[RLlib] Attention Net integration into ModelV2 and learning RL example. (#8371) 2020-05-18 17:26:40 +02:00
Eric Liang
96f4d82cc3
[rllib] Qmix replay ratio is wrong 2020-05-12 13:07:19 -07:00
Eric Liang
7ce138a6dc
[rllib] Support free_log_std in ModelV2 (#8380)
* update

* factor

* update

* fix test failures

* fix torch net
2020-05-12 10:14:05 -07:00
Sven Mika
57544b1ff9
[RLlib] Examples folder restructuring (Model examples; final part). (#8278)
- This PR completes any previously missing PyTorch Model counterparts to TFModels in examples/models.
- It also makes sure that all example scripts in the rllib/examples folder are tested for both frameworks and learn the given task (which is currently often not checked), using an --as-test flag in combination with --stop-reward.
2020-05-12 08:23:10 +02:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Sven Mika
c7cb2f5416
[RLlib] IMPALA PyTorch GPU fixes (#8397) 2020-05-11 22:03:27 +02:00
Sven Mika
754290daad
[RLlib] Add light-weight Trainer.compute_action() tests for all Algos. (#8356) 2020-05-08 16:31:31 +02:00
Eric Liang
2c599dbf05
[rllib] Port QMIX, MADDPG to new execution API (#8344) 2020-05-07 23:41:10 -07:00
Eric Liang
9f04a65922
[rllib] Add PPO+DQN two trainer multiagent workflow example (#8334) 2020-05-07 23:40:29 -07:00
Sven Mika
d7eaacb5fe
[RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. (#8324) 2020-05-08 08:26:32 +02:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Eric Liang
b14cc16616
[rllib] Enable functional execution workflow API by default (#8221) 2020-05-05 12:36:42 -07:00
Eric Liang
ee0eb44a32
Rename async_queue_depth -> num_async (#8207)
* rename

* lint
2020-05-05 01:38:10 -07:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent (#8236) 2020-05-04 22:13:49 -07:00
Sven Mika
6c2b9a4cfa
[RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)
2020-05-04 23:53:38 +02:00
Sven Mika
a00144f746
[RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302) 2020-05-04 22:27:30 +02:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288)
2020-05-04 09:36:27 +02:00
Sven Mika
166bb5d690
[RLlib] IMPALA PyTorch (#8287)
This PR adds an IMPALA PyTorch implementation.

- adds compilation tests for LSTM and w/o LSTM.
- adds learning test for CartPole.
2020-05-03 13:44:25 +02:00
Sven Mika
76e1a4df9e
Fix TD3 torch via GaussianNoise torch bug. (#8276) 2020-05-02 08:12:21 +02:00
Sven Mika
42991d723f
[RLlib] rllib/examples folder restructuring (#8250)
Cleans up the rllib/examples folder by moving all example Envs into rllib/examples/env (so they can be used by other scripts and tests as well).
2020-05-01 22:59:34 +02:00
Sven Mika
eea75ac623
[RLlib] Beta distribution. (#8229) 2020-04-30 11:09:33 -07:00
Eric Liang
baadbdf8d4
[rllib] Execute PPO using training workflow (#8206)
* wip

* add kl

* kl

* works now

* doc update

* reorg

* add ddppo

* add stats

* fix fetch

* comment

* fix learner stat regression

* test fixes

* fix test
2020-04-30 01:18:09 -07:00
Sven Mika
bf25aee392
[RLlib] Deprecate all Model(v1) usage. (#8146)
2020-04-29 12:12:59 +02:00
Sven Mika
eb91619175
Fix release 0.8.5 tests for PPO torch Breakout. (#8226) 2020-04-29 10:36:41 +02:00
Sven Mika
1775e89f26
[RLlib] Remove TupleActions and support arbitrarily nested action spaces. (#8143)
Deprecate TupleActions and support arbitrarily nested action spaces.
Closes issue #8143.
2020-04-28 14:59:16 +02:00
Sven Mika
7ec2223c84
[RLlib] DDPG PyTorch actor-model was missing sigmoid layer (#8188)
Fix DDPG PyTorch: the actor model was missing a sigmoid layer (to squash action outputs) after the deterministic action outputs.
2020-04-26 23:08:13 +02:00
Eric Liang
2298f6fb40
[rllib] Port DQN/Ape-X to training workflow api (#8077) 2020-04-23 12:39:19 -07:00
Sven Mika
499ad5fbe4
[RLlib] PyTorch version of APPO. (#8120)
- Translate all vtrace functionality to torch and add torch to the framework_iterator loop in all existing vtrace test cases.
- Add learning test cases for APPO torch (both w/ and w/o v-trace).
- Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).
2020-04-23 09:11:12 +02:00
Sven Mika
d15609ba2a
[RLlib] PyTorch version of ARS (Augmented Random Search). (#8106)
This PR implements a PyTorch version of RLlib's ARS algorithm using RLlib's functional algo builder API. It also adds a regression test for ARS (torch) on CartPole.
2020-04-21 09:47:52 +02:00
Sven Mika
3812bfedda
[RLlib] PyTorch version of ES (Evolution Strategies). (#8104)
PyTorch version of Evolution Strategies (ES) Algo.
2020-04-20 21:47:28 +02:00
Sven Mika
d6cb7d865e
[RLlib] Torch DQN (APEX) TD-Error/prio. replay fixes. (#8082)
PyTorch APEX_DQN with Prioritized Replay enabled would not work properly due to the td_error not being retrievable by the AsyncReplayOptimizer.
2020-04-20 10:03:25 +02:00