Commit graph

84 commits

Author SHA1 Message Date
Sven Mika
d6c7c7c675
[RLlib] Make sure, DQN torch actions are of type=long before torch.nn.functional.one_hot() op. (#11800) 2020-11-04 18:04:03 +01:00
Sven Mika
d9f1874e34
[RLlib] Minor fixes (torch GPU bugs + some cleanup). (#11609) 2020-10-27 10:00:24 +01:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. (#10825) 2020-09-20 11:27:02 +02:00
Sumanth Ratna
9da7bdcc8e
Use master for links to docs in source (#10866) 2020-09-19 00:30:45 -07:00
desktable
4ccfd07a61
[RLlib] Add docstrings for agents/dqn (#10710) 2020-09-15 12:37:07 +02:00
desktable
799318d7d7
[RLlib] Add type annotations for agents/dqn (#10626) 2020-09-09 18:55:26 +02:00
Sven Mika
28ab797cf5
[RLlib] Deprecate old classes, methods, functions, config keys (in prep for RLlib 1.0). (#10544) 2020-09-06 10:58:00 +02:00
Sven Mika
8a891b3c30
[RLlib] SAC n_step > 1. (#10567) 2020-09-05 22:26:42 +02:00
Olli Huotari
0dae50b5eb
Fixed num_atoms>1 in pytorch (#10330) 2020-08-25 23:10:20 -07:00
Eric Liang
deea1861ab
[rllib] Try fixing torch GPU and masking errors (#10168) 2020-08-25 18:34:19 -07:00
Raphael Avalos
8b704eb419
Small fix for Cuda Torch DQN. (#10177) 2020-08-19 13:28:05 -07:00
Sven Mika
2256047876
[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. (#10114) 2020-08-15 13:24:22 +02:00
Sven Mika
66d204e078
[RLlib] Model documentation enhancements. (#10011) 2020-08-13 13:36:40 +02:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes (#9956) 2020-08-07 16:49:49 -07:00
Sven Mika
5dc4b6686e
[RLlib] Implement DQN PyTorch distributional head. (#9589) 2020-07-25 09:29:24 +02:00
Eric Liang
5acd3e66dd
[rllib] Fix torch TD error, IMPALA LR updates (#9477)
* update

* add test

* lint

* fix super call

* speed es test up
2020-07-23 12:50:25 -07:00
Sven Mika
78dfed2683
[RLlib] Issue 8384: QMIX doesn't learn anything. (#9527) 2020-07-17 12:14:34 +02:00
Sven Mika
935d8308fb
[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497) 2020-07-16 14:55:50 +02:00
Sven Mika
fcdf410ae1
[RLlib] Tf2.x native. (#8752) 2020-07-11 22:06:35 +02:00
Sven Mika
14160ca58c
[RLlib] Issue #9366 (DQN w/o dueling produces invalid actions). (#9386) 2020-07-10 12:43:03 +02:00
Sven Mika
01125b8fcf
[RLlib] DQN rainbow eager-mode (keras style NoisyLayer) (preparation for native tf2.x support). (#9304) 2020-07-09 10:44:10 +02:00
Piotr Januszewski
155cc81e40
Clarify training intensity configuration docstring (#9244) (#9306) 2020-07-05 20:07:27 -07:00
Sven Mika
43043ee4d5
[RLlib] Tf2x preparation; part 2 (upgrading try_import_tf()). (#9136)
* WIP.

* Fixes.

* LINT.

* WIP.

* WIP.

* Fixes.

* Fixes.

* Fixes.

* Fixes.

* WIP.

* Fixes.

* Test

* Fix.

* Fixes and LINT.

* Fixes and LINT.

* LINT.
2020-06-30 10:13:20 +02:00
Sven Mika
4fd8977eaf
[RLlib] Minor cleanup in preparation to tf2.x support. (#9130)
* WIP.

* Fixes.

* LINT.

* Fixes.

* Fixes and LINT.

* WIP.
2020-06-25 19:01:32 +02:00
Eric Liang
1e0e1a45e6
[rllib] Add type annotations for evaluation/, env/ packages (#9003) 2020-06-19 13:09:05 -07:00
Sven Mika
7008902cff
[RLlib] Minor rllib.utils cleanup. (#8932) 2020-06-16 08:52:20 +02:00
Sven Mika
4ed796a7d6
[RLlib] Add testing Policy.compute_single_action() for all agents. (#8903) 2020-06-13 17:51:50 +02:00
Eric Liang
34bae27ac7
[rllib] Flexible multi-agent replay modes and replay_sequence_length (#8893) 2020-06-12 20:17:27 -07:00
Sven Mika
0ba7472da9
[Testing] Fix LINT/sphinx errors. (#8874) 2020-06-10 15:41:59 +02:00
Sven Mika
d8a081a185
[RLlib] Unity3D integration (n Unity3D clients vs learning server). (#8590) 2020-05-30 22:48:34 +02:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex (#8396) 2020-05-20 11:22:30 -07:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Sven Mika
754290daad
[RLlib] Add light-weight Trainer.compute_action() tests for all Algos. (#8356) 2020-05-08 16:31:31 +02:00
Eric Liang
2c599dbf05
[rllib] Port QMIX, MADDPG to new execution API (#8344) 2020-05-07 23:41:10 -07:00
Eric Liang
9f04a65922
[rllib] Add PPO+DQN two trainer multiagent workflow example (#8334) 2020-05-07 23:40:29 -07:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Eric Liang
b14cc16616
[rllib] Enable functional execution workflow API by default (#8221) 2020-05-05 12:36:42 -07:00
Eric Liang
ee0eb44a32
Rename async_queue_depth -> num_async (#8207)
* rename

* lint
2020-05-05 01:38:10 -07:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288)
APEX_DDPG (PyTorch) test case and docs.
2020-05-04 09:36:27 +02:00
Eric Liang
2298f6fb40
[rllib] Port DQN/Ape-X to training workflow api (#8077) 2020-04-23 12:39:19 -07:00
Sven Mika
d6cb7d865e
[RLlib] Torch DQN (APEX) TD-Error/prio. replay fixes. (#8082)
PyTorch APEX_DQN with Prioritized Replay enabled would not work properly due to the td_error not being retrievable by the AsyncReplayOptimizer.
2020-04-20 10:03:25 +02:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. (#7953)
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.

* Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf).
* Fix GPU target model problem.
2020-04-16 10:20:01 +02:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) (#7984)
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy
  (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.

* Fix `clip_action` import from Policy (should probably be moved into utils altogether).

* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).

* Add `config` to c'tor call to TFPolicy.

* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.

* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).

* Fix LINT errors in Policy classes.

* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.

* policy.py LINT errors.

* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).

* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.

* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).

* QMixPolicy add empty implementations of abstract Policy methods.

* Store Policy's config in self.config in base Policy c'tor.

* - Make only compute_actions in base Policy's an abstractmethod and provide pass
implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).

* Cartpole tf learning.

* Cartpole tf AND torch learning (in ~ same ts).

* Cartpole tf AND torch learning (in ~ same ts). 2

* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3

* Cartpole tf AND torch learning (in ~ same ts). 4

* Cartpole tf AND torch learning (in ~ same ts). 5

* Cartpole tf AND torch learning (in ~ same ts). 6

* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.

* WIP.

* WIP.

* SAC torch learning Pendulum.

* WIP.

* SAC torch and tf learning Pendulum and Cartpole after cleanup.

* WIP.

* LINT.

* LINT.

* SAC: Move policy.target_model to policy.device as well.

* Fixes and cleanup.

* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).

* Fixes and LINT.

* Fixes and LINT.

* Fix and LINT.

* WIP.

* Test fixes and LINT.

* Fixes and LINT.

Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
Eric Liang
31b40b00f6
[rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests (#7958) 2020-04-10 00:56:08 -07:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. (#7934) 2020-04-09 14:04:21 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. (#7597)
* Fix.

* Rollback.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix.

* Fix.

* Fix.

* WIP.

* WIP.

* Fix.

* Test case fixes.

* Test case fixes and LINT.

* Test case fixes and LINT.

* Rollback.

* WIP.

* WIP.

* Test case fixes.

* Fix.

* Fix.

* Fix.

* Add regression test for DQN w/ param noise.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Comment

* Regression test case.

* WIP.

* WIP.

* LINT.

* LINT.

* WIP.

* Fix.

* Fix.

* Fix.

* LINT.

* Fix (SAC does currently not support eager).

* Fix.

* WIP.

* LINT.

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* WIP.

* Fix.

* LINT.

* LINT.

* Fix and LINT.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Fix.

* Fix and LINT.

* Update rllib/utils/exploration/exploration.py

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Fixes.

* WIP.

* LINT.

* Fixes and LINT.

* LINT and fixes.

* LINT.

* Move action_dist back into torch extra_action_out_fn and LINT.

* Working SimpleQ learning cartpole on both torch AND tf.

* Working Rainbow learning cartpole on tf.

* Working Rainbow learning cartpole on tf.

* WIP.

* LINT.

* LINT.

* Update docs and add torch to APEX test.

* LINT.

* Fix.

* LINT.

* Fix.

* Fix.

* Fix and docstrings.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Tune param-noise tests.

* LINT.

* Fix.

* Fix.

* test

* test

* test

* Fix.

* Fix.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT.

* WIP.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Sven Mika
82c2d9faba
[RLlib] Fix broken RLlib tests in master. (#7894) 2020-04-05 09:34:23 -07:00
Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. (#7852)
* Add testing framework_iterator.

* LINT.

* WIP.

* Fix and LINT.

* LINT fix.
2020-04-03 12:24:25 -07:00