Sven Mika
eb0038612f
[RLlib] Extend on_learn_on_batch callback to allow for custom metrics to be added. ( #13584 )
2021-02-08 15:02:19 +01:00
Sven Mika
52c94b7ee9
[RLlib] Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. ( #13522 )
2021-02-02 13:05:58 +01:00
Sven Mika
2e3655e8a9
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. ( #13238 )
2021-01-19 14:22:36 +01:00
Sven Mika
56878221ed
[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). ( #13363 )
2021-01-14 14:44:33 +01:00
Kai Fricke
25f10a947a
Revert "[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. ( #13339 )" ( #13361 )
...
This reverts commit e2b2abb88b.
2021-01-12 12:33:57 +01:00
Sven Mika
e2b2abb88b
[RLlib] Make TFModelV2 behave more like TorchModelV2: Obsolete register_variables. Unify variable dicts. ( #13339 )
2021-01-11 22:42:30 +01:00
Sven Mika
8726521604
[RLlib] JAXPolicy prep PR #2 (move get_activation_fn (backward-compatibly), minor fixes and preparations). ( #13091 )
2020-12-30 22:30:52 -05:00
Michael Luo
42cd414e5b
[RLlib] New Offline RL Algorithm: CQL (based on SAC) ( #13118 )
2020-12-30 10:11:57 -05:00
Sven Mika
99ae7bae05
[RLlib] JAXPolicy prep. PR #1 . ( #13077 )
2020-12-26 20:14:18 -05:00
Michael Luo
4bcd475671
[RLlib] Improved Documentation for PPO, DDPG, and SAC ( #12943 )
2020-12-24 09:31:35 -05:00
Sven Mika
19c8033df2
[RLlib] Fix most remaining RLlib algos for running with trajectory view API. ( #12366 )
...
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT and fixes.
MB-MPO and MAML not working yet.
* wip
* update
* update
* remove
* remove dep
* higher
* Update requirements_rllib.txt
* Update requirements_rllib.txt
* relpos
* no mbmpo
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Sven Mika
0df55a139c
[RLlib] Attention Net prep PR #1 : Smaller cleanups. ( #12447 )
...
* WIP.
* Fix.
* Fix.
* Fix.
2020-11-27 16:25:47 -08:00
Sven Mika
b7dbbfbf41
[RLlib] Issue 11591: SAC loss does not use PR-weights in critic loss term. ( #12394 )
...
* WIP.
* Fix and LINT.
2020-11-25 11:28:46 -08:00
Sven Mika
b6b54f1c81
[RLlib] Trajectory view API: enable by default for SAC, DDPG, DQN, SimpleQ ( #11827 )
2020-11-16 10:54:35 -08:00
Sven Mika
62c7ab5182
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). ( #11747 )
2020-11-12 16:27:34 +01:00
Sven Mika
291c172d83
[RLlib] Support Simplex action spaces for SAC (torch and tf). ( #11909 )
2020-11-11 18:45:28 +01:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master ( #11897 )
...
* use master link
* remae
* revert non-ray
* more
* mre
2020-11-10 10:53:28 -08:00
Sven Mika
d9f1874e34
[RLlib] Minor fixes (torch GPU bugs + some cleanup). ( #11609 )
2020-10-27 10:00:24 +01:00
Sven Mika
f5e2cda68a
[RLlib] SAC: log_alpha not being learnt when on GPU. ( #11298 )
2020-10-12 13:48:44 -07:00
Julius Frost
7dcfd258cd
[RLlib] Assert LongTensor in SAC Discrete PyTorch ( #11245 )
2020-10-12 13:47:21 -07:00
Sven Mika
ce96b03b07
[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). ( #11033 )
2020-10-06 20:28:16 +02:00
Sven Mika
c17169dc11
[RLlib] Fix all example scripts to run on GPUs. ( #11105 )
2020-10-02 23:07:44 +02:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. ( #10825 )
2020-09-20 11:27:02 +02:00
maxco2
b8436f0f00
[rllib] Fix SAC and DDPG tensorflow policy can't do grad_clip ( #10499 )
...
* Fix sac_tf_policy clip_by_norm missing argument
* Fix ddpg_tf_policy clip_by_norm missing argument
* Fix format
2020-09-11 12:04:44 -07:00
Sven Mika
28ab797cf5
[RLlib] Deprecate old classes, methods, functions, config keys (in prep for RLlib 1.0). ( #10544 )
2020-09-06 10:58:00 +02:00
Sven Mika
8a891b3c30
[RLlib] SAC n_step > 1. ( #10567 )
2020-09-05 22:26:42 +02:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes ( #9956 )
2020-08-07 16:49:49 -07:00
Sven Mika
fcdf410ae1
[RLlib] Tf2.x native. ( #8752 )
2020-07-11 22:06:35 +02:00
Sven Mika
4da0e542d5
[RLlib] DDPG and SAC eager support (preparation for tf2.x) ( #9204 )
2020-07-08 16:12:20 +02:00
Piotr Januszewski
155cc81e40
Clarify training intensity configuration docstring ( #9244 ) ( #9306 )
2020-07-05 20:07:27 -07:00
Sven Mika
43043ee4d5
[RLlib] Tf2x preparation; part 2 (upgrading try_import_tf()). ( #9136 )
...
* WIP.
* Fixes.
* LINT.
* WIP.
* WIP.
* Fixes.
* Fixes.
* Fixes.
* Fixes.
* WIP.
* Fixes.
* Test
* Fix.
* Fixes and LINT.
* Fixes and LINT.
* LINT.
2020-06-30 10:13:20 +02:00
Sven Mika
5c6d5d4ab1
This PR fixes the currently broken lstm_use_prev_action_reward flag for default lstm models (model.use_lstm=True). ( #8970 )
2020-06-27 20:50:01 +02:00
Sven Mika
4fd8977eaf
[RLlib] Minor cleanup in preparation to tf2.x support. ( #9130 )
...
* WIP.
* Fixes.
* LINT.
* Fixes.
* Fixes and LINT.
* WIP.
2020-06-25 19:01:32 +02:00
Sven Mika
7008902cff
[RLlib] Minor rllib.utils cleanup. ( #8932 )
2020-06-16 08:52:20 +02:00
Sven Mika
4ed796a7d6
[RLlib] Add testing Policy.compute_single_action() for all agents. ( #8903 )
2020-06-13 17:51:50 +02:00
Sven Mika
0ba7472da9
[Testing] Fix LINT/sphinx errors. ( #8874 )
2020-06-10 15:41:59 +02:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... ( #8520 )
2020-05-27 16:19:13 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers ( #8345 )
2020-05-21 10:16:18 -07:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex ( #8396 )
2020-05-20 11:22:30 -07:00
Sven Mika
796a834c48
[RLlib] Attention Net integration into ModelV2 and learning RL example. ( #8371 )
2020-05-18 17:26:40 +02:00
Sven Mika
754290daad
[RLlib] Add light-weight Trainer.compute_action() tests for all Algos. ( #8356 )
2020-05-08 16:31:31 +02:00
Eric Liang
b14cc16616
[rllib] Enable functional execution workflow API by default ( #8221 )
2020-05-05 12:36:42 -07:00
Sven Mika
eea75ac623
[RLlib] Beta distribution. ( #8229 )
2020-04-30 11:09:33 -07:00
Sven Mika
7ec2223c84
[RLlib] DDPG PyTorch actor-model was missing sigmoid layer ( #8188 )
...
Fix DDPG PyTorch (missing sigmoid layer (to squash action outputs) after deterministic action outputs).
2020-04-26 23:08:13 +02:00
Sven Mika
f7e4dae852
[RLlib] DQN and SAC Atari benchmark fixes. ( #7962 )
...
* Add Atari SAC-discrete (learning MsPacman in 40k ts up to 780 rewards).
* SAC loss function test case fix.
2020-04-17 08:49:15 +02:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. ( #7953 )
...
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.
* Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20% than tf).
* Fix GPU target model problem.
2020-04-16 10:20:01 +02:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) ( #7984 )
...
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.
* Fix `clip_action` import from Policy (should probably be moved into utils altogether).
* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).
* Add `config` to c'tor call to TFPolicy.
* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.
* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).
* Fix LINT errors in Policy classes.
* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.
* policy.py LINT errors.
* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).
* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.
* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).
* QMixPolicy add empty implementations of abstract Policy methods.
* Store Policy's config in self.config in base Policy c'tor.
* - Make only compute_actions in base Policy's an abstractmethod and provide pass implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).
* Cartpole tf learning.
* Cartpole tf AND torch learning (in ~ same ts).
* Cartpole tf AND torch learning (in ~ same ts). 2
* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3
* Cartpole tf AND torch learning (in ~ same ts). 4
* Cartpole tf AND torch learning (in ~ same ts). 5
* Cartpole tf AND torch learning (in ~ same ts). 6
* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.
* WIP.
* WIP.
* SAC torch learning Pendulum.
* WIP.
* SAC torch and tf learning Pendulum and Cartpole after cleanup.
* WIP.
* LINT.
* LINT.
* SAC: Move policy.target_model to policy.device as well.
* Fixes and cleanup.
* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).
* Fixes and LINT.
* Fixes and LINT.
* Fix and LINT.
* WIP.
* Test fixes and LINT.
* Fixes and LINT.
Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. ( #7934 )
2020-04-09 14:04:21 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. ( #7852 )
...
* Add testing framework_iterator.
* LINT.
* WIP.
* Fix and LINT.
* LINT fix.
2020-04-03 12:24:25 -07:00