Sven Mika
2aec77e305
[RLlib] Fix two test cases that only fail on Travis. ( #11435 )
2020-10-16 13:53:30 -05:00
Sven Mika
414041c6dd
[RLlib] Do not create env on driver iff num_workers > 0. ( #11307 )
2020-10-15 18:21:30 +02:00
Sven Mika
8ea1bc5ff9
[RLlib] Allow for more than 2^31 policy timesteps. ( #11301 )
2020-10-12 13:49:11 -07:00
Sven Mika
199e5d0f75
[RLlib] Exploration class type annotations. ( #11251 )
2020-10-07 21:59:14 +02:00
Sven Mika
ce96b03b07
[RLlib] MB-MPO cleanup (comments, docstrings, type annotations). ( #11033 )
2020-10-06 20:28:16 +02:00
Sven Mika
c17169dc11
[RLlib] Fix all example scripts to run on GPUs. ( #11105 )
2020-10-02 23:07:44 +02:00
Sven Mika
f91c455527
[RLlib] Curiosity documentation. ( #11066 )
2020-09-29 09:39:22 +02:00
Thomas Lecat
504da45e69
fix(rllib): allow explore=False with tuple action distributions ( #10443 )
2020-09-10 15:03:02 -07:00
Sven Mika
244aafdcf8
[RLlib] Curiosity enhancements. ( #10373 )
2020-09-05 13:14:24 +02:00
Sven Mika
715ee8dfc9
[RLlib] Issue 10469: Callbacks should receive env idx ... ( #10477 )
2020-09-03 17:27:05 +02:00
Eric Liang
deea1861ab
[rllib] Try fixing torch GPU and masking errors ( #10168 )
2020-08-25 18:34:19 -07:00
Sven Mika
2cbe29a7fa
[RLlib] Curiosity minor fixes, do-overs, and testing. ( #10143 )
2020-08-19 17:49:50 +02:00
Sven Mika
2256047876
[RLlib] Rename rllib.utils.types into typing to match built-in python module's name. ( #10114 )
2020-08-15 13:24:22 +02:00
Tanay Wakhare
1826b29757
[RLlib] Curiosity (intrinsic motivation) Exploration module. ( #9912 )
2020-08-13 20:14:16 +02:00
Barak Michener
8e76796fd0
ci: Redo format.sh --all script & backfill lint fixes ( #9956 )
2020-08-07 16:49:49 -07:00
Michael Luo
4d7bd8c892
[RLlib] Implementation of "Model-based Meta Policy Optimization" (MB MPO) ( #9409 )
2020-08-02 18:12:09 +02:00
Sven Mika
ff9c1dac88
[RLlib] Issue 9667 DDPG Torch bugs and enhancements. ( #9680 )
2020-07-28 14:15:03 +02:00
Sven Mika
78dfed2683
[RLlib] Issue 8384: QMIX doesn't learn anything. ( #9527 )
2020-07-17 12:14:34 +02:00
Sven Mika
935d8308fb
[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). ( #9497 )
2020-07-16 14:55:50 +02:00
Sven Mika
fcdf410ae1
[RLlib] Tf2.x native. ( #8752 )
2020-07-11 22:06:35 +02:00
Sven Mika
01125b8fcf
[RLlib] DQN rainbow eager-mode (keras style NoisyLayer) (preparation for native tf2.x support). ( #9304 )
2020-07-09 10:44:10 +02:00
Sven Mika
4da0e542d5
[RLlib] DDPG and SAC eager support (preparation for tf2.x) ( #9204 )
2020-07-08 16:12:20 +02:00
Sven Mika
5b2a97597b
[RLlib] Retire try_import_tree (should be installed along with other requirements). ( #9211 )
...
- Retire try_import_tree.
- Stabilize test_supported_multi_agent.py.
2020-07-02 13:06:34 +02:00
Sven Mika
43043ee4d5
[RLlib] Tf2x preparation; part 2 (upgrading try_import_tf()). ( #9136 )
...
* WIP.
* Fixes.
* LINT.
* WIP.
* WIP.
* Fixes.
* Fixes.
* Fixes.
* Fixes.
* WIP.
* Fixes.
* Test
* Fix.
* Fixes and LINT.
* Fixes and LINT.
* LINT.
2020-06-30 10:13:20 +02:00
Sven Mika
4fd8977eaf
[RLlib] Minor cleanup in preparation for tf2.x support. ( #9130 )
...
* WIP.
* Fixes.
* LINT.
* Fixes.
* Fixes and LINT.
* WIP.
2020-06-25 19:01:32 +02:00
Eric Liang
831b2fe51d
[rllib] Set framework to tf by default and remove import checks; "Auto" option ( #8748 )
...
* tf by default
* Update rllib/agents/trainer.py
Co-authored-by: Sven Mika <sven@anyscale.io>
* remove it
* fix
* remove
* fix
* lint
Co-authored-by: Sven Mika <sven@anyscale.io>
2020-06-08 23:04:50 -07:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... ( #8520 )
2020-05-27 16:19:13 +02:00
Sven Mika
6d196197bc
[RLlib] utils/spaces ... ( #8608 )
2020-05-27 10:21:30 +02:00
Sven Mika
d7eaacb5fe
[RLlib] Issue 8319 DDPG (MA or num_envs_per_worker > 1) broken. ( #8324 )
2020-05-08 08:26:32 +02:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 ( #8353 )
2020-05-08 08:20:18 +02:00
Sven Mika
6c2b9a4cfa
[RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). ( #8304 )
2020-05-04 23:53:38 +02:00
Sven Mika
a00144f746
[RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). ( #8302 )
2020-05-04 22:27:30 +02:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. ( #8288 )
2020-05-04 09:36:27 +02:00
Sven Mika
166bb5d690
[RLlib] IMPALA PyTorch ( #8287 )
...
This PR adds an IMPALA PyTorch implementation.
- adds compilation tests for LSTM and w/o LSTM.
- adds learning test for CartPole.
2020-05-03 13:44:25 +02:00
Sven Mika
76e1a4df9e
Fix TD3 torch via GaussianNoise torch bug. ( #8276 )
2020-05-02 08:12:21 +02:00
Sven Mika
1775e89f26
[RLlib] Remove TupleActions and support arbitrarily nested action spaces. ( #8143 )
...
Deprecate TupleActions and support arbitrarily nested action spaces.
Closes issue #8143 .
2020-04-28 14:59:16 +02:00
Eric Liang
2298f6fb40
[rllib] Port DQN/Ape-X to training workflow api ( #8077 )
2020-04-23 12:39:19 -07:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. ( #7953 )
...
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.
* Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20%) than tf.
* Fix GPU target model problem.
2020-04-16 10:20:01 +02:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) ( #7984 )
...
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.
* Fix `clip_action` import from Policy (should probably be moved into utils altogether).
* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).
* Add `config` to c'tor call to TFPolicy.
* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.
* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).
* Fix LINT errors in Policy classes.
* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.
* policy.py LINT errors.
* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).
* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.
* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).
* QMixPolicy add empty implementations of abstract Policy methods.
* Store Policy's config in self.config in base Policy c'tor.
* - Make only compute_actions in base Policy an abstractmethod and provide a pass implementation for all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).
* Cartpole tf learning.
* Cartpole tf AND torch learning (in ~ same ts).
* Cartpole tf AND torch learning (in ~ same ts). 2
* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3
* Cartpole tf AND torch learning (in ~ same ts). 4
* Cartpole tf AND torch learning (in ~ same ts). 5
* Cartpole tf AND torch learning (in ~ same ts). 6
* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.
* WIP.
* WIP.
* SAC torch learning Pendulum.
* WIP.
* SAC torch and tf learning Pendulum and Cartpole after cleanup.
* WIP.
* LINT.
* LINT.
* SAC: Move policy.target_model to policy.device as well.
* Fixes and cleanup.
* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).
* Fixes and LINT.
* Fixes and LINT.
* Fix and LINT.
* WIP.
* Test fixes and LINT.
* Fixes and LINT.
Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. ( #7934 )
2020-04-09 14:04:21 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does not currently support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Sven Mika
82c2d9faba
[RLlib] Fix broken RLlib tests in master. ( #7894 )
2020-04-05 09:34:23 -07:00
Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. ( #7852 )
...
* Add testing framework_iterator.
* LINT.
* WIP.
* Fix and LINT.
* LINT fix.
2020-04-03 12:24:25 -07:00
Sven Mika
5537fe13b0
[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. ( #7814 )
2020-04-03 10:44:25 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. ( #7798 )
...
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does not currently support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
e356e97eb2
[RLlib] Assert correct policy class being used in Worker. ( #7769 )
2020-03-30 14:03:29 -07:00
Sven Mika
e4bd5db4d8
[RLlib] Minimal ParamNoise PR. ( #7772 )
2020-03-28 16:16:30 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fix sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Sven Mika
80d314ae5e
[RLlib] Add all agents to rllib rollout tests. ( #7534 )
2020-03-12 11:02:51 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. ( #7533 )
2020-03-11 20:39:47 -07:00