Commit graph

81 commits

Author SHA1 Message Date
Sven Mika
14405b90d5
[RLlib] Prototype of a DynaTrainer (for env dynamics learning in upcoming MBMPO algo). (#8860) 2020-06-16 09:01:20 +02:00
Sven Mika
7008902cff
[RLlib] Minor rllib.utils cleanup. (#8932) 2020-06-16 08:52:20 +02:00
Sven Mika
4ed796a7d6
[RLlib] Add testing Policy.compute_single_action() for all agents. (#8903) 2020-06-13 17:51:50 +02:00
Eric Liang
34bae27ac7
[rllib] Flexible multi-agent replay modes and replay_sequence_length (#8893) 2020-06-12 20:17:27 -07:00
Sven Mika
25c0974543
[RLlib] Issue 8412 (Adam vars not stored in ModelV2). (#8480) 2020-06-05 21:07:02 +02:00
Sven Mika
c74dc58f8b
[RLlib] Fix use_lstm flag for ModelV2 (w/o ModelV1 wrapping) and add it for PyTorch. (#8734) 2020-06-05 15:40:30 +02:00
Sven Mika
368088be85
[RLlib] Sample batch docs and cleanup. (#8778) 2020-06-04 22:47:32 +02:00
Victor Le
aee01133cd
Fix dict/tuple hybrid action space for tensorflow eager execution (#8781) 2020-06-04 13:28:46 -07:00
Sven Mika
d8a081a185
[RLlib] Unity3D integration (n Unity3D clients vs learning server). (#8590) 2020-05-30 22:48:34 +02:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Sven Mika
6d196197bc
[RLlib] utils/spaces ... (#8608) 2020-05-27 10:21:30 +02:00
Sven Mika
0422e9c5a8
[RLlib] Add 2 Transformer learning test cases on StatelessCartPole (PPO and IMPALA). (#8624) 2020-05-27 10:19:47 +02:00
Sven Mika
d76578700d
[RLlib] Policy.compute_single_action() broken for nested actions (Issue 8411). (#8514) 2020-05-20 22:29:08 +02:00
Sven Mika
796a834c48
[RLlib] Attention Net integration into ModelV2 and learning RL example. (#8371) 2020-05-18 17:26:40 +02:00
Sven Mika
57544b1ff9
[RLlib] Examples folder restructuring (Model examples; final part). (#8278)
- This PR completes any previously missing PyTorch Model counterparts to TFModels in examples/models.
- It also makes sure, all example scripts in the rllib/examples folder are tested for both frameworks and learn the given task (this is often currently not checked) using a --as-test flag in connection with a --stop-reward.
2020-05-12 08:23:10 +02:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala (#8321) 2020-05-11 20:24:43 -07:00
Sven Mika
5f278c6411
[RLlib] Examples folder restructuring (models) part 1 (#8353) 2020-05-08 08:20:18 +02:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent (#8236) 2020-05-04 22:13:49 -07:00
Sven Mika
6c2b9a4cfa
[RLlib] Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)
Remove tf.py_function from all Schedule classes (not differentiable and causes other bugs in MA setups). (#8304)
2020-05-04 23:53:38 +02:00
Sven Mika
a00144f746
[RLlib] Fix issue 8135 (DDPG inf actions when using [-inf,inf] action space). (#8302) 2020-05-04 22:27:30 +02:00
Sven Mika
1775e89f26
[RLlib] Remove TupleActions and support arbitrarily nested action spaces. (#8143)
Deprecate TupleActions and support arbitrarily nested action spaces.
Closes issue #8143.
2020-04-28 14:59:16 +02:00
Sven Mika
4e713152e9
[RLlib] Fix for issue https://github.com/ray-project/ray/issues/8191 (#8200)
Fix attribute error when missing exploration in Policy.
Issue #8191
2020-04-27 23:19:26 +02:00
Tomasz Wrona
b508166419
Copy initial state of an RNN to a CPU before converting it to a NumPy array (#8097) 2020-04-25 18:49:09 -07:00
Sven Mika
499ad5fbe4
[RLlib] PyTorch version of APPO. (#8120)
- Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases.
- Add learning test cases for APPO torch (both w/ and w/o v-trace).
- Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).
2020-04-23 09:11:12 +02:00
Sven Mika
e9ee5c4e5f
[RLlib] Nested action space PR (minimally invasive; torch only + test). (#8101)
- Add TorchMultiActionDistribution class.
- Add framework-agnostic test cases for TorchMultiActionDistribution.
2020-04-23 09:09:22 +02:00
Sven Mika
3812bfedda
[RLlib] PyTorch version of ES (Evolution Strategies). (#8104)
PyTorch version of Evolution Strategies (ES) Algo.
2020-04-20 21:47:28 +02:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) (#7984)
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy
  (from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.

* Fix `clip_action` import from Policy (should probably be moved into utils altogether).

* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).

* Add `config` to c'tor call to TFPolicy.

* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.

* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).

* Fix LINT errors in Policy classes.

* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.

* policy.py LINT errors.

* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).

* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.

* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).

* QMixPolicy add empty implementations of abstract Policy methods.

* Store Policy's config in self.config in base Policy c'tor.

* - Make only compute_actions in base Policy's an abstractmethod and provide pass
implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).

* Cartpole tf learning.

* Cartpole tf AND torch learning (in ~ same ts).

* Cartpole tf AND torch learning (in ~ same ts). 2

* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3

* Cartpole tf AND torch learning (in ~ same ts). 4

* Cartpole tf AND torch learning (in ~ same ts). 5

* Cartpole tf AND torch learning (in ~ same ts). 6

* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.

* WIP.

* WIP.

* SAC torch learning Pendulum.

* WIP.

* SAC torch and tf learning Pendulum and Cartpole after cleanup.

* WIP.

* LINT.

* LINT.

* SAC: Move policy.target_model to policy.device as well.

* Fixes and cleanup.

* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).

* Fixes and LINT.

* Fixes and LINT.

* Fix and LINT.

* WIP.

* Test fixes and LINT.

* Fixes and LINT.

Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
Sven Mika
0a5b6d1f57
[Testing] Do not run any non-RLlib/core tests if only RLLib affected (except wheels). (#7892)
* Do not run any non-RLlib/core tests if only RLLib affected, except for generating the 2 wheels (OSX and Linux).

* Test noop RLlib change.

* Test noop RLlib change.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Test.

* WIP.

* Add env flag RAY_CI_ONLY_RLLIB_AFFECTED to refrain from testing most ray-core stuff (except wheels) if only RLlib changed.

* Test RLlib-only change.
2020-04-09 14:36:06 -07:00
Sven Mika
c2cb5c2214
[RLlib] MARWIL torch. (#7836)
* WIP.

* WIP.

* LINT.

* Fix MARWIL so it can run with eager-mode.

* LINT.
2020-04-06 16:38:50 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. (#7597)
* Fix.

* Rollback.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix.

* Fix.

* Fix.

* WIP.

* WIP.

* Fix.

* Test case fixes.

* Test case fixes and LINT.

* Test case fixes and LINT.

* Rollback.

* WIP.

* WIP.

* Test case fixes.

* Fix.

* Fix.

* Fix.

* Add regression test for DQN w/ param noise.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Comment

* Regression test case.

* WIP.

* WIP.

* LINT.

* LINT.

* WIP.

* Fix.

* Fix.

* Fix.

* LINT.

* Fix (SAC does currently not support eager).

* Fix.

* WIP.

* LINT.

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* WIP.

* Fix.

* LINT.

* LINT.

* Fix and LINT.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Fix.

* Fix and LINT.

* Update rllib/utils/exploration/exploration.py

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Fixes.

* WIP.

* LINT.

* Fixes and LINT.

* LINT and fixes.

* LINT.

* Move action_dist back into torch extra_action_out_fn and LINT.

* Working SimpleQ learning cartpole on both torch AND tf.

* Working Rainbow learning cartpole on tf.

* Working Rainbow learning cartpole on tf.

* WIP.

* LINT.

* LINT.

* Update docs and add torch to APEX test.

* LINT.

* Fix.

* LINT.

* Fix.

* Fix.

* Fix and docstrings.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Tune param-noise tests.

* LINT.

* Fix.

* Fix.

* test

* test

* test

* Fix.

* Fix.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT.

* WIP.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. (#7852)
* Add testing framework_iterator.

* LINT.

* WIP.

* Fix and LINT.

* LINT fix.
2020-04-03 12:24:25 -07:00
Sven Mika
bb6c675231
[RLlib] Bug fix: Copy is_exploring placeholder for multi-GPU tower generation. (#7846) 2020-04-03 10:44:58 -07:00
Sven Mika
5537fe13b0
[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. (#7814) 2020-04-03 10:44:25 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. (#7798)
* Rollback.

* WIP.

* WIP.

* LINT.

* WIP.

* Fix.

* Fix.

* Fix.

* LINT.

* Fix (SAC does currently not support eager).

* Fix.

* WIP.

* LINT.

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* WIP.

* Fix.

* LINT.

* LINT.

* Fix and LINT.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Fix.

* Fix and LINT.

* Update rllib/utils/exploration/exploration.py

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Fixes.

* LINT.

* WIP.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
66df8b8c35
[RLlib] Working/learning example: PPO + torch + LSTM. (#7797) 2020-03-31 22:00:28 -07:00
Sven Mika
e4bd5db4d8
[RLlib] Minimal ParamNoise PR. (#7772) 2020-03-28 16:16:30 -07:00
Sven Mika
369a3417c4
[RLlib] Add tf-graph by default when doing Policy.export_model(). (#7759)
* Rollback.

* WIP.

* WIP.

* Fix.

* LINT.
2020-03-26 22:07:10 -07:00
Sven Mika
1138f2ebed
[RLlib] Issue 7046 cannot restore keras model from h5 file. (#7482) 2020-03-23 12:19:30 -07:00
Robert Nishihara
ee8c9ff732
Remove six and cloudpickle from setup.py. (#7694) 2020-03-23 11:42:05 -07:00
Sven Mika
2fb219a658
[Ray RLlib] Fix tree import (#7662)
* Rollback.

* Fix import tree error by adding meaningful error and replacing by tf.nest wherever possible.

* LINT.

* LINT.

* Fix.

* Fix log-likelihood test case failing on travis.
2020-03-22 13:51:24 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. (#7533) 2020-03-11 20:39:47 -07:00
Eric Liang
be48e1964b
[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety (#7504)
* fix sched

* lintc

* lint

* fix

* add unit test

* fix

* format

* fix test

* fix test
2020-03-10 11:14:14 -07:00
Sven Mika
f08687f550
[RLlib] rllib train crashes when using torch PPO/PG/A2C. (#7508)
* Fix.

* Rollback.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.

* TEST.
2020-03-08 13:03:18 -07:00
Sven Mika
876a1ba5bd
[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. (#7445) 2020-03-06 12:45:30 -08:00
Sven Mika
510c850651
[RLlib] SAC add discrete action support. (#7320)
* Exploration API (+EpsilonGreedy sub-class).

* Exploration API (+EpsilonGreedy sub-class).

* Cleanup/LINT.

* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).

* Add `error` option to deprecation_warning().

* WIP.

* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.

* WIP.

* LINT.

* WIP.

* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.

* Fix bug in sampler.py in case Policy has self.exploration = None

* Update rllib/agents/dqn/dqn.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Update rllib/agents/trainer.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* Change requests.

* LINT

* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set

* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).

* Update rllib/evaluation/worker_set.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Review fixes.

* Fix default value for DQN's exploration spec.

* LINT

* Fix recursion bug (wrong parent c'tor).

* Do not pass timestep to get_exploration_info.

* Update tf_policy.py

* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.

* Bug fix tf-action-dist

* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).

* Switch off exploration when getting action probs from off-policy-estimator's policy.

* LINT

* Fix test_checkpoint_restore.py.

* Deprecate all SAC exploration (unused) configs.

* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.

* WIP.

* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).

* WIP.

* Trigger re-test (flaky checkpoint-restore test).

* WIP.

* WIP.

* Add test case for deterministic action sampling in PPO.

* bug fix.

* Added deterministic test cases for different Agents.

* Fix problem with TupleActions in dynamic-tf-policy.

* Separate supported_spaces tests so they can be run separately for easier debugging.

* LINT.

* Fix autoregressive_action_dist.py test case.

* Re-test.

* Fix.

* Remove duplicate py_test rule from bazel.

* LINT.

* WIP.

* WIP.

* SAC fix.

* SAC fix.

* WIP.

* WIP.

* WIP.

* FIX 2 examples tests.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Renamed test file.

* WIP.

* Add unittest.main.

* Make action_dist_class mandatory.

* fix

* FIX.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix explorations test case (contextlib cannot find its own nullcontext??).

* Force torch to be installed for QMIX.

* LINT.

* Fix determine_tests_to_run.py.

* Fix determine_tests_to_run.py.

* WIP

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).

* Rename some stuff.

* Rename some stuff.

* WIP.

* update.

* WIP.

* Gumbel Softmax Dist.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP

* WIP.

* WIP.

* Hypertune.

* Hypertune.

* Hypertune.

* Lock-in.

* Cleanup.

* LINT.

* Fix.

* Update rllib/policy/eager_tf_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/agents/sac/sac_policy.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Update rllib/models/tf/tf_action_dist.py

Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>

* Fix items from review comments.

* Add dm_tree to RLlib dependencies.

* Add dm_tree to RLlib dependencies.

* Fix DQN test cases ((Torch)Categorical).

* Fix wrong pip install.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
596b39e36a
[rllib] Make timestep a required arg for exploration classes (#7380) 2020-03-04 13:00:37 -08:00
Sven Mika
83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. (#7314)
* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix

* WIP.

* Add TD3 quick Pendulum regresison.

* Cleanup.

* Fix.

* LINT.

* Fix.

* Sort quick_learning test cases, add TD3.

* Sort quick_learning test cases, add TD3.

* Revert test_checkpoint_restore.py (debugging) changes.

* Fix old soft_q settings in documentation and test configs.

* More doc fixes.

* Fix test case.

* Fix test case.

* Lower test load.

* WIP.
2020-03-01 11:53:35 -08:00
Sven Mika
357232d124
[Core/RLlib] Move log_once from rllib to ray.util. (#7273)
* Move log_once from rllib to tune.

* Move log_once from rllib to tune.

* LINT.

* Move to ray.util.debug.
2020-02-27 10:40:44 -08:00
Eric Liang
58073f7260
[rllib] Fix multiagent example crash due to undefined abstract method (#7329)
* fix multiagent example

* 0 workers
2020-02-26 22:54:40 -08:00
Sven Mika
aec03656d5
[RLlib] TupleActions cannot be exported by Policy: Fixes issues 7231 and 5593. #7333 2020-02-26 15:22:54 -08:00