Artur Niederfahrenhorst
f7be409462
[RLlib] Training Iteration Function for SAC ( #24157 )
2022-04-26 12:37:54 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q ( #22842 )
2022-03-29 14:44:40 +02:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
...
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
b10d5533be
[RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. ( #21452 )
2022-01-10 11:19:40 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer
(instead of build_trainer()
): All remaining classes; soft-deprecate build_trainer
. ( #20725 )
2021-12-04 22:05:26 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers
dir) ( #20552 )
2021-11-19 11:57:37 +01:00
gjoliver
d81885c1f1
[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests ( #19809 )
...
* Fix DDPG, since it is based on GenericOffPolicyTrainer.
* Fix QMix, SAC, and MADDPA too.
* Undo QMix change.
* Fix DQN input batch type. Always use SampleBatch.
* apex ddpg should not use replay_buffer_config yet.
* Make eager tf policy to use SampleBatch.
* lint
* LINT.
* Re-enable RLlib broken tests to make sure things work ok now.
* fixes.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-28 18:06:47 +02:00
Sven Mika
ba1c489b79
[RLlib Testing] Lower --smoke-test
"time_total_s" to make sure it doesn't time out. ( #18670 )
2021-09-16 18:22:23 +02:00
Sven Mika
8a00154038
[RLlib] Bump tf version in ML docker to tf==2.5.0; add tfp to ML-docker. ( #18544 )
2021-09-15 08:46:37 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. ( #17999 )
2021-08-31 12:21:49 +02:00
Sven Mika
90b21ce27e
[RLlib] De-flake 3 test cases; Fix config.simple_optimizer
and SampleBatch.is_training
warnings. ( #17321 )
2021-07-27 14:39:06 -04:00
Julius Frost
0b1b6222bc
[rllib] Add merge_trainer_config arguments to trainer template ( #17160 )
2021-07-21 15:43:06 -07:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. ( #17169 )
2021-07-20 14:58:13 -04:00
Sven Mika
1fd0eb805e
[RLlib] Redo fix bug normalize vs unsquash actions (original PR made log-likelihood test flakey). ( #17014 )
2021-07-13 14:01:30 -04:00
Amog Kamsetty
bc33dc7e96
Revert "[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action
, not normalize_action
." ( #17002 )
...
This reverts commit 7862dd64ea
.
2021-07-12 11:09:14 -07:00
Sven Mika
7862dd64ea
[RLlib] Fix bug in policy.py: normalize_actions=True has to call unsquash_action
, not normalize_action
. ( #16774 )
2021-07-08 17:31:34 +02:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes ( #16531 )
2021-06-30 12:32:11 +02:00
Sven Mika
cecfc3b43b
[RLlib] Multi-GPU support for Torch algorithms. ( #14709 )
2021-04-16 09:16:24 +02:00
Sven Mika
4f66309e19
[RLlib] Redo issue 14533 tf enable eager exec ( #14984 )
2021-03-29 20:07:44 +02:00
SangBin Cho
fa5f961d5e
Revert "[RLlib] Issue 14533: tf.enable_eager_execution()
must be called at beginning. ( #14737 )" ( #14918 )
...
This reverts commit 3e389d5812
.
2021-03-25 00:42:01 -07:00
Sven Mika
3e389d5812
[RLlib] Issue 14533: tf.enable_eager_execution()
must be called at beginning. ( #14737 )
2021-03-24 12:54:27 +01:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. ( #13393 )
2021-03-08 15:41:27 +01:00
Sven Mika
52c94b7ee9
[RLlib] Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. ( #13522 )
2021-02-02 13:05:58 +01:00
Sven Mika
2e3655e8a9
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. ( #13238 )
2021-01-19 14:22:36 +01:00
Sven Mika
19c8033df2
[RLlib] Fix most remaining RLlib algos for running with trajectory view API. ( #12366 )
...
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT and fixes.
MB-MPO and MAML not working yet.
* wip
* update
* update
* rmeove
* remove dep
* higher
* Update requirements_rllib.txt
* Update requirements_rllib.txt
* relpos
* no mbmpo
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Sven Mika
b6b54f1c81
[RLlib] Trajectory view API: enable by default for SAC, DDPG, DQN, SimpleQ ( #11827 )
2020-11-16 10:54:35 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master ( #11897 )
...
* use master link
* remae
* revert non-ray
* more
* mre
2020-11-10 10:53:28 -08:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. ( #10825 )
2020-09-20 11:27:02 +02:00
Sven Mika
28ab797cf5
[RLlib] Deprecate old classes, methods, functions, config keys (in prep for RLlib 1.0). ( #10544 )
2020-09-06 10:58:00 +02:00
Sven Mika
4da0e542d5
[RLlib] DDPG and SAC eager support (preparation for tf2.x) ( #9204 )
2020-07-08 16:12:20 +02:00
Piotr Januszewski
155cc81e40
Clarify training intensity configuration docstring ( #9244 ) ( #9306 )
2020-07-05 20:07:27 -07:00
Sven Mika
4fd8977eaf
[RLlib] Minor cleanup in preparation to tf2.x support. ( #9130 )
...
* WIP.
* Fixes.
* LINT.
* Fixes.
* Fixes and LINT.
* WIP.
2020-06-25 19:01:32 +02:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch
in favor of framework=...
( #8520 )
2020-05-27 16:19:13 +02:00
Eric Liang
aa7a58e92f
[rllib] Support training intensity for dqn / apex ( #8396 )
2020-05-20 11:22:30 -07:00
Sven Mika
f7e4dae852
[RLlib] DQN and SAC Atari benchmark fixes. ( #7962 )
...
* Add Atari SAC-discrete (learning MsPacman in 40k ts up to 780 rewards).
* SAC loss function test case fix.
2020-04-17 08:49:15 +02:00
Sven Mika
428516056a
[RLlib] SAC Torch (incl. Atari learning) ( #7984 )
...
* Policy-classes cleanup and torch/tf unification.
- Make Policy abstract.
- Add `action_dist` to call to `extra_action_out_fn` (necessary for PPO torch).
- Move some methods and vars to base Policy
(from TFPolicy): num_state_tensors, ACTION_PROB, ACTION_LOGP and some more.
* Fix `clip_action` import from Policy (should probably be moved into utils altogether).
* - Move `is_recurrent()` and `num_state_tensors()` into TFPolicy (from DynamicTFPolicy).
- Add config to all Policy c'tor calls (as 3rd arg after obs and action spaces).
* Add `config` to c'tor call to TFPolicy.
* Add missing `config` to c'tor call to TFPolicy in marvil_policy.py.
* Fix test_rollout_worker.py::MockPolicy and BadPolicy classes (Policy base class is now abstract).
* Fix LINT errors in Policy classes.
* Implement StatefulPolicy abstract methods in test cases: test_multi_agent_env.py.
* policy.py LINT errors.
* Create a simple TestPolicy to sub-class from when testing Policies (reduces code in some test cases).
* policy.py
- Remove abstractmethod from `apply_gradients` and `compute_gradients` (these are not required iff `learn_on_batch` implemented).
- Fix docstring of `num_state_tensors`.
* Make QMIX torch Policy a child of TorchPolicy (instead of Policy).
* QMixPolicy add empty implementations of abstract Policy methods.
* Store Policy's config in self.config in base Policy c'tor.
* - Make only compute_actions in base Policy's an abstractmethod and provide pass
implementation to all other methods if not defined.
- Fix state_batches=None (most Policies don't have internal states).
* Cartpole tf learning.
* Cartpole tf AND torch learning (in ~ same ts).
* Cartpole tf AND torch learning (in ~ same ts). 2
* Cartpole tf (torch syntax-broken) learning (in ~ same ts). 3
* Cartpole tf AND torch learning (in ~ same ts). 4
* Cartpole tf AND torch learning (in ~ same ts). 5
* Cartpole tf AND torch learning (in ~ same ts). 6
* Cartpole tf AND torch learning (in ~ same ts). Pendulum tf learning.
* WIP.
* WIP.
* SAC torch learning Pendulum.
* WIP.
* SAC torch and tf learning Pendulum and Cartpole after cleanup.
* WIP.
* LINT.
* LINT.
* SAC: Move policy.target_model to policy.device as well.
* Fixes and cleanup.
* Fix data-format of tf keras Conv2d layers (broken for some tf-versions which have data_format="channels_first" as default).
* Fixes and LINT.
* Fixes and LINT.
* Fix and LINT.
* WIP.
* Test fixes and LINT.
* Fixes and LINT.
Co-authored-by: Sven Mika <sven@Svens-MacBook-Pro.local>
2020-04-15 13:25:16 +02:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest comptaible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
52cf77f5a9
[rllib] SAC no_done_at_end should default to False ( #7594 )
...
* update
* update doc
* stochastic
* cleanu
2020-03-14 11:16:54 -07:00
Sven Mika
510c850651
[RLlib] SAC add discrete action support. ( #7320 )
...
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* update.
* WIP.
* Gumbel Softmax Dist.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP
* WIP.
* WIP.
* Hypertune.
* Hypertune.
* Hypertune.
* Lock-in.
* Cleanup.
* LINT.
* Fix.
* Update rllib/policy/eager_tf_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Fix items from review comments.
* Add dm_tree to RLlib dependencies.
* Add dm_tree to RLlib dependencies.
* Fix DQN test cases ((Torch)Categorical).
* Fix wrong pip install.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
fddeb6809c
[RLlib] Issue 7401: In eval mode (if evaluation_episodes > 0), agent hangs if Env does not terminate. ( #7448 )
...
* Fix.
* Rollback.
* Fix issue 7421.
* Fix.
2020-03-04 12:58:34 -08:00
Sven Mika
e1fc8368d4
[RLlib] SAC refactor with new SquashedGaussian distribution class. ( #7272 )
2020-02-23 16:10:20 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). ( #7155 )
2020-02-19 12:18:45 -08:00
Sven Mika
6e1c3ea824
[RLlib] Exploration API (+EpsilonGreedy sub-class). ( #6974 )
2020-02-10 15:22:07 -08:00
Sven Mika
2ccf08ad10
[RLlib] Bug fix: DQN goes into negative epsilon values after reaching explora… ( #6971 )
...
* Bug fix: DQN goes into negative epsilon values after reaching exploration percentage.
* Add `epsilon_initial_eps` to SAC to pass test_nested_spaces.py.
* Add `exploration_initial_eps` to QMIX default config.
2020-01-31 09:54:12 -08:00
Sven
60d4d5e1aa
Remove future imports ( #6724 )
...
* Remove all __future__ imports from RLlib.
* Remove (object) again from tf_run_builder.py::TFRunBuilder.
* Fix 2xLINT warnings.
* Fix broken appo_policy import (must be appo_tf_policy)
* Remove future imports from all other ray files (not just RLlib).
* Remove future imports from all other ray files (not just RLlib).
* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).
* Add two empty lines before Schedule class.
* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
Michael Luo
548df014ec
SAC Performance Fixes ( #6295 )
...
* SAC Performance Fixes
* Small Changes
* Update sac_model.py
* fix normalize wrapper
* Update test_eager_support.py
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2019-12-20 10:51:25 -08:00