Sven Mika
bb6c675231
[RLlib] Bug fix: Copy is_exploring placeholder for multi-GPU tower generation. ( #7846 )
2020-04-03 10:44:58 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. ( #7798 )
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC currently does not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
66df8b8c35
[RLlib] Working/learning example: PPO + torch + LSTM. ( #7797 )
2020-03-31 22:00:28 -07:00
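For reference, a minimal sketch of what running such a PPO + torch + LSTM example might look like. The specific config keys (`use_pytorch`, `use_lstm`, `max_seq_len`) and the CartPole environment are assumptions about the RLlib config surface of that era, not a copy of the commit's actual example:

```python
# Hypothetical sketch only: config keys below are assumptions, not the
# example code added by this commit.
import ray
from ray import tune

if __name__ == "__main__":
    ray.init()
    tune.run(
        "PPO",
        stop={"episode_reward_mean": 150},
        config={
            "env": "CartPole-v0",
            "use_pytorch": True,   # assumed torch switch of that era
            "num_workers": 1,
            "model": {
                "use_lstm": True,   # wrap the policy model in an LSTM
                "max_seq_len": 20,  # truncated-BPTT sequence length
            },
        },
    )
```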
Sven Mika
e4bd5db4d8
[RLlib] Minimal ParamNoise PR. ( #7772 )
2020-03-28 16:16:30 -07:00
Saurabh Gupta
6ddf84b019
Contextual Bandit algorithms (WIP) ( #7642 )
2020-03-26 13:41:16 -07:00
Sven Mika
1138f2ebed
[RLlib] Issue 7046 cannot restore keras model from h5 file. ( #7482 )
2020-03-23 12:19:30 -07:00
Eric Liang
7ebc6783e4
[rllib] Add back get_policy_output method for SAC model ( #7604 )
2020-03-20 12:44:04 -07:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector ( #7641 )
2020-03-20 12:43:57 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fix sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
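As a hedged illustration of the rename and the "deprecation warn" step listed above, here is a minimal, hypothetical sketch (not RLlib's actual code) of mapping the old `sample_batch_size` key to `rollout_fragment_length` while warning the user:

```python
import warnings

# Hypothetical helper, not RLlib's code: translate renamed config keys.
RENAMED_KEYS = {"sample_batch_size": "rollout_fragment_length"}

def resolve_renamed_keys(config):
    config = dict(config)  # do not mutate the caller's dict
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in config:
            warnings.warn(
                "`{}` is deprecated; use `{}` instead.".format(old_key, new_key),
                DeprecationWarning,
            )
            config.setdefault(new_key, config.pop(old_key))
    return config

print(resolve_renamed_keys({"sample_batch_size": 200, "num_workers": 2}))
# -> {'num_workers': 2, 'rollout_fragment_length': 200}
```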
Eric Liang
c3a8ba399f
[rllib] Enable distributed exec api for A2C, A3C, PG by default ( #7580 )
2020-03-13 18:48:41 -07:00
Sven Mika
552cfb37ea
[RLlib] Fix bugs and speed up SegmentTree
2020-03-13 01:03:07 -07:00
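For context on the data structure named in this entry, below is a generic sum segment tree of the kind used for proportional prioritized-replay sampling. It is illustrative only and not RLlib's SegmentTree implementation:

```python
class SumSegmentTree:
    """Generic sum tree for O(log n) prefix-sum lookups (illustrative only)."""

    def __init__(self, capacity):
        assert capacity > 0 and capacity & (capacity - 1) == 0, "power of two"
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # internal nodes hold child sums

    def __setitem__(self, idx, value):
        i = idx + self.capacity          # leaves live at [capacity, 2*capacity)
        self.tree[i] = value
        i //= 2
        while i >= 1:                    # propagate the change up to the root
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sum(self):
        return self.tree[1]

    def find_prefixsum_idx(self, prefixsum):
        """Leaf index whose cumulative sum first exceeds `prefixsum`."""
        i = 1
        while i < self.capacity:         # descend from the root to a leaf
            if self.tree[2 * i] > prefixsum:
                i = 2 * i
            else:
                prefixsum -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.capacity

tree = SumSegmentTree(4)
for i, priority in enumerate([1.0, 3.0, 2.0, 4.0]):
    tree[i] = priority
print(tree.sum())                    # 10.0
print(tree.find_prefixsum_idx(3.5))  # 1
```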
Sven Mika
80d314ae5e
[RLlib] Add all agents to rllib rollout tests. ( #7534 )
2020-03-12 11:02:51 -07:00
Eric Liang
f5d12a958b
[rllib] Port Ape-X to distributed execution API ( #7497 )
2020-03-12 00:54:08 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. ( #7533 )
2020-03-11 20:39:47 -07:00
Eric Liang
be48e1964b
[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety ( #7504 )
* fix sched
* lint
* lint
* fix
* add unit test
* fix
* format
* fix test
* fix test
2020-03-10 11:14:14 -07:00
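The per-worker exploration fixed above concerns Ape-X-style setups where each rollout worker gets its own epsilon. A small sketch, assuming the schedule from the Ape-X paper (epsilon_i = eps ** (1 + alpha * i / (N - 1)), eps = 0.4, alpha = 7) rather than RLlib's exact code:

```python
# Assumed Ape-X-paper schedule; not RLlib's implementation.
def per_worker_epsilons(num_workers, eps=0.4, alpha=7.0):
    """Epsilon for each worker index i in [0, num_workers)."""
    if num_workers <= 1:
        return [eps] * num_workers
    return [eps ** (1 + alpha * i / (num_workers - 1)) for i in range(num_workers)]

# Worker 0 stays the most exploratory; the last worker is almost greedy.
print(per_worker_epsilons(4))
```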
Sven Mika
510c850651
[RLlib] SAC add discrete action support. ( #7320 )
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete sync_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* update.
* WIP.
* Gumbel Softmax Dist.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP
* WIP.
* WIP.
* Hypertune.
* Hypertune.
* Hypertune.
* Lock-in.
* Cleanup.
* LINT.
* Fix.
* Update rllib/policy/eager_tf_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Fix items from review comments.
* Add dm_tree to RLlib dependencies.
* Add dm_tree to RLlib dependencies.
* Fix DQN test cases ((Torch)Categorical).
* Fix wrong pip install.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
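One of the pieces listed in this commit body is a "Gumbel Softmax Dist" for discrete SAC actions. As a plain-NumPy illustration of the underlying Gumbel-softmax trick (not RLlib's distribution class):

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Relaxed (near-one-hot) sample over discrete categories."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.uniform(size=np.shape(logits))
    # Gumbel(0, 1) noise: -log(-log(U)); small constants guard against log(0).
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)
    y = (np.asarray(logits) + gumbel) / temperature
    # Softmax over the last axis; as temperature -> 0 this approaches one-hot.
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

print(gumbel_softmax_sample(np.array([2.0, 0.5, -1.0]), temperature=0.5))
```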
Sven Mika
83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. ( #7314 )
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix
* WIP.
* Add TD3 quick Pendulum regression.
* Cleanup.
* Fix.
* LINT.
* Fix.
* Sort quick_learning test cases, add TD3.
* Sort quick_learning test cases, add TD3.
* Revert test_checkpoint_restore.py (debugging) changes.
* Fix old soft_q settings in documentation and test configs.
* More doc fixes.
* Fix test case.
* Fix test case.
* Lower test load.
* WIP.
2020-03-01 11:53:35 -08:00
Eric Liang
3c6b94f3f5
[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C ( #7299 )
2020-02-28 16:44:17 -08:00
Eric Liang
58073f7260
[rllib] Fix multiagent example crash due to undefined abstract method ( #7329 )
* fix multiagent example
* 0 workers
2020-02-26 22:54:40 -08:00
Sven Mika
e1fc8368d4
[RLlib] SAC refactor with new SquashedGaussian distribution class. ( #7272 )
2020-02-23 16:10:20 -08:00
Sven Mika
0db2046b0a
[RLlib] Policy.compute_log_likelihoods() and SAC refactor. (issue #7107 ) ( #7124 )
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete sync_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* WIP.
* Fix SAC.
* Fix SAC.
* Fix strange tf-error in ray core tests.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix test_io.py.
* LINT.
* Update SAC yaml files' config.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-02-22 14:19:49 -08:00
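The commit body above describes the changed `deep_update()` behavior in tune/utils/util.py: keep recursing only while both the original entry and the new value are dicts, otherwise overwrite. A minimal sketch of that behavior (illustrative, not tune's actual helper):

```python
def deep_update(original, new_values):
    """Recurse only while both sides are dicts; otherwise overwrite."""
    for key, value in new_values.items():
        if isinstance(original.get(key), dict) and isinstance(value, dict):
            deep_update(original[key], value)
        else:
            original[key] = value
    return original

base = {"exploration_config": {"type": "EpsilonGreedy", "epsilon": 1.0}}
deep_update(base, {"exploration_config": {"epsilon": 0.1}})
print(base)
# {'exploration_config': {'type': 'EpsilonGreedy', 'epsilon': 0.1}}
```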
Sven Mika
e2edca45d4
[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. ( #7238 )
* Take out stats to analyze memory leak in torch PPO.
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* Fix determine_tests_to_run.py.
* minor change to re-test after determine_tests_to_run.py.
* LINT.
* update comments.
* WIP
* WIP
* WIP
* FIX.
* Fix sequence_mask being dependent on torch being installed.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
2020-02-22 11:02:31 -08:00
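One bullet above fixes `sequence_mask` being dependent on torch being installed. As a framework-free illustration of what such a helper computes (a sketch under my own assumptions, not RLlib's code):

```python
import numpy as np

def sequence_mask(lengths, maxlen=None):
    """Boolean [batch, maxlen] mask; True marks valid (unpadded) timesteps."""
    lengths = np.asarray(lengths)
    if maxlen is None:
        maxlen = int(lengths.max())
    # Broadcast a [1, maxlen] time range against [batch, 1] lengths.
    return np.arange(maxlen)[None, :] < lengths[:, None]

print(sequence_mask([1, 3], maxlen=4))
# [[ True False False False]
#  [ True  True  True False]]
```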
Eric Liang
46af992efd
[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) ( #7213 )
2020-02-19 16:07:37 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). ( #7155 )
2020-02-19 12:18:45 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). ( #7178 )
* commit
* comment
2020-02-15 14:50:44 -08:00
Adrian O'Grady
fe6ce714a0
[rllib] TaskPool.completed_prefetch() no longer returns stale object ids after an error ( #7139 )
2020-02-13 22:30:44 -08:00
Sven Mika
4c97348cb6
[RLlib] Schedule-classes multi-framework support. ( #6926 )
2020-01-28 11:07:55 -08:00
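The schedule classes referenced here map a timestep to a value (e.g., an exploration epsilon or a learning rate) independently of the deep-learning framework. A hypothetical linear-schedule sketch, with names of my own choosing rather than RLlib's:

```python
class LinearSchedule:
    """Illustrative timestep -> value schedule (not RLlib's class)."""

    def __init__(self, initial_value, final_value, horizon):
        self.initial_value = initial_value
        self.final_value = final_value
        self.horizon = horizon

    def value(self, t):
        # Linearly interpolate until `horizon`, then hold the final value.
        frac = min(max(t / self.horizon, 0.0), 1.0)
        return self.initial_value + frac * (self.final_value - self.initial_value)

epsilon = LinearSchedule(1.0, 0.02, horizon=10000)
print(epsilon.value(0), epsilon.value(5000), epsilon.value(20000))
# 1.0 0.51 0.02
```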
Sven Mika
ae9a3a2237
[RLlib] from_config util method for framework agnostic components; start moving RLlib tests into Bazel. ( #6865 )
2020-01-22 17:02:58 -08:00