Sven Mika
f165766813
[RLlib] Bug: If trainer config horizon
is provided, should try to increase env steps to that value. ( #7531 )
2020-03-12 11:03:37 -07:00
Sven Mika
80d314ae5e
[RLlib] Add all agents to rllib rollout
tests. ( #7534 )
2020-03-12 11:02:51 -07:00
Eric Liang
f5d12a958b
[rllib] Port Ape-X to distributed execution API ( #7497 )
2020-03-12 00:54:08 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. ( #7533 )
2020-03-11 20:39:47 -07:00
Sven Mika
dded5b6d22
[RLlib] ES env_config
is not a EnvContext object (e.g. does not contain worker_index
). ( #7560 )
2020-03-11 20:33:20 -07:00
Sven Mika
bc120730e5
[RLlib] PPO(torch) on CartPole not tuned well enough for consistent learning ( #7556 )
2020-03-11 20:31:27 -07:00
Eric Liang
be48e1964b
[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety ( #7504 )
...
* fix sched
* lintc
* lint
* fix
* add unit test
* fix
* format
* fix test
* fix test
2020-03-10 11:14:14 -07:00
Sven Mika
f08687f550
[RLlib] rllib train
crashes when using torch PPO/PG/A2C. ( #7508 )
...
* Fix.
* Rollback.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
2020-03-08 13:03:18 -07:00
Eric Liang
a644060daa
[rllib] First pass at pipeline implementation of DQN ( #7433 )
...
* wip iters
* add test
* speed up
* update docs
* document it
* support serial sampling
* add test
* spacing
* annotate it
* update
* rename to pipeline
* comment
* iter2 wip
* update
* update
* context test
* update
* fix
* fix
* a3c pipeline
* doc
* update
* move timer
* comment
* add piepline test
* fix
* clean up
* document
* iter s
* wip dqn
* wip
* wip
* metrics
* metrics rename
* metrics ctx
* wip
* constants
* add todo
* suppport .union
* wip
* support union
* remove prints
* add todo
* remove auto timer
* fix up
* fix pipeline test
* typing
* fix breakage
* remove bad assert
* wip
* fix multiagent example
* fixapply
* update a3c
* remove a2c pl
* 0 workers
* wip
* wip
* share metrics
* wip
* wip
* doc
* fix weight sync and global var updates
* mode
* fix
* fix
* doc
* fix
2020-03-07 14:47:58 -08:00
Sven Mika
876a1ba5bd
[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. ( #7445 )
2020-03-06 12:45:30 -08:00
Sven Mika
510c850651
[RLlib] SAC add discrete action support. ( #7320 )
...
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* update.
* WIP.
* Gumbel Softmax Dist.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP
* WIP.
* WIP.
* Hypertune.
* Hypertune.
* Hypertune.
* Lock-in.
* Cleanup.
* LINT.
* Fix.
* Update rllib/policy/eager_tf_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Fix items from review comments.
* Add dm_tree to RLlib dependencies.
* Add dm_tree to RLlib dependencies.
* Fix DQN test cases ((Torch)Categorical).
* Fix wrong pip install.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
1989eed3bf
[RLlib] Issue 7136: rollout not working for ES and ARS. ( #7444 )
...
* Fix.
* Fix issue #7136 .
* ARS fix.
2020-03-04 23:57:44 -08:00
Eric Liang
596b39e36a
[rllib] Make timestep a required arg for exploration classes ( #7380 )
2020-03-04 13:00:37 -08:00
Eric Liang
fddeb6809c
[RLlib] Issue 7401: In eval mode (if evaluation_episodes > 0), agent hangs if Env does not terminate. ( #7448 )
...
* Fix.
* Rollback.
* Fix issue 7421.
* Fix.
2020-03-04 12:58:34 -08:00
Eric Liang
c38224d8e5
[RLlib] Issue 7438 evaluation not working in pytorch. ( #7443 )
2020-03-04 12:53:04 -08:00
Eric Liang
aa4861c2a0
Checkpoint Adam momenta for DDPG ( #7449 )
2020-03-04 10:03:41 -08:00
Sven Mika
4198db5038
Torch multicat support (7419)
2020-03-04 00:41:40 -08:00
Sven Mika
7faf0d8f89
[RLlib] Make rollout always use evaluation_config
. ( #7396 )
2020-03-03 17:20:35 -08:00
Eric Liang
0f88444686
[rllib] Support multi-agent training in pipeline impls, add easy flag to enable ( #7338 )
2020-03-02 15:16:37 -08:00
Sven Mika
d8eeb96413
Fix issue with torch PPO not handling action spaces of shape=(>1,). ( #7398 )
2020-03-02 10:53:19 -08:00
Sven Mika
2d97650b1e
[RLlib] Add Exploration API documentation. ( #7373 )
...
* Add Exploration API documentation.
* Add Exploration API documentation.
* Add Exploration API documentation.
* Update exporation docs.
2020-03-01 16:55:41 -08:00
Sven Mika
83e06cd30a
[RLlib] DDPG refactor and Exploration API action noise classes. ( #7314 )
...
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix
* WIP.
* Add TD3 quick Pendulum regresison.
* Cleanup.
* Fix.
* LINT.
* Fix.
* Sort quick_learning test cases, add TD3.
* Sort quick_learning test cases, add TD3.
* Revert test_checkpoint_restore.py (debugging) changes.
* Fix old soft_q settings in documentation and test configs.
* More doc fixes.
* Fix test case.
* Fix test case.
* Lower test load.
* WIP.
2020-03-01 11:53:35 -08:00
Eric Liang
3c6b94f3f5
[rllib] Enable performance metrics reporting for RLlib pipelines, add A3C ( #7299 )
2020-02-28 16:44:17 -08:00
Sven Mika
0c9e5db9cb
Fix SAC bug (twin Q not used for min'ing over both Q-nets in loss func). ( #7354 )
2020-02-27 12:49:08 -08:00
Sven Mika
357232d124
[Core/RLlib] Move log_once
from rllib to ray.util. ( #7273 )
...
* Move log_once from rllib to tune.
* Move log_once from rllib to tune.
* LINT.
* Move to ray.util.debug.
2020-02-27 10:40:44 -08:00
Sven Mika
44ac0ead34
[RLlib] rollout.py; make video-recording options more intuitive and add warnings/errors (issue 7121). ( #7347 )
2020-02-27 10:39:02 -08:00
Eric Liang
58073f7260
[rllib] Fix multiagent example crash due to undefined abstract method ( #7329 )
...
* fix multiagent example
* 0 workers
2020-02-26 22:54:40 -08:00
Sven Mika
aec03656d5
[RLlib] TupleActions cannot be exported by Policy: Fixes issues 7231 and 5593. #7333
2020-02-26 15:22:54 -08:00
Matthew Brulhardt
75f683eec6
[rllib] Fix error in shape calculation. ( #7301 )
2020-02-25 14:16:29 -08:00
Sven Mika
e1fc8368d4
[RLlib] SAC refactor with new SquashedGaussian distribution class. ( #7272 )
2020-02-23 16:10:20 -08:00
Eric Liang
1660b52751
[rllib] Fix torch GPU / yaml load warning ( #7278 )
...
* fix
* safe load
* reduce num buffer shardscZZ
2020-02-23 13:13:43 -08:00
Sven Mika
0db2046b0a
[RLlib] Policy.compute_log_likelihoods() and SAC refactor. (issue #7107 ) ( #7124 )
...
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* WIP.
* Fix SAC.
* Fix SAC.
* Fix strange tf-error in ray core tests.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix test_io.py.
* LINT.
* Update SAC yaml files' config.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-02-22 14:19:49 -08:00
Sven Mika
e2edca45d4
[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. ( #7238 )
...
* Take out stats to analyze memory leak in torch PPO.
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* Fix determine_tests_to_run.py.
* minor change to re-test after determine_tests_to_run.py.
* LINT.
* update comments.
* WIP
* WIP
* WIP
* FIX.
* Fix sequence_mask being dependent on torch being installed.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
2020-02-22 11:02:31 -08:00
Sven Mika
cbc808bc6b
[Tests] determine_tests_to_run.sh has a bug affecting RLlib testing to be skipped sometimes. ( #7243 )
2020-02-20 19:02:17 -08:00
Sven Mika
6043ce710d
Fix old exploration configs. ( #7240 )
2020-02-20 08:39:16 -08:00
Simon Mo
b804d40c04
Stop vendoring pyarrow ( #7233 )
2020-02-19 19:01:26 -08:00
Eric Liang
46af992efd
[rllib] [experimental] custom RL training pipelines (PG_pl, A2C_pl) ( #7213 )
2020-02-19 16:07:37 -08:00
Simon Mo
7bef7031c2
Revert "Revert "Revert "Removing Pyarrow dependency ( #7146 )" ( #7209 ) ( #7214 )" ( #7232 )
2020-02-19 13:35:29 -08:00
Sven Mika
d537e9f0d8
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). ( #7155 )
2020-02-19 12:18:45 -08:00
Eric Liang
399424c418
[rllib] Fix broken check in eval mode for IMPALA #7217
2020-02-19 11:54:30 -08:00
Simon Mo
e8941b1b79
Revert "Revert "Removing Pyarrow dependency ( #7146 )" ( #7209 ) ( #7214 )
2020-02-19 10:08:52 -08:00
Eric Liang
0aa9373d62
Revert "Removing Pyarrow dependency ( #7146 )" ( #7209 )
...
This reverts commit 2116fd3bca
.
2020-02-18 14:12:06 -08:00
Eric Liang
5df801605e
Add ray.util package and move libraries from experimental ( #7100 )
2020-02-18 13:43:19 -08:00
ijrsvt
2116fd3bca
Removing Pyarrow dependency ( #7146 )
2020-02-17 18:00:13 -08:00
mehrdadn
3bd82d0bcd
Fix various issues/warnings that come up on Jenkins ( #7147 )
...
* Avoid warning about swap being unlimited
Currently we get the following message on Jenkins:
"Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
Since we're not limiting swap anyway, we might as well avoid trying to.
https://docs.docker.com/config/containers/resource_constraints/#--memory-swap-details
* Fix escaping in re.search()
* Fix escaping in _noisy_layer()
* Raise a more descriptive error when dashboard data isn't found
* Don't error on dashboard files not being found when webui isn't required
* Change dashboard error to a warning instead
2020-02-17 16:08:55 -08:00
Eric Liang
42aea966ff
[rllib] Convert torch state arrays to tensors during compute actions ( #7162 )
...
* convert to tensor
* normalize fix
2020-02-17 10:26:58 -08:00
Eric Liang
b6233dff3c
[rllib] Fix bad sample count assert
2020-02-15 17:22:23 -08:00
Sven Mika
2e60f0d4d8
[RLlib] Move all jenkins RLlib-tests into bazel (rllib/BUILD). ( #7178 )
...
* commit
* comment
2020-02-15 14:50:44 -08:00
Adrian O'Grady
fe6ce714a0
[rllib] - TaskPool.completed_prefetch() no longer returns stale object ids after an error ( #7139 )
2020-02-13 22:30:44 -08:00
Sven Mika
5518a738b3
[RLlib] Fix erroneous use of LinearSchedule (in DDPG's exploration annealing). ( #7125 )
...
* Fix erroneous use of LinearSchedule (in DDPG's exploration annealing).
Erase schedules_obsoleted.py.
* Trigger re-test.
* Re-test.
2020-02-12 23:46:49 -08:00