Eric Liang
31b40b00f6
[rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests ( #7958 )
2020-04-10 00:56:08 -07:00
Sven Mika
0a5b6d1f57
[Testing] Do not run any non-RLlib/core tests if only RLLib affected (except wheels). ( #7892 )
...
* Do not run any non-RLlib/core tests if only RLLib affected, except for generating the 2 wheels (OSX and Linux).
* Test noop RLlib change.
* Test noop RLlib change.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Test.
* WIP.
* Add env flag RAY_CI_ONLY_RLLIB_AFFECTED to refrain from testing most ray-core stuff (except wheels) if only RLlib changed.
* Test RLlib-only change.
2020-04-09 14:36:06 -07:00
Sven Mika
1b31c11806
[RLlib] DDPG re-factor to fit into RLlib's functional algorithm builder API. ( #7934 )
2020-04-09 14:04:21 -07:00
Sven Mika
d2b5c171cb
[RLlib] Add pytorch sigils to toc and add links to algo overview table. ( #7950 )
...
* Add torch sigils to toc-tree for DQN/APEX.
* WIP.
2020-04-09 10:40:18 -07:00
Eric Liang
e8c19aba41
[rllib] Add test case that we don't have a hard torch dependency ( #7926 )
2020-04-07 18:07:39 -07:00
Sven Mika
81314143eb
[RLlib] Use framework_iterator (add torch/eager/tf) to PPO and PG tests. ( #7915 )
2020-04-07 12:40:34 -07:00
Sven Mika
c2cb5c2214
[RLlib] MARWIL torch. ( #7836 )
...
* WIP.
* WIP.
* LINT.
* Fix MARWIL so it can run with eager-mode.
* LINT.
2020-04-06 16:38:50 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Sven Mika
82c2d9faba
[RLlib] Fix broken RLlib tests in master. ( #7894 )
2020-04-05 09:34:23 -07:00
Eric Liang
630b3b1752
[rllib] set daemon status for PolicyServerInput thread ( #7862 )
2020-04-04 16:08:51 -07:00
Sven Mika
1d4823c0ec
[RLlib] Add testing framework_iterator. ( #7852 )
...
* Add testing framework_iterator.
* LINT.
* WIP.
* Fix and LINT.
* LINT fix.
2020-04-03 12:24:25 -07:00
Sven Mika
bb6c675231
[RLlib] Bug fix: Copy is_exploring
placeholder for multi-GPU tower generation. ( #7846 )
2020-04-03 10:44:58 -07:00
Sven Mika
5537fe13b0
[RLlib] Exploration API: ParamNoise Integration into DQN; working example/test cases. ( #7814 )
2020-04-03 10:44:25 -07:00
Sven Mika
7b08db9f8c
[RLlib] Remove all instances of tf.contrib.layers. ... from RLlib code (deprecated). ( #7851 )
2020-04-01 18:03:14 -07:00
Sven Mika
e153e3179f
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. ( #7798 )
...
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-01 00:43:21 -07:00
Sven Mika
66df8b8c35
[RLlib] Working/learning example: PPO + torch + LSTM. ( #7797 )
2020-03-31 22:00:28 -07:00
Sven Mika
e356e97eb2
[RLlib] Assert correct policy class being used in Worker. ( #7769 )
2020-03-30 14:03:29 -07:00
Eric Liang
d6255c3395
Fix build breakage due to soft torch import ( #7790 )
2020-03-28 19:08:31 -07:00
Sven Mika
e4bd5db4d8
[RLlib] Minimal ParamNoise PR. ( #7772 )
2020-03-28 16:16:30 -07:00
Eric Liang
5cebee68d6
[rllib] Add scaling guide to documentation, improve bandit docs ( #7780 )
...
* update
* reword
* update
* ms
* multi node sgd
* reorder
* improve bandit docs
* contrib
* update
* ref
* improve refs
* fix build
* add pillow dep
* add pil
* update pil
* pillow
* remove false
2020-03-27 22:05:43 -07:00
Carl Balmer
0cfb6488a7
changed get_agent_class to from get_trainable_cls ( #7758 )
2020-03-27 12:17:16 -07:00
Sven Mika
93b5c38b7d
[RLlib] Noisy layers in DQN throw different errors (issue #7635 ). ( #7750 )
...
* Rollback.
* Fix issue 7635.
* Fix issue 7635.
* LINT and bug fix.
2020-03-26 22:08:34 -07:00
Sven Mika
369a3417c4
[RLlib] Add tf-graph by default when doing Policy.export_model()
. ( #7759 )
...
* Rollback.
* WIP.
* WIP.
* Fix.
* LINT.
2020-03-26 22:07:10 -07:00
Saurabh Gupta
6ddf84b019
Contextual Bandit algorithms (WIP) ( #7642 )
2020-03-26 13:41:16 -07:00
Sven Mika
bcf963a53b
[RLlib] Bug default policy overrides torch policy. ( #7756 )
...
* Rollback.
* Bug fix!
2020-03-26 10:03:20 -07:00
Eric Liang
9a590ac6a5
[rllib] Fix custom model metrics in multi-device case ( #7640 )
...
* fix example
* add example test
* lin
2020-03-23 12:40:22 -07:00
Sven Mika
1138f2ebed
[RLlib] Issue 7046 cannot restore keras model from h5 file. ( #7482 )
2020-03-23 12:19:30 -07:00
Robert Nishihara
ee8c9ff732
Remove six and cloudpickle from setup.py. ( #7694 )
2020-03-23 11:42:05 -07:00
Eric Liang
288933ec6b
[rllib] Fix shared metrics context in parallel iterators ( #7666 )
...
* debug
* build
* update
* wip
* wpi
* update
* recurisve sync
* comment
* stream
* fix
* Update .travis.yml
2020-03-22 14:15:01 -07:00
Sven Mika
2fb219a658
[Ray RLlib] Fix tree import ( #7662 )
...
* Rollback.
* Fix import tree error by adding meaningful error and replacing by tf.nest wherever possible.
* LINT.
* LINT.
* Fix.
* Fix log-likelihood test case failing on travis.
2020-03-22 13:51:24 -07:00
Eric Liang
7ebc6783e4
[rllib] Add back get_policy_output method for SAC model ( #7604 )
2020-03-20 12:44:04 -07:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector ( #7641 )
2020-03-20 12:43:57 -07:00
mehrdadn
a0700e2f86
Change /tmp to platform-specific temporary directory ( #7529 )
2020-03-16 18:10:14 -07:00
Eric Liang
797e6cfc2a
[rllib][tune] fix some nans ( #7611 )
2020-03-16 11:19:58 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest comptaible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
52cf77f5a9
[rllib] SAC no_done_at_end should default to False ( #7594 )
...
* update
* update doc
* stochastic
* cleanu
2020-03-14 11:16:54 -07:00
Eric Liang
c3a8ba399f
[rllib] Enable distributed exec api for A2C, A3C, PG by default ( #7580 )
2020-03-13 18:48:41 -07:00
Sven Mika
552cfb37ea
[RLlib] Fix bugs and speed up SegmentTree
2020-03-13 01:03:07 -07:00
Sven Mika
f165766813
[RLlib] Bug: If trainer config horizon
is provided, should try to increase env steps to that value. ( #7531 )
2020-03-12 11:03:37 -07:00
Sven Mika
80d314ae5e
[RLlib] Add all agents to rllib rollout
tests. ( #7534 )
2020-03-12 11:02:51 -07:00
Eric Liang
f5d12a958b
[rllib] Port Ape-X to distributed execution API ( #7497 )
2020-03-12 00:54:08 -07:00
Sven Mika
20ef4a8603
[RLlib] Cleanup/unify all test cases. ( #7533 )
2020-03-11 20:39:47 -07:00
Sven Mika
dded5b6d22
[RLlib] ES env_config
is not a EnvContext object (e.g. does not contain worker_index
). ( #7560 )
2020-03-11 20:33:20 -07:00
Sven Mika
bc120730e5
[RLlib] PPO(torch) on CartPole not tuned well enough for consistent learning ( #7556 )
2020-03-11 20:31:27 -07:00
Eric Liang
be48e1964b
[rllib] Fix per-worker exploration in Ape-X; make more kwargs required for future safety ( #7504 )
...
* fix sched
* lintc
* lint
* fix
* add unit test
* fix
* format
* fix test
* fix test
2020-03-10 11:14:14 -07:00
Sven Mika
f08687f550
[RLlib] rllib train
crashes when using torch PPO/PG/A2C. ( #7508 )
...
* Fix.
* Rollback.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
* TEST.
2020-03-08 13:03:18 -07:00
Eric Liang
a644060daa
[rllib] First pass at pipeline implementation of DQN ( #7433 )
...
* wip iters
* add test
* speed up
* update docs
* document it
* support serial sampling
* add test
* spacing
* annotate it
* update
* rename to pipeline
* comment
* iter2 wip
* update
* update
* context test
* update
* fix
* fix
* a3c pipeline
* doc
* update
* move timer
* comment
* add piepline test
* fix
* clean up
* document
* iter s
* wip dqn
* wip
* wip
* metrics
* metrics rename
* metrics ctx
* wip
* constants
* add todo
* suppport .union
* wip
* support union
* remove prints
* add todo
* remove auto timer
* fix up
* fix pipeline test
* typing
* fix breakage
* remove bad assert
* wip
* fix multiagent example
* fixapply
* update a3c
* remove a2c pl
* 0 workers
* wip
* wip
* share metrics
* wip
* wip
* doc
* fix weight sync and global var updates
* mode
* fix
* fix
* doc
* fix
2020-03-07 14:47:58 -08:00
Sven Mika
876a1ba5bd
[RLlib] Issue 7421: can't convert cuda tensor to numpy in torch ppo. ( #7445 )
2020-03-06 12:45:30 -08:00
Sven Mika
510c850651
[RLlib] SAC add discrete action support. ( #7320 )
...
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update() Only keep deep_updat'ing if both original and value are dicts. If value is not a dict, set
* Completely obsolete syn_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere. Instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* update.
* WIP.
* Gumbel Softmax Dist.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP
* WIP.
* WIP.
* Hypertune.
* Hypertune.
* Hypertune.
* Lock-in.
* Cleanup.
* LINT.
* Fix.
* Update rllib/policy/eager_tf_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Fix items from review comments.
* Add dm_tree to RLlib dependencies.
* Add dm_tree to RLlib dependencies.
* Fix DQN test cases ((Torch)Categorical).
* Fix wrong pip install.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
Eric Liang
1989eed3bf
[RLlib] Issue 7136: rollout not working for ES and ARS. ( #7444 )
...
* Fix.
* Fix issue #7136 .
* ARS fix.
2020-03-04 23:57:44 -08:00