Sven Mika
a90cd0fcbb
[RLlib] Unity3d soccer benchmarks ( #8834 )
2020-06-11 14:29:57 +02:00
Chapman Siu
04cffb7e65
[docs] rllib-models.rst - QMIX + parametric ( #8868 )
...
Update the docs to show that QMIX supports parametric action spaces, as used in SMAC environments.
This is reflected in the code here: https://github.com/ray-project/ray/blob/master/rllib/agents/qmix/qmix_policy.py#L179 and is consistent with QMIX being an extension of DQN.
2020-06-09 21:56:16 -07:00
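A minimal sketch of the parametric (masked) action pattern the commit above refers to; the `obs`/`action_mask` keys, shapes, and action count below are illustrative assumptions, not the SMAC wrapper's exact API:

```python
import numpy as np
from gym.spaces import Box, Dict, Discrete

# Illustrative observation space for an agent whose set of legal actions
# varies per step: the real observation plus a 0/1 mask over all actions.
N_ACTIONS = 10
obs_space = Dict({
    "obs": Box(low=-1.0, high=1.0, shape=(42,)),               # assumed feature size
    "action_mask": Box(low=0.0, high=1.0, shape=(N_ACTIONS,)),
})
action_space = Discrete(N_ACTIONS)

# A sample observation in which only actions 0 and 3 are currently available.
mask = np.zeros(N_ACTIONS, dtype=np.float32)
mask[[0, 3]] = 1.0
sample_obs = {"obs": np.zeros(42, dtype=np.float32), "action_mask": mask}
```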
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... ( #8520 )
2020-05-27 16:19:13 +02:00
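For context on the commit above, a minimal before/after sketch of the config change, assuming the standard trainer config dict (the exact set of accepted `framework` values may differ by Ray version):

```python
# Old style: the backend was selected via a per-framework boolean flag.
old_config = {"use_pytorch": True}

# New style: a single "framework" key selects the backend,
# e.g. "tf" or "torch" (value set assumed here).
new_config = {"framework": "torch"}
```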
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers ( #8345 )
2020-05-21 10:16:18 -07:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. ( #8288 )
...
APEX_DDPG (PyTorch) test case and docs.
2020-05-04 09:36:27 +02:00
Sven Mika
166bb5d690
[RLlib] IMPALA PyTorch ( #8287 )
...
This PR adds an IMPALA PyTorch implementation.
- Adds compilation tests for the model with and without an LSTM.
- Adds a learning test for CartPole.
2020-05-03 13:44:25 +02:00
Sven Mika
499ad5fbe4
[RLlib] PyTorch version of APPO. ( #8120 )
...
- Translate all vtrace functionality to torch and add torch to the framework_iterator loop in all existing vtrace test cases.
- Add learning test cases for APPO torch (both with and without v-trace).
- Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).
2020-04-23 09:11:12 +02:00
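A minimal sketch of toggling v-trace for an APPO run; the `vtrace` config key and the `tune.run("APPO", ...)` entry point are assumptions based on APPO's default config and Ray Tune's registered trainer names:

```python
from ray import tune

# Train APPO on CartPole under the torch framework, once with and once
# without v-trace (mirroring the test matrix described above).
for use_vtrace in (True, False):
    tune.run(
        "APPO",
        stop={"episode_reward_mean": 150},
        config={
            "env": "CartPole-v0",
            "framework": "torch",
            "vtrace": use_vtrace,
            "num_workers": 1,
        },
    )
```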
Sven Mika
d15609ba2a
[RLlib] PyTorch version of ARS (Augmented Random Search). ( #8106 )
...
This PR implements a PyTorch version of RLlib's ARS algorithm using RLlib's functional algo builder API. It also adds a regression test for ARS (torch) on CartPole.
2020-04-21 09:47:52 +02:00
Sven Mika
3812bfedda
[RLlib] PyTorch version of ES (Evolution Strategies). ( #8104 )
...
PyTorch version of the Evolution Strategies (ES) algorithm.
2020-04-20 21:47:28 +02:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. ( #7953 )
...
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.
* Learns Pendulum-v0 with the torch version (same config as tf); wall time is ~20% slower than tf.
* Fixes a GPU target-model problem.
2020-04-16 10:20:01 +02:00
Sven Mika
d2b5c171cb
[RLlib] Add pytorch sigils to toc and add links to algo overview table. ( #7950 )
...
* Add torch sigils to toc-tree for DQN/APEX.
* WIP.
2020-04-09 10:40:18 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. ( #7597 )
...
* Fix.
* Rollback.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix.
* Fix.
* Fix.
* WIP.
* WIP.
* Fix.
* Test case fixes.
* Test case fixes and LINT.
* Test case fixes and LINT.
* Rollback.
* WIP.
* WIP.
* Test case fixes.
* Fix.
* Fix.
* Fix.
* Add regression test for DQN w/ param noise.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Fixes and LINT.
* Comment
* Regression test case.
* WIP.
* WIP.
* LINT.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does not currently support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* WIP.
* LINT.
* Fixes and LINT.
* LINT and fixes.
* LINT.
* Move action_dist back into torch extra_action_out_fn and LINT.
* Working SimpleQ learning cartpole on both torch AND tf.
* Working Rainbow learning cartpole on tf.
* Working Rainbow learning cartpole on tf.
* WIP.
* LINT.
* LINT.
* Update docs and add torch to APEX test.
* LINT.
* Fix.
* LINT.
* Fix.
* Fix.
* Fix and docstrings.
* Fix broken RLlib tests in master.
* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).
* Fix error_outputs option in BAZEL for RLlib regression tests.
* Fix.
* Tune param-noise tests.
* LINT.
* Fix.
* Fix.
* test
* test
* test
* Fix.
* Fix.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Eric Liang
5cebee68d6
[rllib] Add scaling guide to documentation, improve bandit docs ( #7780 )
...
* update
* reword
* update
* ms
* multi node sgd
* reorder
* improve bandit docs
* contrib
* update
* ref
* improve refs
* fix build
* add pillow dep
* add pil
* update pil
* pillow
* remove false
2020-03-27 22:05:43 -07:00
Saurabh Gupta
6ddf84b019
Contextual Bandit algorithms (WIP) ( #7642 )
2020-03-26 13:41:16 -07:00
hubcity
3d0a8662b3
#7246 - Fixing broken links ( #7247 )
...
* #7246 - Fixing broken links
* Apply suggestions from code review
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-25 21:46:13 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fi sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
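A small before/after sketch of this rename from the user's side (values arbitrary); per the commit, the old key is kept for a while behind a deprecation warning:

```python
# Before the rename: the per-worker rollout size was set via "sample_batch_size".
old_config = {"sample_batch_size": 200, "train_batch_size": 4000}

# After the rename: the same setting under its clearer name.
new_config = {"rollout_fragment_length": 200, "train_batch_size": 4000}
```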
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00
Sven Mika
0e3960893a
[RLlib] Add rainbow config hint to algo-documentation. ( #7052 )
2020-02-05 12:01:43 -08:00
Eric Liang
6bb30c9f1b
fix links ( #6883 )
2020-01-22 01:06:07 -08:00
Eric Liang
14016535a5
[rllib] Add TF and Torch icons to show which are available for each algo ( #6869 )
2020-01-20 15:22:21 -08:00
Sven Mika
7659cae3ba
[RLlib] Add PG torch regression test ( #6828 )
...
* Add PG torch regression test to the tuned_examples/regression_tests dir.
* Rename cartpole-pg.yaml to cartpole-pg-tf.yaml.
* cartpole-pg-tf.yaml: Rename the tuned example from cartpole-pg to cartpole-pg-tf.
2020-01-18 15:57:12 -08:00
Justin Terry
97bf79917c
[RLlib] Update MADDPG example repo to maintained fork ( #6831 )
2020-01-18 13:08:27 -08:00
Michael Luo
e5dded917c
SAC site changes ( #6759 )
2020-01-09 18:13:42 -08:00
Michael Luo
1cb335487e
SAC for Mujoco Environments ( #6642 )
2019-12-31 00:16:54 -08:00
Victor Le
4e24c805ee
AlphaZero and Ranked reward implementation ( #6385 )
2019-12-07 12:08:40 -08:00
Eric Liang
243b1b7281
[rllib] Add microbatch optimizer with A2C example ( #6161 )
2019-11-14 12:14:00 -08:00
Olli Huotari
0916603e61
Fixed few broken links in docs ( #5477 )
...
* hyperband link changed
* tuned_examples link fix
* doc lstm link fix
* kubernetes example link fix
2019-08-19 14:22:25 -07:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions ( #5304 )
2019-08-10 14:05:12 -07:00
Eric Liang
592f313210
[rllib] Centralized critic / PPO example on TwoStepGame ( #5392 )
2019-08-08 14:03:28 -07:00
Eric Liang
7d747da420
[rllib] [docs] Add some architecture diagrams ( #5390 )
2019-08-06 20:14:57 -07:00
Wonseok Jeon
281829e712
MADDPG implementation in RLlib ( #5348 )
2019-08-06 16:22:06 -07:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir ( #5324 )
2019-08-05 23:25:49 -07:00
Eric Liang
955154a19d
Reduce Ray / RLlib startup messages ( #5368 )
2019-08-05 13:23:54 -07:00
Kristian Hartikainen
13fb9fe3db
[rllib] Feature/soft actor critic v2 ( #5328 )
...
* Add base for Soft Actor-Critic
* Pick changes from old SAC branch
* Update sac.py
* First implementation of sac model
* Remove unnecessary SAC imports
* Prune unnecessary noise and exploration code
* Implement SAC model and use that in SAC policy
* runs but doesn't learn
* clear state
* fix batch size
* Add missing alpha grads and vars
* -200 by 2k timesteps
* doc
* lazy squash
* one file
* ignore tfp
* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
a62c5f40f6
[rllib] Document ModelV2 and clean up the models/ directory ( #5277 )
2019-07-27 02:08:16 -07:00
Eric Liang
2dd0beb5bd
[rllib] Allow access to batches prior to postprocessing ( #4871 )
2019-05-29 18:17:14 -07:00
Eric Liang
02583a8598
[rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ ( #4819 )
...
This implements some of the renames proposed in #4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.
2019-05-20 16:46:05 -07:00
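A hedged sketch of what the rename looks like from user code; the import paths and method set shown are assumptions based on the commit title (evaluation/ -> policy/) and the usual custom-policy skeleton, with the old *PolicyGraph names still available as aliases:

```python
# Old (still works via the backwards-compatibility alias):
#   from ray.rllib.evaluation import PolicyGraph
# New location after the move to rllib/policy/:
from ray.rllib.policy import Policy


class MyRandomPolicy(Policy):
    """Minimal custom policy built on the renamed base class (sketch only)."""

    def compute_actions(self, obs_batch, state_batches=None,
                        prev_action_batch=None, prev_reward_batch=None,
                        **kwargs):
        # Pick a random action per observation; real policies compute these.
        return [self.action_space.sample() for _ in obs_batch], [], {}

    def learn_on_batch(self, samples):
        return {}  # no learning in this sketch

    def get_weights(self):
        return {}

    def set_weights(self, weights):
        pass
```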
Eric Liang
7d5ef6d99c
[rllib] Support continuous action distributions in IMPALA/APPO ( #4771 )
2019-05-16 22:05:07 -07:00
Sam Toyer
663e92ab3f
[rllib] TD3/DDPG improvements and MuJoCo benchmarks ( #4694 )
...
* [rllib] Separate optimisers for DDPG actor & crit.
* [rllib] Better names for DDPG variables & options
Config changes:
- noise_scale -> exploration_ou_noise_scale
- exploration_theta -> exploration_ou_theta
- exploration_sigma -> exploration_ou_sigma
- act_noise -> exploration_gaussian_sigma
- noise_clip -> target_noise_clip
* [rllib] Make DDPG less class-y
Replaced three classes (each with only an __init__ method and a handful of unrelated attributes) with plain functions.
* [rllib] Refactor DDPG noise
* [rllib] Unify DDPG exploration annealing
Added option "exploration_should_anneal" to enable linear annealing of
exploration noise. By default this is off, for consistency with DDPG &
TD3 papers. Also renamed "exploration_final_eps" to
"exploration_final_scale" (that name seems to have been carried over
from DQN, and doesn't really make sense here). Finally, tried to rename
"eps" to "noise_scale" wherever possible.
2019-04-26 17:49:53 -07:00
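To make the rename list above concrete, a hedged before/after sketch of the affected DDPG exploration options (values arbitrary; key names taken directly from the commit message):

```python
# Old key names (pre-#4694):
old_exploration = {
    "noise_scale": 0.1,
    "exploration_theta": 0.15,
    "exploration_sigma": 0.2,
    "act_noise": 0.1,
    "noise_clip": 0.5,
}

# New key names (same meanings, grouped by noise type):
new_exploration = {
    "exploration_ou_noise_scale": 0.1,
    "exploration_ou_theta": 0.15,
    "exploration_ou_sigma": 0.2,
    "exploration_gaussian_sigma": 0.1,
    "target_noise_clip": 0.5,
}
```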
Megan Kawakami
346885068c
[rllib] add torch pg ( #3857 )
...
* add torch pg
* add torch imports
* added torch pg
* working torch pg implementation
* add pg pytorch
* Update a3c.py
* Update a3c.py
* Update torch_policy_graph.py
* Update torch_policy_graph.py
2019-02-16 19:54:14 -08:00
Michael Luo
1a015e420b
Optimal PPO Configs (10k reward in 1 hr) + PPO grad clipping implemented ( #3934 )
2019-02-02 22:10:58 -08:00
Michael Luo
16f7ca45e4
Appo ( #3779 )
...
* Deleted the old fork, updated to the new Ray, and moved PPO-impala to APPO in the ppo folder
* Deleted unnecessary vtrace.py file
* Update pong-impala.yaml
* Cleaned PPO Code
* Update pong-impala.yaml
* Update pong-impala.yaml
* wip
* new file
* refactor
* add vtrace off option
* revert
* support any space
* docs
* fix comment
* remove kl
* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong
319c1340cb
[rllib] Develop MARWIL ( #3635 )
...
* add marwil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* add maintaining the moving average of advantage norm
* use sync replay optimizer for unifying
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to eric's review
* format by yapf
* revise
* add test data
* marwil
2019-01-16 19:00:43 -08:00
Eric Liang
47d36d7bd6
[rllib] Refactor pytorch custom model support ( #3634 )
2019-01-03 13:48:33 +08:00
Eric Liang
303883a3b6
[rllib] [rfc] add contrib module and guideline for merging ( #3565 )
...
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. It also cleans up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) ( #3548 )
2018-12-18 10:40:01 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms ( #3384 )
2018-11-27 23:35:19 -08:00
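A hedged sketch of the masking idea that parametric-action examples like this are built around: score all actions, then push the logits of unavailable actions to a very large negative value so they can never be sampled. The helper below is illustrative, not RLlib's actual model code:

```python
import numpy as np

def mask_logits(logits, action_mask):
    """Disable unavailable actions by adding a large negative value to their
    logits (0 in the mask -> effectively -inf, 1 -> unchanged)."""
    neg_inf = np.where(action_mask > 0, 0.0, -1e10).astype(np.float32)
    return logits + neg_inf

logits = np.array([1.0, 2.0, 0.5, 3.0], dtype=np.float32)
mask = np.array([1, 0, 1, 0], dtype=np.float32)  # only actions 0 and 2 legal
print(mask_logits(logits, mask).argmax())        # -> 0
```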
Eric Liang
8b76bab25c
[rllib] docs for td3 ( #3381 )
...
* td3 doc
* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers ( #3327 )
2018-11-19 20:55:27 -08:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs ( #3064 )
2018-10-16 15:55:11 -07:00