Commit graph

84 commits

Author SHA1 Message Date
Richard Liaw
a2d2275ee1
Revert "[RLlib + Tune] Add placement group support to RLlib. (#14289)" (#14360)
This reverts commit 6cd0cd3bd9.
2021-02-25 14:27:35 -08:00
Sven Mika
6cd0cd3bd9
[RLlib + Tune] Add placement group support to RLlib. (#14289) 2021-02-25 16:01:31 +01:00
Sven Mika
8000258333
[RLlib] R2D2 Implementation. (#13933) 2021-02-25 12:18:11 +01:00
QuantumMecha
0c93bb77cb
[RLlib] Update Documentation for Curiosity's support of continuous actions (#13784)
Only (Multi)Discrete action spaces are supported so far according to https://github.com/ray-project/ray/blob/master/rllib/utils/exploration/curiosity.py
2021-02-02 13:10:09 +01:00
Sven Mika
9dd9f72111
[RLlib] Add more detailed Documentation on Model building API (#13261) 2021-01-09 12:38:29 +01:00
Michael Luo
67229bf350
[RLlib] SlateQ Documentation (#13266) 2021-01-09 11:21:51 +01:00
Sven Mika
391cdfae8c
[RLlib] Trajectory view API docs. (#12718) 2020-12-30 17:32:21 -08:00
Michael Luo
6e6c680f14
MBMPO Cartpole (#11832)
* MBMPO Cartpole Done

* Added doc
2020-11-12 10:30:41 -08:00
Eric Liang
9b8218aabd
[docs] Move all /latest links to /master (#11897)
* use master link

* remae

* revert non-ray

* more

* mre
2020-11-10 10:53:28 -08:00
Yutai Zhou
6999db93cb
Un-indent multiagent section (#11310)
* Un-indent multiagent section
MARL section used to be nested inside bandits, which we probably don't want. Maybe give it its own section instead?
2020-10-29 16:12:48 +01:00
huyz-git
64e3c9741a
Update rllib-algorithms.rst (#11642) 2020-10-28 15:07:10 -07:00
Sven Mika
f91c455527
[RLlib] Curiosity documentation. (#11066) 2020-09-29 09:39:22 +02:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. (#10825) 2020-09-20 11:27:02 +02:00
Eric Liang
f7d5aa46a3
[hotfix] Fix table formatting (#10687) 2020-09-09 16:08:54 -07:00
Justin Terry
8a1caf6279
rename centralized critic to shared critic (#10610) 2020-09-09 15:49:32 -07:00
Sven Mika
4b278c36fc
[RLlib] Behavioral Cloning (from MARWIL). (#10619) 2020-09-09 17:33:21 +02:00
Michael Luo
8e613652af
[RLLib] MBMPO Fixes (#10296) 2020-09-09 09:34:34 +02:00
Simon Mo
5a38a76c83
[Doc] Use sphinx_book_theme (#10379) 2020-09-08 16:25:23 -07:00
Justin Terry
352718610d
Multi-agent Algorithm Documentation Updates (#9722) 2020-09-03 22:37:46 -07:00
Michael Luo
4e9888ce2f
[RLlib] Dreamer (#10172) 2020-08-26 13:24:05 +02:00
Matthew Strawbridge
7a5af7e744
Fix links to ddpg tuned examples (#9713) 2020-08-25 11:30:13 -07:00
Sven Mika
d14b501692
[RLlib] First attempt at cleaning up algo code in RLlib: PG. (#10115) 2020-08-20 17:05:57 +02:00
Sven Mika
fe0bdb23ff
[RLlib] Attention Net/Transformers docs improvement. 2020-08-17 13:07:17 -07:00
Justin Terry
0d67602051
Update rllib-algorithms.rst (#9640) 2020-07-24 19:35:28 -07:00
Sven Mika
78dfed2683
[RLlib] Issue 8384: QMIX doesn't learn anything. (#9527) 2020-07-17 12:14:34 +02:00
Michael Luo
851d02463b
[Doc] RLlib Algorithms Documentation: MAML + PyTorch MAML (#9189) 2020-07-03 11:05:15 -07:00
Sven Mika
a90cd0fcbb
[RLlib] Unity3d soccer benchmarks (#8834) 2020-06-11 14:29:57 +02:00
Chapman Siu
04cffb7e65
[docs] rllib-models.rst - QMIX +parametric (#8868)
Updating docs to show that QMIX supports parametric action space, as per SMAC environments. 

This is reflected in the code here: https://github.com/ray-project/ray/blob/master/rllib/agents/qmix/qmix_policy.py#L179 and consistent with QMIX being an extension of DQN
2020-06-09 21:56:16 -07:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... (#8520) 2020-05-27 16:19:13 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers (#8345) 2020-05-21 10:16:18 -07:00
Sven Mika
b95e28faea
[RLlib] APEX_DDPG (PyTorch) test case and docs. (#8288)
APEX_DDPG (PyTorch) test case and docs.
2020-05-04 09:36:27 +02:00
Sven Mika
166bb5d690
[RLlib] IMPALA PyTorch (#8287)
This PR adds an IMPALA PyTorch implementation.

- adds compilation tests for LSTM and w/o LSTM.
- adds learning test for CartPole.
2020-05-03 13:44:25 +02:00
Sven Mika
499ad5fbe4
[RLlib] PyTorch version of APPO. (#8120)
- Translate all vtrace functionality to torch and added torch to the framework_iterator-loop in all existing vtrace test cases.
- Add learning test cases for APPO torch (both w/ and w/o v-trace).
- Add quick compilation tests for APPO (tf and torch, v-trace and no v-trace).
2020-04-23 09:11:12 +02:00
Sven Mika
d15609ba2a
[RLlib] PyTorch version of ARS (Augmented Random Search). (#8106)
This PR implements a PyTorch version of RLlib's ARS algorithm using RLlib's functional algo builder API. It also adds a regression test for ARS (torch) on CartPole.
2020-04-21 09:47:52 +02:00
Sven Mika
3812bfedda
[RLlib] PyTorch version of ES (Evolution Strategies). (#8104)
PyTorch version of Evolution Strategies (ES) Algo.
2020-04-20 21:47:28 +02:00
Sven Mika
d0fab84e4d
[RLlib] DDPG PyTorch version. (#7953)
The DDPG/TD3 algorithms currently do not have a PyTorch implementation. This PR adds PyTorch support for DDPG/TD3 to RLlib.
This PR:
- Depends on the re-factor PR for DDPG (Functional Algorithm API).
- Adds learning regression tests for the PyTorch version of DDPG and a DDPG (torch)
- Updates the documentation to reflect that DDPG and TD3 now support PyTorch.

* Learning Pendulum-v0 on torch version (same config as tf). Wall time a little slower (~20%) than tf.
* Fix GPU target model problem.
2020-04-16 10:20:01 +02:00
Sven Mika
d2b5c171cb
[RLlib] Add pytorch sigils to toc and add links to algo overview table. (#7950)
* Add torch sigils to toc-tree for DQN/APEX.

* WIP.
2020-04-09 10:40:18 -07:00
Sven Mika
22ccc43670
[RLlib] DQN torch version. (#7597)
* Fix.

* Rollback.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* Fix.

* Fix.

* Fix.

* Fix.

* WIP.

* WIP.

* Fix.

* Test case fixes.

* Test case fixes and LINT.

* Test case fixes and LINT.

* Rollback.

* WIP.

* WIP.

* Test case fixes.

* Fix.

* Fix.

* Fix.

* Add regression test for DQN w/ param noise.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Fixes and LINT.

* Comment

* Regression test case.

* WIP.

* WIP.

* LINT.

* LINT.

* WIP.

* Fix.

* Fix.

* Fix.

* LINT.

* Fix (SAC does currently not support eager).

* Fix.

* WIP.

* LINT.

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/evaluation/sampler.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/utils/exploration/exploration.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* WIP.

* WIP.

* Fix.

* LINT.

* LINT.

* Fix and LINT.

* WIP.

* WIP.

* WIP.

* WIP.

* Fix.

* LINT.

* Fix.

* Fix and LINT.

* Update rllib/utils/exploration/exploration.py

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Update rllib/policy/dynamic_tf_policy.py

Co-Authored-By: Eric Liang <ekhliang@gmail.com>

* Fixes.

* WIP.

* LINT.

* Fixes and LINT.

* LINT and fixes.

* LINT.

* Move action_dist back into torch extra_action_out_fn and LINT.

* Working SimpleQ learning cartpole on both torch AND tf.

* Working Rainbow learning cartpole on tf.

* Working Rainbow learning cartpole on tf.

* WIP.

* LINT.

* LINT.

* Update docs and add torch to APEX test.

* LINT.

* Fix.

* LINT.

* Fix.

* Fix.

* Fix and docstrings.

* Fix broken RLlib tests in master.

* Split BAZEL learning tests into cartpole and pendulum (reached the 60min barrier).

* Fix error_outputs option in BAZEL for RLlib regression tests.

* Fix.

* Tune param-noise tests.

* LINT.

* Fix.

* Fix.

* test

* test

* test

* Fix.

* Fix.

* WIP.

* WIP.

* WIP.

* WIP.

* LINT.

* WIP.

Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-04-06 11:56:16 -07:00
Eric Liang
5cebee68d6
[rllib] Add scaling guide to documentation, improve bandit docs (#7780)
* update

* reword

* update

* ms

* multi node sgd

* reorder

* improve bandit docs

* contrib

* update

* ref

* improve refs

* fix build

* add pillow dep

* add pil

* update pil

* pillow

* remove false
2020-03-27 22:05:43 -07:00
Saurabh Gupta
6ddf84b019
Contextual Bandit algorithms (WIP) (#7642) 2020-03-26 13:41:16 -07:00
hubcity
3d0a8662b3
#7246 - Fixing broken links (#7247)
* #7246 - Fixing broken links

* Apply suggestions from code review

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-25 21:46:13 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length (#7503)
* bulk rename

* deprecation warn

* update doc

* update fig

* line length

* rename

* make pytest compatible

* fix test

* fi sys

* rename

* wip

* fix more

* lint

* update svg

* comments

* lint

* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation (#7088) 2020-02-10 15:28:27 -08:00
Sven Mika
0e3960893a
[RLlib] Add rainbow config hint to algo-documentation. (#7052) 2020-02-05 12:01:43 -08:00
Eric Liang
6bb30c9f1b
fix links (#6883) 2020-01-22 01:06:07 -08:00
Eric Liang
14016535a5
[rllib] Add TF and Torch icons to show which are available for each algo (#6869) 2020-01-20 15:22:21 -08:00
Sven Mika
7659cae3ba
[RLlib] Add PG torch regression test (#6828)
* Add PG torch regression test to tuned_examples/regression_tests dir.

* Rename cartpole-pg.yaml into cartpole-pg-tf.yaml

* cartpole-pg-tf.yaml: Change cartpole-pg name of tuned_example to cartpole-pg-tf.
2020-01-18 15:57:12 -08:00
Justin Terry
97bf79917c
[RLlib] Update MADDPG example repo to maintained fork (#6831) 2020-01-18 13:08:27 -08:00
Michael Luo
e5dded917c
SAC site changes (#6759) 2020-01-09 18:13:42 -08:00
Michael Luo
1cb335487e
SAC for Mujoco Environments (#6642) 2019-12-31 00:16:54 -08:00