Rohan138
b9c9cc5946
[RLlib] Updated PettingZoo+RLlib tutorial; Removed pettingzoo example script ( #19069 )
...
* Updated PettingZoo+RLlib tutorial
Updated the tutorial and added link to the blog post by the PettingZoo team.
* Ran linting
* Converted link to tinyurl for linting
* fixed line lengths
* Decrease num_workers to 1
* Added comments
* Decreased num_workers
* Decreased timesteps
* Increased num_workers
* Update links and remove pettingzoo_env.py
* remove pettingzoo.py script from tests
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-10-29 10:57:10 +02:00
Renos Zabounidis
41dd037ae9
[RLlib; Docs] Correcting documentation with respect to postprocess_trajectory ( #19672 )
...
postprocess_trajectory is referred to incorrectly in the rllib-environments documentation. When defining a custom policy, a user never directly modifies Policy.postprocess_trajectory; instead, they define a postprocess_fn, which is in turn called by postprocess_trajectory (see the sketch after this entry).
2021-10-25 09:37:58 +02:00
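A minimal sketch of the relationship described above, assuming the RLlib 1.x policy-builder API; the policy name, loss, and reward tweak are illustrative, not code from this PR:

```python
# Minimal sketch, assuming the RLlib 1.x policy-builder API: the user supplies
# postprocess_fn to build_tf_policy(); RLlib's Policy.postprocess_trajectory()
# then calls it on each finished rollout fragment.
import tensorflow as tf
from ray.rllib.policy.sample_batch import SampleBatch
from ray.rllib.policy.tf_policy_template import build_tf_policy

def my_postprocess_fn(policy, sample_batch, other_agent_batches=None, episode=None):
    # Example tweak: scale rewards in place (a stand-in for, e.g., advantage
    # computation); the batch is a dict-like SampleBatch of numpy arrays here.
    sample_batch[SampleBatch.REWARDS] = sample_batch[SampleBatch.REWARDS] * 0.5
    return sample_batch

def my_loss_fn(policy, model, dist_class, train_batch):
    return tf.constant(0.0)  # placeholder loss, only to make the sketch complete

MyPolicy = build_tf_policy(
    name="MyPolicy",
    loss_fn=my_loss_fn,
    postprocess_fn=my_postprocess_fn,  # invoked via Policy.postprocess_trajectory()
)
```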
Sven Mika
c95dea51e9
[RLlib] External env enhancements + more examples. ( #16583 )
2021-06-23 09:09:01 +02:00
Sven Mika
391cdfae8c
[RLlib] Trajectory view API docs. ( #12718 )
2020-12-30 17:32:21 -08:00
Benjamin Black
1999266bba
Updated pettingzoo env to accommodate API changes and fixes ( #11873 )
...
* Updated pettingzoo env to accommodate API changes and fixes
* fixed test failure
* fixed linting issue
* fixed test failure
2020-11-09 16:09:49 -08:00
Carlos Aguayo
86b1814e62
Update rllib-env.rst ( #10891 )
...
Tiny typo
2020-09-19 00:29:56 -07:00
Benjamin Black
f2408b719c
Fixed PettingZooEnv ( #10847 )
2020-09-17 11:28:42 -07:00
Stefan Schneider
6db55ca8db
[docs][rllib] Recommended workflow for training, saving, and testing ( #9319 )
2020-07-09 15:47:10 -07:00
Sven Mika
4da0e542d5
[RLlib] DDPG and SAC eager support (preparation for tf2.x) ( #9204 )
2020-07-08 16:12:20 +02:00
Benjamin Black
1425cdf834
Pettingzoo environment support ( #9271 )
...
* added pettingzoo wrapper env and example
* added docs, examples for pettingzoo env support
* fixed pettingzoo env flake8, added test
* fixed pettingzoo env import
* fixed pettingzoo env import
* fixed pettingzoo import issue
* fixed pettingzoo test
* fixed linting problem
* fixed bad quotes
* future-proofed pettingzoo dependency
* fixed ray init in pettingzoo env
* lint
* manual lint
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-07-06 21:32:26 -07:00
Sven Mika
d8a081a185
[RLlib] Unity3D integration (n Unity3D clients vs learning server). ( #8590 )
2020-05-30 22:48:34 +02:00
Eric Liang
f48da50e1c
[rllib] observation function api for multi-agent ( #8236 )
2020-05-04 22:13:49 -07:00
Eric Liang
5cebee68d6
[rllib] Add scaling guide to documentation, improve bandit docs ( #7780 )
...
* update
* reword
* update
* ms
* multi node sgd
* reorder
* improve bandit docs
* contrib
* update
* ref
* improve refs
* fix build
* add pillow dep
* add pil
* update pil
* pillow
* remove false
2020-03-27 22:05:43 -07:00
hubcity
3d0a8662b3
#7246 - Fixing broken links ( #7247 )
...
* #7246 - Fixing broken links
* Apply suggestions from code review
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-03-25 21:46:13 -07:00
Eric Liang
9392cdbf74
[rllib] Add high-performance external application connector ( #7641 )
2020-03-20 12:43:57 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
...
* bulk rename
* deprecation warn (see the sketch after this entry)
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fix sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
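A hedged sketch of the deprecation pattern the "bulk rename" and "deprecation warn" bullets describe (generic Python, not the PR's actual code):

```python
import warnings

def resolve_rollout_fragment_length(config: dict) -> dict:
    """Fold the deprecated `sample_batch_size` key into `rollout_fragment_length`."""
    if "sample_batch_size" in config:
        warnings.warn(
            "`sample_batch_size` has been renamed to `rollout_fragment_length`.",
            DeprecationWarning,
        )
        # An explicitly set new key wins; otherwise carry the old value over.
        config.setdefault("rollout_fragment_length", config.pop("sample_batch_size"))
    return config

# Usage:
cfg = resolve_rollout_fragment_length({"sample_batch_size": 200})
assert cfg["rollout_fragment_length"] == 200
```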
Sven Mika
510c850651
[RLlib] SAC add discrete action support. ( #7320 )
...
* Exploration API (+EpsilonGreedy sub-class).
* Exploration API (+EpsilonGreedy sub-class).
* Cleanup/LINT.
* Add `deterministic` to generic Trainer config (NOTE: this is still ignored by most Agents).
* Add `error` option to deprecation_warning().
* WIP.
* Bug fix: Get exploration-info for tf framework.
Bug fix: Properly deprecate some DQN config keys.
* WIP.
* LINT.
* WIP.
* Split PerWorkerEpsilonGreedy out of EpsilonGreedy.
Docstrings.
* Fix bug in sampler.py in case Policy has self.exploration = None
* Update rllib/agents/dqn/dqn.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Update rllib/agents/trainer.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* Change requests.
* LINT
* In tune/utils/util.py::deep_update(): only keep deep-updating if both original and value are dicts. If value is not a dict, set it directly (see the sketch after this entry).
* Completely obsoleted sync_replay_optimizer.py's parameters schedule_max_timesteps AND beta_annealing_fraction (replaced with prioritized_replay_beta_annealing_timesteps).
* Update rllib/evaluation/worker_set.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Review fixes.
* Fix default value for DQN's exploration spec.
* LINT
* Fix recursion bug (wrong parent c'tor).
* Do not pass timestep to get_exploration_info.
* Update tf_policy.py
* Fix some remaining issues with test cases and remove more deprecated DQN/APEX exploration configs.
* Bug fix tf-action-dist
* DDPG incompatibility bug fix with new DQN exploration handling (which is imported by DDPG).
* Switch off exploration when getting action probs from off-policy-estimator's policy.
* LINT
* Fix test_checkpoint_restore.py.
* Deprecate all SAC exploration (unused) configs.
* Properly use `model.last_output()` everywhere, instead of `model._last_output`.
* WIP.
* Take out set_epsilon from multi-agent-env test (not needed, decays anyway).
* WIP.
* Trigger re-test (flaky checkpoint-restore test).
* WIP.
* WIP.
* Add test case for deterministic action sampling in PPO.
* bug fix.
* Added deterministic test cases for different Agents.
* Fix problem with TupleActions in dynamic-tf-policy.
* Separate supported_spaces tests so they can be run separately for easier debugging.
* LINT.
* Fix autoregressive_action_dist.py test case.
* Re-test.
* Fix.
* Remove duplicate py_test rule from bazel.
* LINT.
* WIP.
* WIP.
* SAC fix.
* SAC fix.
* WIP.
* WIP.
* WIP.
* FIX 2 examples tests.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Renamed test file.
* WIP.
* Add unittest.main.
* Make action_dist_class mandatory.
* fix
* FIX.
* WIP.
* WIP.
* Fix.
* Fix.
* Fix explorations test case (contextlib cannot find its own nullcontext??).
* Force torch to be installed for QMIX.
* LINT.
* Fix determine_tests_to_run.py.
* Fix determine_tests_to_run.py.
* WIP
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Add Random exploration component to tests (fixed issue with "static-graph randomness" via py_function).
* Rename some stuff.
* Rename some stuff.
* WIP.
* update.
* WIP.
* Gumbel Softmax Dist.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP
* WIP.
* WIP.
* Hypertune.
* Hypertune.
* Hypertune.
* Lock-in.
* Cleanup.
* LINT.
* Fix.
* Update rllib/policy/eager_tf_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/agents/sac/sac_policy.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Update rllib/models/tf/tf_action_dist.py
Co-Authored-By: Kristian Hartikainen <kristian.hartikainen@gmail.com>
* Fix items from review comments.
* Add dm_tree to RLlib dependencies.
* Add dm_tree to RLlib dependencies.
* Fix DQN test cases ((Torch)Categorical).
* Fix wrong pip install.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
Co-authored-by: Kristian Hartikainen <kristian.hartikainen@gmail.com>
2020-03-06 10:37:12 -08:00
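As noted in the deep_update() bullet above, a minimal sketch of the described semantics (not tune/utils/util.py's verbatim code):

```python
def deep_update(original: dict, new: dict) -> dict:
    """Recurse only while both sides are dicts; otherwise set the value directly."""
    for key, value in new.items():
        if isinstance(original.get(key), dict) and isinstance(value, dict):
            deep_update(original[key], value)  # both are dicts: keep deep-updating
        else:
            original[key] = value  # value is not a dict (or key is new): set it
    return original

# Usage:
cfg = {"model": {"fcnet_hiddens": [256, 256]}, "lr": 1e-3}
deep_update(cfg, {"model": {"fcnet_hiddens": [64]}, "lr": 5e-4})
assert cfg == {"model": {"fcnet_hiddens": [64]}, "lr": 5e-4}
```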
Eric Liang
249ca2cf9e
[rllib] add blog posts to examples list ( #5762 )
...
* add blog post
* remove
* link
2019-09-23 10:42:21 -07:00
gehring
b520f6141e
[rllib] Adds eager support with a generic TFEagerPolicy class ( #5436 )
2019-08-23 14:21:11 +08:00
Eric Liang
a1d2e17623
[rllib] Autoregressive action distributions ( #5304 )
2019-08-10 14:05:12 -07:00
Eric Liang
592f313210
[rllib] Centralized critic / PPO example on TwoStepGame ( #5392 )
2019-08-08 14:03:28 -07:00
Eric Liang
5d7afe8092
[rllib] Try moving RLlib to top level dir ( #5324 )
2019-08-05 23:25:49 -07:00
Kristian Hartikainen
13fb9fe3db
[rllib] Feature/soft actor critic v2 ( #5328 )
...
* Add base for Soft Actor-Critic
* Pick changes from old SAC branch
* Update sac.py
* First implementation of sac model
* Remove unnecessary SAC imports
* Prune unnecessary noise and exploration code
* Implement SAC model and use that in SAC policy
* runs but doesn't learn
* clear state
* fix batch size
* Add missing alpha grads and vars
* -200 by 2k timesteps
* doc
* lazy squash
* one file
* ignore tfp
* revert done
2019-08-01 23:37:36 -07:00
Eric Liang
20450a4e82
[rllib] Add rock paper scissors multi-agent example ( #5336 )
2019-08-01 13:03:59 -07:00
Eric Liang
a45c61e19b
[rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section ( #4821 )
...
* wip
* fix index
* fix bugs
* todo
* add imports
* note on get ph
* note on get ph
* rename to building custom algs
* add rnn state info
2019-05-27 14:17:32 -07:00
Eric Liang
02583a8598
[rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ ( #4819 )
...
This implements some of the renames proposed in #4813.
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch (a minimal alias sketch follows this entry).
2019-05-20 16:46:05 -07:00
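One way to "leave behind" a backwards-compatible name after a rename, as an illustrative sketch (the PR's actual aliases may differ):

```python
import warnings

class Policy:
    """The renamed class (formerly PolicyGraph)."""

def make_deprecated_alias(new_cls, old_name):
    """Return a subclass that warns when the old name is instantiated."""
    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} has been renamed to {new_cls.__name__}.",
                DeprecationWarning,
            )
            super().__init__(*args, **kwargs)
    _Alias.__name__ = old_name
    return _Alias

# Old imports keep working, but nudge users toward the new name.
PolicyGraph = make_deprecated_alias(Policy, "PolicyGraph")
```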
Eric Liang
3807fb505b
[rllib] TensorFlow 2 compatibility ( #4802 )
2019-05-16 22:12:07 -07:00
Eric Liang
7d5ef6d99c
[rllib] Support continuous action distributions in IMPALA/APPO ( #4771 )
2019-05-16 22:05:07 -07:00
Eric Liang
37208216ae
[rllib] Rename Agent to Trainer ( #4556 )
2019-04-07 00:36:18 -07:00
bjg2
77005d1814
[rllib] Make batch timeout for remote workers tunable ( #4435 )
2019-03-29 13:19:42 -07:00
Eric Liang
5b8eb475ce
[rllib] Allow None to be specified in multi-agent envs ( #4464 )
...
* wip
* check
* doc update
* Update hierarchical_training.py
2019-03-25 11:38:17 -07:00
Eric Liang
c7f74dbdc7
[rllib] Add async remote workers ( #4253 )
2019-03-08 15:39:48 -08:00
Eric Liang
d9da183c7d
[rllib] Custom supervised loss API ( #4083 )
2019-02-24 15:36:13 -08:00
Eric Liang
f8bef004da
[rllib] Improve error message for bad envs, add remote env docs ( #4044 )
...
* commit
* fix up rew
2019-02-18 01:28:19 -08:00
Eric Liang
fb73cedf70
[rllib] Add examples page, add hierarchical training example, delete SC2 examples ( #3815 )
...
* wip
* lint
* wip
* up
* wip
* update examples
* wip
* remove carla
* update
* improve envspec
* link to custom
* Update rllib-env.rst
* update
* fix
* fn
* lint
* ds
* ssd games
* desc
* fix up docs
* fix
2019-01-29 21:06:09 -08:00
Eric Liang
04ec47cbd4
[rllib] annotate public vs developer vs private APIs ( #3808 )
2019-01-23 21:27:26 -08:00
Michael Luo
16f7ca45e4
Appo ( #3779 )
...
* Deleted old fork, updated to the new Ray, and moved PPO-Impala to APPO in the ppo folder
* Deleted unnecessary vtrace.py file
* Update pong-impala.yaml
* Cleaned PPO Code
* Update pong-impala.yaml
* Update pong-impala.yaml
* wip
* new file
* refactor
* add vtrace off option
* revert
* support any space
* docs
* fix comment
* remove kl
* Update cartpole-appo-vtrace.yaml
2019-01-18 13:40:26 -08:00
Jones Wong
319c1340cb
[rllib] Develop MARWIL ( #3635 )
...
* add marwil policy graph
* fix typo
* add offline optimizer and enable running marwil
* fix loss function
* maintain a moving average of the advantage norm
* use sync replay optimizer for unification
* remove offline optimizer and use sync replay optimizer
* format by yapf
* add imitation learning objective
* fix according to Eric's review
* format by yapf
* revise
* add test data
* marwil
2019-01-16 19:00:43 -08:00
Eric Liang
401e656b95
[rllib] Sync filters at end of iteration not start; hierarchical docs ( #3769 )
2019-01-15 16:25:25 -08:00
Eric Liang
ca864faece
[rllib] Documentation for I/O API and multi-agent support / cleanup ( #3650 )
2019-01-03 15:15:36 +08:00
Eric Liang
303883a3b6
[rllib] [rfc] add contrib module and guideline for merging ( #3565 )
...
This adds guidelines for merging code into `rllib/contrib` vs `rllib/agents`. It also cleans up the agent import code to make registration easier.
2018-12-20 10:44:34 -08:00
Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) ( #3548 )
2018-12-18 10:40:01 -08:00
Eric Liang
ce388a45cf
[rllib] Learner should not see clipped actions ( #3496 )
2018-12-09 21:57:11 -08:00
Eric Liang
d864f299d7
[rllib] fixes from dogfooding multi-agent ( #3456 )
...
auto-wrap multi-agent Dict and Tuple spaces by keeping a policy -> preprocessor mapping in the sampler
add some Q-learning debug stats
report min and max of custom metrics (see the sketch after this entry)
better errors
2018-12-05 23:31:45 -08:00
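A hedged sketch of the custom-metrics path the "report min and max of custom metrics" line refers to, assuming the callbacks-dict API RLlib used at the time; the metric name is made up:

```python
def on_episode_end(info):
    # `info["episode"]` is the rollout episode object; any float stored in
    # custom_metrics is aggregated by RLlib, which (after this change) reports
    # its min and max alongside the mean.
    episode = info["episode"]
    episode.custom_metrics["pole_angle"] = 0.1  # illustrative constant

config = {
    "callbacks": {"on_episode_end": on_episode_end},
}
```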
Eric Liang
ce355d13d4
[rllib] Allow envs to be auto-registered; add on_train_result callback with curriculum example ( #3451 )
...
* train step and docs
* debug
* doc
* doc
* fix examples
* fix code
* integration test
* fix
* ...
* space
* instance
* Update .travis.yml
* fix test
2018-12-03 23:15:43 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms ( #3384 )
2018-11-27 23:35:19 -08:00
Eric Liang
8b76bab25c
[rllib] docs for td3 ( #3381 )
...
* td3 doc
* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
bd0dbde149
[rllib] Rename ServingEnv => ExternalEnv ( #3302 )
2018-11-12 16:31:27 -08:00
Eric Liang
9dd3eedbac
[rllib] rollout.py should reduce num workers ( #3263 )
...
Don't create an excessive number of workers for rollout.py, and fix up the env wrapping to be consistent with the internal agent wrapper (see the sketch after this entry).
Closes #3260.
2018-11-09 12:29:16 -08:00
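The gist of the worker fix, as an illustrative sketch (not rollout.py's verbatim code): evaluating a checkpoint does not need the training run's full worker fleet.

```python
# Hypothetical saved training config; only num_workers matters for the sketch.
saved_training_config = {"env": "CartPole-v0", "num_workers": 32}

# Clamp workers for rollout/evaluation: num_workers=0 samples on the local
# worker only, instead of recreating 32 remote workers just to watch a policy.
eval_config = {**saved_training_config, "num_workers": 0}
```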
Eric Liang
369cb833fe
[rllib] Implement custom metrics ( #3144 )
2018-11-03 18:48:32 -07:00