* Updated PettingZoo+RLlib tutorial
Updated the tutorial and added a link to the blog post by the PettingZoo team.
* Ran linting
* Converted link to tinyurl for linting
* fixed line lengths
* Decrease num_workers to 1
* Added comments
* Decreased num_workers
* Decreased timesteps
* Increased num_workers
* Update links and remove pettingzoo_env.py
* remove pettingzoo.py script from tests
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* wip.
* Test: Make a change in tune to trigger the tune tests, which are otherwise not run but nevertheless seem to fail with this PR's changes.
* remove bare_metal_policy_with_custom_view_reqs from tests
* Fix DDPG, since it is based on GenericOffPolicyTrainer.
* Fix QMix, SAC, and MADDPG too.
* Undo QMix change.
* Fix DQN input batch type. Always use SampleBatch.
* apex ddpg should not use replay_buffer_config yet.
* Make eager TF policy use SampleBatch.
* lint
* LINT.
* Re-enable RLlib broken tests to make sure things work ok now.
* fixes.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* Revert "[CI] Remove config that disables Bazel test result cache (#18701)"
This reverts commit 098ff36faa.
* Remove all RLlib tests from BUILD that currently fail.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* [RLlib] Unify the way we create and use LocalReplayBuffer for all the agents.
This change:
1. Gets rid of the try...except clause when we call execution_plan(),
and gets rid of the deprecation warning as a result.
2. Fixes the execution_plan() call in Trainer._try_recover() too.
3. Most importantly, makes it much easier to create and use different types
of local replay buffers for all our agents.
E.g., this allows us to easily create a reservoir sampling replay buffer for
the APPO agent for Riot in the near future.
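For context, a reservoir sampling replay buffer could look roughly like the sketch below. It is a standalone illustration of the technique (Algorithm R), not the RLlib implementation: the class name `ReservoirReplayBuffer` and its methods are hypothetical and do not follow the LocalReplayBuffer interface.

```python
import random
from typing import Any, List, Optional


class ReservoirReplayBuffer:
    """Hypothetical buffer holding a uniform random sample of all items seen.

    Not an RLlib class; it only illustrates reservoir sampling (Algorithm R).
    """

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.num_seen = 0  # Total number of items ever offered to the buffer.
        self._storage: List[Any] = []
        self._rng = random.Random(seed)

    def add(self, item: Any) -> None:
        self.num_seen += 1
        if len(self._storage) < self.capacity:
            # Buffer not full yet: always keep the item.
            self._storage.append(item)
            return
        # Buffer full: the new item replaces a stored one with
        # probability capacity / num_seen.
        slot = self._rng.randrange(self.num_seen)
        if slot < self.capacity:
            self._storage[slot] = item

    def sample(self, batch_size: int) -> List[Any]:
        if not self._storage:
            raise ValueError("cannot sample from an empty buffer")
        # Draw with replacement from the stored items.
        return [self._rng.choice(self._storage) for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self._storage)
```

Unlike a FIFO buffer, this keeps old experience with the same probability as new experience, so the stored items stay a uniform sample of the whole stream.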
* Introduce explicit configuration for replay buffer types.
* Fix is_training key error.
* Actually deprecate the buffer_size field.
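A minimal sketch of how an explicit replay buffer config and the deprecated buffer_size field might be reconciled. Only `replay_buffer_config`, `buffer_size`, and `LocalReplayBuffer` appear in this log; the helper name `resolve_replay_buffer_config`, the `type` and `capacity` keys, and the default type are assumptions made for illustration.

```python
import warnings
from typing import Any, Dict


def resolve_replay_buffer_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build the effective replay buffer settings."""
    # Copy so the caller's config is not mutated.
    replay_cfg = dict(config.get("replay_buffer_config") or {})
    # Assumed default buffer type when none is given explicitly.
    replay_cfg.setdefault("type", "LocalReplayBuffer")

    legacy_size = config.get("buffer_size")
    if legacy_size is not None:
        warnings.warn(
            "`buffer_size` is deprecated; set "
            "`replay_buffer_config['capacity']` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # An explicit capacity wins over the legacy field.
        replay_cfg.setdefault("capacity", legacy_size)
    return replay_cfg
```

With this shape, an old config that sets only buffer_size keeps working but emits a warning, while new configs name the buffer type and capacity in one place.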
* [RLlib] Raise error for KL penalty in DDPPO
DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
become centralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty).
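The check described above might be sketched as follows. This is a standalone illustration, not the code in rllib/agents/ppo/ddppo.py: the function name `validate_ddppo_config` is hypothetical, and the config keys `kl_coeff` and `kl_target` are assumed to be the settings that enable the KL penalty.

```python
from typing import Any, Dict


def validate_ddppo_config(config: Dict[str, Any]) -> None:
    """Hypothetical validator: reject configs that request a KL penalty."""
    # Assumed keys: a non-zero value for either one enables the KL penalty.
    kl_coeff = config.get("kl_coeff", 0.0)
    kl_target = config.get("kl_target", 0.0)
    if kl_coeff != 0.0 or kl_target != 0.0:
        raise ValueError(
            "DDPPO doesn't support KL penalties like PPO-1. "
            "Set `kl_coeff` and `kl_target` to 0.0, and tune the entropy "
            "coefficient instead."
        )
```

Failing at config validation time surfaces the unsupported setting before any workers are started, rather than silently ignoring it.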
* Update rllib/agents/ppo/ddppo.py
Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>