ray/rllib/examples/multi_agent_parameter_sharing.py

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

# Based on code from github.com/parametersharingmadrl/parametersharingmadrl

if __name__ == "__main__":
    # Train Ape-X DDPG ("APEX_DDPG") on Waterworld, with all agents
    # sharing a single policy.

    register_env("waterworld", lambda _: PettingZooEnv(waterworld_v3.env()))

    tune.run(
        "APEX_DDPG",
        stop={"episodes_total": 60000},
        checkpoint_freq=10,
        config={
            # Environment specific.
            "env": "waterworld",
            # General
            "num_gpus": 1,
            "num_workers": 2,
            "num_envs_per_worker": 8,
            "replay_buffer_config": {
                "learning_starts": 1000,
                "capacity": int(1e5),
                "prioritized_replay_alpha": 0.5,
            },
            "compress_observations": True,
            "rollout_fragment_length": 20,
            "train_batch_size": 512,
            "gamma": 0.99,
            "n_step": 3,
            "lr": 0.0001,
            "target_network_update_freq": 50000,
            "min_sample_timesteps_per_iteration": 25000,
            # Method specific.
            "multiagent": {
                # We only have one policy (calling it "shared_policy").
                # Class, obs/act-spaces, and config will be derived
                # automatically.
                "policies": {"shared_policy"},
                # Always use the "shared_policy" policy.
                "policy_mapping_fn": (
                    lambda agent_id, episode, **kwargs: "shared_policy"
                ),
            },
        },
    )
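
The parameter-sharing idea in the "multiagent" block above can be illustrated without Ray: every agent ID is routed to the same policy name, so all agents train one set of weights. This is a minimal sketch, not RLlib code. The agent IDs below are illustrative placeholders (the real IDs come from the environment), and the `episode=None` default is added here only so the function can be called standalone.

```python
# Hypothetical agent IDs for illustration; the real IDs are supplied
# by the wrapped environment at runtime.
agent_ids = ["pursuer_0", "pursuer_1", "pursuer_2"]


def policy_mapping_fn(agent_id, episode=None, **kwargs):
    """Map every agent to the single shared policy."""
    return "shared_policy"


# Resolve the policy for each agent, as a mapping function would be
# used when an agent first appears in an episode.
assignments = {agent_id: policy_mapping_fn(agent_id) for agent_id in agent_ids}

# With parameter sharing, only one distinct policy is ever selected.
distinct_policies = set(assignments.values())
print(assignments)
print(distinct_policies)
```

Returning a different name per agent ID instead (with matching entries in "policies") would give each agent its own independently trained policy.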