ray/rllib/tuned_examples/maddpg/two-step-game-maddpg.yaml

two-step-game-maddpg:
    env: ray.rllib.examples.env.two_step_game.TwoStepGame
    run: MADDPG
    stop:
        episode_reward_mean: 7.2
        timesteps_total: 20000
    config:
        # MADDPG only supports tf for now.
        framework: tf

        env_config:
            env_config:
              actions_are_logits: true

        num_steps_sampled_before_learning_starts: 200

        multiagent:
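            # Each policy spec is a 4-item list:
            # [policy class, observation space, action space, config overrides].
            # `null` entries fall back to the algorithm's default policy class
            # and to spaces inferred from the environment.
            policies: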
              p0:
                - null
                - null
                - null
                - {agent_id: 0}
              p1:
                - null
                - null
                - null
                - {agent_id: 1}
            # YAML-capable policy_mapping_fn definition via providing a callable class here.
            policy_mapping_fn:
                type: ray.rllib.examples.multi_agent_and_self_play.policy_mapping_fn.PolicyMappingFn