ray/rllib/tests/test_multi_agent_pendulum.py

"""Integration test: (1) pendulum works, (2) single-agent multi-agent works."""
import unittest

import ray
from ray.tune import run_experiments
from ray.tune.registry import register_env
from ray.rllib.examples.env.multi_agent import MultiAgentPendulum
from ray.rllib.utils.test_utils import framework_iterator


class TestMultiAgentPendulum(unittest.TestCase):
    def setUp(self) -> None:
        ray.init()

    def tearDown(self) -> None:
        ray.shutdown()

    def test_multi_agent_pendulum(self):
        register_env("multi_agent_pendulum",
                     lambda _: MultiAgentPendulum({"num_agents": 1}))

        # Tune stops the trial as soon as either criterion is reached.
        stop = {
            "timesteps_total": 500000,
            "episode_reward_mean": -400.0,
        }

        # Test for both torch and tf.
        for fw in framework_iterator(frameworks=["torch", "tf"]):
            trials = run_experiments(
                {
                    "test": {
                        "run": "PPO",
                        "env": "multi_agent_pendulum",
                        "stop": stop,
                        "config": {
                            "train_batch_size": 2048,
                            "vf_clip_param": 10.0,
                            "num_workers": 0,
                            "num_envs_per_worker": 10,
                            "lambda": 0.1,
                            "gamma": 0.95,
                            "lr": 0.0003,
                            "sgd_minibatch_size": 64,
                            "num_sgd_iter": 10,
                            "model": {
                                "fcnet_hiddens": [128, 128],
                            },
                            "batch_mode": "complete_episodes",
                            "framework": fw,
                        },
                    }
                },
                verbose=1)
            if trials[0].last_result["episode_reward_mean"] <= \
                    stop["episode_reward_mean"]:
                raise ValueError(
                    "Did not get to {} reward".format(
                        stop["episode_reward_mean"]), trials[0].last_result)


if __name__ == "__main__":
    import pytest
    import sys
    sys.exit(pytest.main(["-v", __file__]))