# ray/rllib/tests/test_evaluators.py

import gym
import unittest

import ray
from ray.rllib.agents.dqn import DQNTrainer
from ray.rllib.agents.a3c import A3CTrainer
from ray.rllib.agents.dqn.dqn_tf_policy import _adjust_nstep
from ray.tune.registry import register_env


class EvalTest(unittest.TestCase):
    def test_dqn_n_step(self):
        obs = [1, 2, 3, 4, 5, 6, 7]
        actions = ["a", "b", "a", "a", "a", "b", "a"]
        rewards = [10.0, 0.0, 100.0, 100.0, 100.0, 100.0, 100.0]
        new_obs = [2, 3, 4, 5, 6, 7, 8]
        dones = [0, 0, 0, 0, 0, 0, 1]
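        # _adjust_nstep(n_step=3, gamma=0.9, ...) rewrites the lists in place.
        # Each reward becomes the discounted sum over a window of up to 3
        # steps, e.g. rewards[0] = 10 + 0.9 * 0 + 0.9**2 * 100 = 91, and the
        # window is truncated at the end of the trajectory (rewards[5] =
        # 100 + 0.9 * 100 = 190). new_obs and dones are taken from the last
        # step of each window; obs and actions are left unchanged.
        _adjust_nstep(3, 0.9, obs, actions, rewards, new_obs, dones)
        self.assertEqual(obs, [1, 2, 3, 4, 5, 6, 7])
        self.assertEqual(actions, ["a", "b", "a", "a", "a", "b", "a"])
        self.assertEqual(new_obs, [4, 5, 6, 7, 8, 8, 8])
        self.assertEqual(dones, [0, 0, 0, 0, 1, 1, 1])
        self.assertEqual(rewards,
                         [91.0, 171.0, 271.0, 271.0, 271.0, 190.0, 100.0])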

    def test_evaluation_option(self):
        def env_creator(env_config):
            return gym.make("CartPole-v0")

        agent_classes = [A3CTrainer, DQNTrainer]

        for agent_cls in agent_classes:
            ray.init(object_store_memory=1000 * 1024 * 1024)
            register_env("CartPoleWrapped-v0", env_creator)
            agent = agent_cls(
                env="CartPoleWrapped-v0",
                config={
                    "evaluation_interval": 2,
                    "evaluation_num_episodes": 2,
                    "evaluation_config": {
                        "gamma": 0.98,
                        "env_config": {
                            "fake_arg": True
                        }
                    },
                })
            # Given evaluation_interval=2, r0 and r2 should not contain
            # evaluation metrics, while r1 and r3 should.
            r0 = agent.train()
            r1 = agent.train()
            r2 = agent.train()
            r3 = agent.train()

            self.assertIn("evaluation", r1)
            self.assertIn("evaluation", r3)
            self.assertNotIn("evaluation", r0)
            self.assertNotIn("evaluation", r2)
            self.assertIn("episode_reward_mean", r1["evaluation"])
            self.assertNotEqual(r1["evaluation"], r3["evaluation"])
            ray.shutdown()


if __name__ == "__main__":
    import pytest
    import sys
    sys.exit(pytest.main(["-v", __file__]))