ray/rllib/tests/test_reproducibility.py

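"""Tests that RLlib training is reproducible when a fixed seed is set.

Runs three short DQN trials on a toy environment: trials 0 and 1 share a
seed and must produce identical reward trajectories, while trial 2 uses a
different seed and is expected to diverge.
"""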

import unittest

import gym
import numpy as np

import ray
from ray.rllib.algorithms.dqn import DQNTrainer
from ray.rllib.utils.test_utils import framework_iterator
from ray.tune.registry import register_env

class TestReproducibility(unittest.TestCase):
    def test_reproducing_trajectory(self):
        class PickLargest(gym.Env):
            """Toy env: one-step episodes; reward is the chosen obs component."""

            def __init__(self):
                self.observation_space = gym.spaces.Box(
                    low=float("-inf"), high=float("inf"), shape=(4,)
                )
                self.action_space = gym.spaces.Discrete(4)

            def reset(self, **kwargs):
                # Draw a fresh 4-dim observation. Reproducibility hinges on
                # this RNG being seeded identically across trials.
                self.obs = np.random.randn(4)
                return self.obs

            def step(self, action):
                # Reward is the observation entry picked by the action; the
                # episode terminates after this single step.
                reward = self.obs[action]
                return self.obs, reward, True, {}

        def env_creator(env_config):
            return PickLargest()
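
        # Three trials per framework: trials 0 and 1 share seed 666, trial 2
        # uses seed 999. Each trial runs in its own fresh Ray instance so no
        # state leaks between runs.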
        for fw in framework_iterator(frameworks=("tf", "torch")):
            trajs = list()
            for trial in range(3):
                ray.init()
                register_env("PickLargest", env_creator)
                config = {
                    # Fixed seed -> deterministic sampling and training.
                    "seed": 666 if trial in [0, 1] else 999,
                    # Report each iteration as soon as >=100 env steps have
                    # been sampled (no time-based smoothing).
                    "min_time_s_per_reporting": 0,
                    "min_sample_timesteps_per_reporting": 100,
                    "framework": fw,
                }
                agent = DQNTrainer(config=config, env="PickLargest")
                trajectory = list()
                for _ in range(8):
                    r = agent.train()
                    # Record max/min episode rewards as the trial's signature.
                    trajectory.append(r["episode_reward_max"])
                    trajectory.append(r["episode_reward_min"])
                trajs.append(trajectory)
                ray.shutdown()

            # Trials 0 and 1 use the same seed and must therefore produce
            # identical reward trajectories.
            all_same = True
            for v0, v1 in zip(trajs[0], trajs[1]):
                if v0 != v1:
                    all_same = False
            self.assertTrue(all_same)

            # Trials 1 and 2 use different seeds, so most of the 16 recorded
            # rewards should differ; require more than half to do so.
            diff_cnt = 0
            for v1, v2 in zip(trajs[1], trajs[2]):
                if v1 != v2:
                    diff_cnt += 1
            self.assertTrue(diff_cnt > 8)
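
# Running this file directly executes the test under pytest (assuming pytest
# is installed), equivalent to:
#
#     pytest -v rllib/tests/test_reproducibility.py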

if __name__ == "__main__":
    import pytest
    import sys

    sys.exit(pytest.main(["-v", __file__]))