import unittest

import ray
import ray.rllib.agents.impala as impala
from ray.rllib.policy.sample_batch import DEFAULT_POLICY_ID
from ray.rllib.utils.framework import try_import_tf
from ray.rllib.utils.metrics.learner_info import LEARNER_INFO, LEARNER_STATS_KEY
from ray.rllib.utils.test_utils import (
    check,
    check_compute_single_action,
    check_train_results,
    framework_iterator,
)

tf1, tf, tfv = try_import_tf()
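# try_import_tf() returns the TF1.x compat module, the TensorFlow module
# itself, and the detected major version (all None if TF is unavailable),
# so this file can still be imported without TensorFlow installed.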


class TestIMPALA(unittest.TestCase):
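    # Start a local Ray instance once for the entire test class and shut it
    # down again after all tests have run.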
    @classmethod
    def setUpClass(cls) -> None:
        ray.init()

    @classmethod
    def tearDownClass(cls) -> None:
        ray.shutdown()

    def test_impala_compilation(self):
        """Test whether an ImpalaTrainer can be built with all frameworks."""
        config = (
            impala.ImpalaConfig()
            .resources(num_gpus=0)
            .training(
                model={
                    "lstm_use_prev_action": True,
                    "lstm_use_prev_reward": True,
                }
            )
        )
        num_iterations = 1
        env = "CartPole-v0"

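        # Run one short training pass per framework, once without and once
        # with an LSTM model (the LSTM case also uses one aggregation worker).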
        for _ in framework_iterator(config, with_eager_tracing=True):
            for lstm in [False, True]:
                # Test with and w/o aggregation workers (this has nothing
                # to do with LSTMs, though).
                config.num_aggregation_workers = 0 if not lstm else 1
                config.model["use_lstm"] = lstm
                print(
                    "lstm={} aggregation-workers={}".format(
                        lstm, config.num_aggregation_workers
                    )
                )
                trainer = config.build(env=env)
                for i in range(num_iterations):
                    results = trainer.train()
                    check_train_results(results)
                    print(results)

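                # Check that single actions can be computed through the
                # exposed APIs, with and without RNN state and
                # prev-action/prev-reward inputs.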
                check_compute_single_action(
                    trainer,
                    include_state=lstm,
                    include_prev_action_reward=lstm,
                )
                trainer.stop()

    def test_impala_lr_schedule(self):
        # Test whether the `lr` setting is correctly ignored in favor of
        # `lr_schedule`: the very first learning rate should be 0.05, not 0.1.
        config = (
            impala.ImpalaConfig()
            .resources(num_gpus=0)
            .training(
                lr=0.1,
                lr_schedule=[
                    [0, 0.05],
                    [10000, 0.000001],
                ],
            )
        )
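        # (`lr_schedule` above is a list of [timestep, lr-value] pairs; RLlib
        # interpolates linearly between the given points.)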
        config.environment(env="CartPole-v0")

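        # Helper to pull the current learning rate out of the learner stats
        # in a train() result dict.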
        def get_lr(result):
            return result["info"][LEARNER_INFO][DEFAULT_POLICY_ID][LEARNER_STATS_KEY][
                "cur_lr"
            ]

        for fw in framework_iterator(config):
            trainer = config.build()
            policy = trainer.get_policy()

            try:
                if fw == "tf":
                    check(policy.get_session().run(policy.cur_lr), 0.05)
                else:
                    check(policy.cur_lr, 0.05)
                # Due to the asynchronous nature of IMPALA, learner stats may
                # lag behind by one iteration. Run three train() calls here
                # and assert a guaranteed lr decrease between the 1st and 3rd.
                r1 = trainer.train()
                r2 = trainer.train()
                r3 = trainer.train()
                lr1 = get_lr(r1)
                lr2 = get_lr(r2)
                lr3 = get_lr(r3)
                assert lr2 <= lr1, (lr1, lr2)
                assert lr3 <= lr2, (lr2, lr3)
                assert lr3 < lr1, (lr1, lr3)
            finally:
                trainer.stop()


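# When executed directly, hand this file off to pytest.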
if __name__ == "__main__":
    import pytest
    import sys

    sys.exit(pytest.main(["-v", __file__]))