ray/rllib/tests/test_eager_support.py

import unittest

import ray
from ray import tune
from ray.rllib.agents.registry import get_agent_class


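def check_support(alg, config, test_trace=True):
    """Runs one training iteration of `alg` in eager mode per action type.

    Note: `config` is modified in place. If `test_trace` is True, each run
    is repeated with eager tracing enabled.
    """
    config["eager"] = True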

    # Test both continuous and discrete actions.
    for cont in [True, False]:
        if cont and alg in ["DQN", "APEX", "SimpleQ"]:
            continue
        elif not cont and alg in ["DDPG", "APEX_DDPG", "TD3"]:
            continue

        print("run={} cont. actions={}".format(alg, cont))

        if cont:
            config["env"] = "Pendulum-v0"
        else:
            config["env"] = "CartPole-v0"

        a = get_agent_class(alg)
        config["log_level"] = "ERROR"

        config["eager_tracing"] = False
        tune.run(a, config=config, stop={"training_iteration": 1})

        if test_trace:
            config["eager_tracing"] = True
            tune.run(a, config=config, stop={"training_iteration": 1})


class TestEagerSupport(unittest.TestCase):
    def setUp(self):
        ray.init(num_cpus=4)

    def tearDown(self):
        ray.shutdown()

    def test_simple_q(self):
        check_support("SimpleQ", {"num_workers": 0, "learning_starts": 0})

    def test_dqn(self):
        check_support("DQN", {"num_workers": 0, "learning_starts": 0})

    # TODO(sven): Add these once DDPG supports eager.
    # def test_ddpg(self):
    #     check_support("DDPG", {"num_workers": 0})

    # def test_apex_ddpg(self):
    #     check_support("APEX_DDPG", {"num_workers": 1})

    # def test_td3(self):
    #     check_support("TD3", {"num_workers": 0})

    def test_a2c(self):
        check_support("A2C", {"num_workers": 0})

    def test_a3c(self):
        check_support("A3C", {"num_workers": 1})

    def test_pg(self):
        check_support("PG", {"num_workers": 0})

    def test_ppo(self):
        check_support("PPO", {"num_workers": 0})

    def test_appo(self):
        check_support("APPO", {"num_workers": 1, "num_gpus": 0})

    def test_impala(self):
        check_support("IMPALA", {"num_workers": 1, "num_gpus": 0})

    def test_apex_dqn(self):
        check_support(
            "APEX", {
                "num_workers": 2,
                "learning_starts": 0,
                "num_gpus": 0,
                "min_iter_time_s": 1,
                "timesteps_per_iteration": 100,
                "optimizer": {
                    "num_replay_buffer_shards": 1,
                },
            })

    # TODO(sven): Add this once SAC supports eager.
    # def test_sac(self):
    #     check_support("SAC", {"num_workers": 0, "learning_starts": 0})


if __name__ == "__main__":
    import pytest
    import sys
    sys.exit(pytest.main(["-v", __file__]))