# ray/rllib/agents/ddpg/apex.py

from ray.rllib.agents.dqn.apex import ApexTrainer
from ray.rllib.agents.ddpg.ddpg import DDPGTrainer, \
    DEFAULT_CONFIG as DDPG_CONFIG
from ray.rllib.evaluation.worker_set import WorkerSet
from ray.rllib.utils.annotations import override
from ray.rllib.utils.typing import TrainerConfigDict
from ray.util.iter import LocalIterator

APEX_DDPG_DEFAULT_CONFIG = DDPGTrainer.merge_trainer_configs(
    DDPG_CONFIG,  # see also the options in ddpg.py, which are also supported
    {
        "optimizer": {
            "max_weight_sync_delay": 400,
            "num_replay_buffer_shards": 4,
            "debug": False,
        },
        "exploration_config": {
            "type": "PerWorkerOrnsteinUhlenbeckNoise",
        },
        "n_step": 3,
        "num_gpus": 0,
        "num_workers": 32,
        "buffer_size": 2000000,
        # TODO(jungong) : update once Apex supports replay_buffer_config.
        "replay_buffer_config": None,
        # Whether all shards of the replay buffer must be co-located
        # with the learner process (running the execution plan).
        # This is preferred b/c the learner process should have quick
        # access to the data from the buffer shards, avoiding network
        # traffic each time samples from the buffer(s) are drawn.
        # Set this to False to relax this constraint and allow
        # replay shards to be created on node(s) other than the one
        # on which the learner is located.
        "replay_buffer_shards_colocated_with_driver": True,
        "learning_starts": 50000,
        "train_batch_size": 512,
        "rollout_fragment_length": 50,
        "target_network_update_freq": 500000,
        "timesteps_per_iteration": 25000,
        "worker_side_prioritization": True,
        "min_time_s_per_reporting": 30,
    },
    _allow_unknown_configs=True,
)


class ApexDDPGTrainer(DDPGTrainer):
    @classmethod
    @override(DDPGTrainer)
    def get_default_config(cls) -> TrainerConfigDict:
        return APEX_DDPG_DEFAULT_CONFIG

    @staticmethod
    @override(DDPGTrainer)
    def execution_plan(workers: WorkerSet, config: dict,
                       **kwargs) -> LocalIterator[dict]:
        """Use APEX-DQN's execution plan."""
        return ApexTrainer.execution_plan(workers, config, **kwargs)
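

# The sketch below is not part of the original file. It is a minimal,
# hedged usage example showing how a trainer like the one defined above
# is typically instantiated and stepped with the legacy Trainer API.
# It assumes Ray is installed and a continuous-control env such as
# "Pendulum-v1" is available; the reduced num_workers / buffer_size and
# the NUM_ITERS name are illustrative choices, not part of the defaults.
if __name__ == "__main__":
    import ray

    ray.init()

    # Start from the merged Ape-X DDPG defaults and scale the setup down
    # so the sketch can run on a single machine (assumption, not a
    # recommendation from the original config).
    config = APEX_DDPG_DEFAULT_CONFIG.copy()
    config["num_workers"] = 2
    config["buffer_size"] = 100000

    trainer = ApexDDPGTrainer(config=config, env="Pendulum-v1")

    NUM_ITERS = 3  # illustrative iteration count only
    for _ in range(NUM_ITERS):
        result = trainer.train()
        print(result["episode_reward_mean"])

    trainer.stop()
    ray.shutdown()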