2020-08-06 05:33:24 +02:00
|
|
|
import gym
|
2021-05-04 10:06:19 -07:00
|
|
|
from typing import Callable, Dict, List, Optional, Tuple, Type, Union
|
2020-08-06 05:33:24 +02:00
|
|
|
|
|
|
|
from ray.rllib.models.modelv2 import ModelV2
|
|
|
|
from ray.rllib.models.torch.torch_action_dist import TorchDistributionWrapper
|
2020-12-26 20:14:18 -05:00
|
|
|
from ray.rllib.policy.policy import Policy
|
|
|
|
from ray.rllib.policy.policy_template import build_policy_class
|
2020-08-06 05:33:24 +02:00
|
|
|
from ray.rllib.policy.sample_batch import SampleBatch
|
|
|
|
from ray.rllib.policy.torch_policy import TorchPolicy
|
2021-11-01 21:46:02 +01:00
|
|
|
from ray.rllib.utils.deprecation import Deprecated
|
2020-02-22 20:02:31 +01:00
|
|
|
from ray.rllib.utils.framework import try_import_torch
|
2022-06-11 15:10:39 +02:00
|
|
|
from ray.rllib.utils.typing import ModelGradients, TensorType, AlgorithmConfigDict
|
2021-05-03 14:23:28 -07:00
|
|
|
|
2020-02-22 20:02:31 +01:00
|
|
|
# Soft-import torch so this module stays importable without it installed;
# `torch` is None in that case (second tuple element, `nn`, is unused here).
torch, _ = try_import_torch()
|
2019-05-18 00:23:11 -07:00
|
|
|
|
|
|
|
|
2021-08-03 18:30:02 -04:00
|
|
|
@Deprecated(new="build_policy_class(framework='torch')", error=False)
def build_torch_policy(
    name: str,
    *,
    loss_fn: Optional[
        Callable[
            [Policy, ModelV2, Type[TorchDistributionWrapper], SampleBatch],
            Union[TensorType, List[TensorType]],
        ]
    ],
    get_default_config: Optional[Callable[[], AlgorithmConfigDict]] = None,
    stats_fn: Optional[Callable[[Policy, SampleBatch], Dict[str, TensorType]]] = None,
    postprocess_fn=None,
    extra_action_out_fn: Optional[
        Callable[
            [
                Policy,
                Dict[str, TensorType],
                List[TensorType],
                ModelV2,
                TorchDistributionWrapper,
            ],
            Dict[str, TensorType],
        ]
    ] = None,
    extra_grad_process_fn: Optional[
        Callable[[Policy, "torch.optim.Optimizer", TensorType], Dict[str, TensorType]]
    ] = None,
    extra_learn_fetches_fn: Optional[Callable[[Policy], Dict[str, TensorType]]] = None,
    optimizer_fn: Optional[
        Callable[[Policy, AlgorithmConfigDict], "torch.optim.Optimizer"]
    ] = None,
    validate_spaces: Optional[
        Callable[[Policy, gym.Space, gym.Space, AlgorithmConfigDict], None]
    ] = None,
    before_init: Optional[
        Callable[[Policy, gym.Space, gym.Space, AlgorithmConfigDict], None]
    ] = None,
    before_loss_init: Optional[
        Callable[
            [Policy, gym.spaces.Space, gym.spaces.Space, AlgorithmConfigDict], None
        ]
    ] = None,
    after_init: Optional[
        Callable[[Policy, gym.Space, gym.Space, AlgorithmConfigDict], None]
    ] = None,
    _after_loss_init: Optional[
        Callable[
            [Policy, gym.spaces.Space, gym.spaces.Space, AlgorithmConfigDict], None
        ]
    ] = None,
    action_sampler_fn: Optional[
        Callable[[TensorType, List[TensorType]], Tuple[TensorType, TensorType]]
    ] = None,
    action_distribution_fn: Optional[
        Callable[
            [Policy, ModelV2, TensorType, TensorType, TensorType],
            Tuple[TensorType, type, List[TensorType]],
        ]
    ] = None,
    make_model: Optional[
        Callable[
            [Policy, gym.spaces.Space, gym.spaces.Space, AlgorithmConfigDict], ModelV2
        ]
    ] = None,
    make_model_and_action_dist: Optional[
        Callable[
            [Policy, gym.spaces.Space, gym.spaces.Space, AlgorithmConfigDict],
            Tuple[ModelV2, Type[TorchDistributionWrapper]],
        ]
    ] = None,
    compute_gradients_fn: Optional[
        Callable[[Policy, SampleBatch], Tuple[ModelGradients, dict]]
    ] = None,
    apply_gradients_fn: Optional[
        Callable[[Policy, "torch.optim.Optimizer"], None]
    ] = None,
    mixins: Optional[List[type]] = None,
    get_batch_divisibility_req: Optional[Callable[[Policy], int]] = None
) -> Type[TorchPolicy]:
    """Deprecated shim: forward all arguments to ``build_policy_class``.

    Every keyword argument accepted here is passed through unchanged to
    :func:`ray.rllib.policy.policy_template.build_policy_class`, with the
    ``framework`` argument forced to ``"torch"``. The ``@Deprecated``
    decorator points callers at the replacement API (``error=False``, so
    calling this still works and only warns — behavior per the decorator,
    defined elsewhere; confirm there).

    Returns:
        The Policy class built by ``build_policy_class`` (a
        :class:`TorchPolicy` subclass).
    """
    # Snapshot the full argument set. This MUST stay the first statement of
    # the function body so that ``locals()`` contains exactly the declared
    # parameters and nothing else.
    forwarded = dict(locals())
    # Force the torch framework and delegate to the replacement API.
    forwarded["framework"] = "torch"
    return build_policy_class(**forwarded)
|