import gym
import numpy as np

from ray.rllib.models.modelv2 import ModelV2
from ray.rllib.utils.annotations import DeveloperAPI
from ray.rllib.utils.typing import TensorType, List, Union, ModelConfigDict


@DeveloperAPI
class ActionDistribution:
    """The policy action distribution of an agent.

    Attributes:
        inputs: Input vector to compute samples from.
        model: Reference to the ModelV2 that produced the inputs.
    """

    @DeveloperAPI
    def __init__(self, inputs: List[TensorType], model: ModelV2):
        """Initializes an ActionDistribution instance.

        Args:
            inputs: Input vector to compute samples from.
            model: Reference to the ModelV2 that produced the inputs. This
                is mainly useful if you want to use model variables to
                compute action outputs (e.g., for auto-regressive action
                distributions; see examples/autoregressive_action_dist.py).
        """
        self.inputs = inputs
        self.model = model

    @DeveloperAPI
    def sample(self) -> TensorType:
        """Draws a sample from the action distribution."""
        raise NotImplementedError

    @DeveloperAPI
    def deterministic_sample(self) -> TensorType:
        """Returns the deterministic "sampling" output from the distribution.

        This is usually the maximum-likelihood output, i.e. the mean for a
        Normal distribution or the argmax for a Categorical distribution.
        """
        raise NotImplementedError
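
    # E.g., a Categorical implementation would typically return the argmax
    # over self.inputs here, and a (diagonal) Gaussian its mean; see the
    # illustrative sketch at the bottom of this file.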

    @DeveloperAPI
    def sampled_action_logp(self) -> TensorType:
        """Returns the log probability of the last sampled action."""
        raise NotImplementedError

    @DeveloperAPI
    def logp(self, x: TensorType) -> TensorType:
        """The log-likelihood of the action distribution, evaluated at x."""
        raise NotImplementedError

    @DeveloperAPI
    def kl(self, other: "ActionDistribution") -> TensorType:
        """The KL-divergence between this and another action distribution."""
        raise NotImplementedError
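
    # For two Categorical distributions with probabilities p and q, this is
    # sum_i p_i * (log p_i - log q_i); for diagonal Gaussians it has the
    # usual closed form in terms of the means and standard deviations.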

    @DeveloperAPI
    def entropy(self) -> TensorType:
        """The entropy of the action distribution."""
        raise NotImplementedError

    def multi_kl(self, other: "ActionDistribution") -> TensorType:
        """The KL-divergence between two action distributions.

        This differs from kl() in that it can return an array for
        MultiDiscrete. TODO(ekl): consider removing this.
        """
        return self.kl(other)

    def multi_entropy(self) -> TensorType:
        """The entropy of the action distribution.

        This differs from entropy() in that it can return an array for
        MultiDiscrete. TODO(ekl): consider removing this.
        """
        return self.entropy()
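
    # For a MultiDiscrete action space with k sub-actions, multi_entropy()
    # and multi_kl() may return one value per sub-action (e.g. shape [B, k])
    # instead of a single reduced value per batch row.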

    @staticmethod
    @DeveloperAPI
    def required_model_output_shape(
        action_space: gym.Space, model_config: ModelConfigDict
    ) -> Union[int, np.ndarray]:
        """Returns the required shape of an input parameter tensor for a
        particular action space and an optional dict of distribution-specific
        options.

        Args:
            action_space: The action space this distribution will be used
                for, whose shape attributes will be used to determine the
                required shape of the input parameter tensor.
            model_config: Model's config dict (as defined in catalog.py).

        Returns:
            model_output_shape (int or np.ndarray of ints): size of the
                required input vector (minus leading batch dimension).
        """
        raise NotImplementedError
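

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the RLlib API): a minimal concrete
# subclass for a Discrete(n) action space, using plain numpy instead of
# framework tensors. The name SimpleCategorical and all numpy-only math
# below are assumptions for demonstration; real RLlib distributions are
# implemented per framework (tf/torch) in rllib/models.
# ---------------------------------------------------------------------------
class SimpleCategorical(ActionDistribution):
    """Categorical distribution that interprets `inputs` as logits."""

    def _probs(self) -> np.ndarray:
        # Numerically stable softmax over the last axis.
        logits = np.asarray(self.inputs, dtype=np.float64)
        z = np.exp(logits - logits.max(axis=-1, keepdims=True))
        return z / z.sum(axis=-1, keepdims=True)

    def sample(self) -> TensorType:
        # Draw one action index per batch row according to the softmax probs.
        probs = np.atleast_2d(self._probs())
        self.last_sample = np.array(
            [np.random.choice(len(p), p=p) for p in probs]
        )
        return self.last_sample

    def deterministic_sample(self) -> TensorType:
        # Max-likelihood "sample": the argmax over the logits.
        self.last_sample = np.argmax(np.atleast_2d(self.inputs), axis=-1)
        return self.last_sample

    def sampled_action_logp(self) -> TensorType:
        # Log-prob of whatever sample() / deterministic_sample() returned.
        return self.logp(self.last_sample)

    def logp(self, x: TensorType) -> TensorType:
        probs = np.atleast_2d(self._probs())
        return np.log(probs[np.arange(probs.shape[0]), np.asarray(x)])

    def entropy(self) -> TensorType:
        probs = np.atleast_2d(self._probs())
        return -np.sum(probs * np.log(probs), axis=-1)

    def kl(self, other: "ActionDistribution") -> TensorType:
        # Assumes `other` is also a SimpleCategorical over the same space.
        p, q = np.atleast_2d(self._probs()), np.atleast_2d(other._probs())
        return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

    @staticmethod
    def required_model_output_shape(
        action_space: gym.Space, model_config: ModelConfigDict
    ) -> Union[int, np.ndarray]:
        # One logit per discrete action: Discrete(n) -> n model outputs.
        return action_space.n


# Usage sketch: a single batch row of 3 logits for a Discrete(3) space.
# `model` is unused by this toy class, so None is passed in; a real
# distribution would receive the ModelV2 that produced the logits.
#
#     dist = SimpleCategorical(np.array([[1.0, 2.0, 0.5]]), model=None)
#     a = dist.sample()                    # stochastic action index
#     a_det = dist.deterministic_sample()  # argmax action index
#     lp = dist.sampled_action_logp()      # log-prob of the last sample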