ray/rllib/models/action_dist.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from ray.rllib.utils.annotations import DeveloperAPI


@DeveloperAPI
class ActionDistribution(object):
    """The policy action distribution of an agent.

    Args:
      inputs (Tensor): The input vector to compute samples from.
    """

    @DeveloperAPI
    def __init__(self, inputs):
        self.inputs = inputs

    @DeveloperAPI
    def sample(self):
        """Draw a sample from the action distribution."""
        raise NotImplementedError

    @DeveloperAPI
    def logp(self, x):
        """The log-likelihood of the action distribution."""
        raise NotImplementedError

    @DeveloperAPI
    def kl(self, other):
        """The KL-divergence between two action distributions."""
        raise NotImplementedError

    @DeveloperAPI
    def entropy(self):
        """The entropy of the action distribution."""
        raise NotImplementedError

    def multi_kl(self, other):
        """The KL-divergence between two action distributions.

        This differs from kl() in that it can return an array for
        MultiDiscrete. TODO(ekl) consider removing this.
        """
        return self.kl(other)

    def multi_entropy(self):
        """The entropy of the action distribution.

        This differs from entropy() in that it can return an array for
        MultiDiscrete. TODO(ekl) consider removing this.
        """
        return self.entropy()
Add policy gradient example. (#344) * add policy gradient example * fix typos * Minor changes plus some documentation. * Minor fixes. 2017-03-07 23:42:44 -08:00			`from __future__ import absolute_import`
			`from __future__ import division`
			`from __future__ import print_function`

[rllib] Document ModelV2 and clean up the models/ directory (#5277) 2019-07-27 02:08:16 -07:00			`from ray.rllib.utils.annotations import DeveloperAPI`
Support older version TF and Support RMSProp in Impala (#2590) to support TF version < 1.5 to support rmsprop optimizer in Impala Before TF1.5, tf.reduce_sum() and tf.reduce_max() has an argument keep_dims which has been renamed as keepdims in later versions. In the original paper of Impala, they use rmsprop algorithm to optimize the model. We'd better also support it so that users can reproduce their experiments. Without any tuning, say that using the same hyper-parameters as AdamOptimizer, it reaches "episode_reward_mean": 19.083333333333332 in Pong after consume 3,610,350 samples. 2018-08-09 19:51:32 -07:00
Add policy gradient example. (#344) * add policy gradient example * fix typos * Minor changes plus some documentation. * Minor fixes. 2017-03-07 23:42:44 -08:00
[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00			`@DeveloperAPI`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`class ActionDistribution(object):`
			`"""The policy action distribution of an agent.`

			`Args:`
			`inputs (Tensor): The input vector to compute samples from.`
			`"""`

[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00			`@DeveloperAPI`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`def __init__(self, inputs):`
			`self.inputs = inputs`
[rllib] Document ModelV2 and clean up the models/ directory (#5277) 2019-07-27 02:08:16 -07:00
			`@DeveloperAPI`
			`def sample(self):`
			`"""Draw a sample from the action distribution."""`
			`raise NotImplementedError`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00
[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00			`@DeveloperAPI`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`def logp(self, x):`
[rllib] Initial RLLib documentation (#969) * initial documentation for RLLib * more RL documentation * fix linting * fix comments * update * fix 2017-09-12 23:38:21 -07:00			`"""The log-likelihood of the action distribution."""`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`raise NotImplementedError`

[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00			`@DeveloperAPI`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`def kl(self, other):`
[rllib] Split docs into user and development guide (#1377) * docs * Update README.rst * Sat Dec 30 15:23:49 PST 2017 * comments * Sun Dec 31 23:33:30 PST 2017 * Sun Dec 31 23:33:38 PST 2017 * Sun Dec 31 23:37:46 PST 2017 * Sun Dec 31 23:39:28 PST 2017 * Sun Dec 31 23:43:05 PST 2017 * Sun Dec 31 23:51:55 PST 2017 * Sun Dec 31 23:52:51 PST 2017 2018-01-01 11:10:44 -08:00			`"""The KL-divergence between two action distributions."""`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`raise NotImplementedError`

[rllib] annotate public vs developer vs private APIs (#3808) 2019-01-23 21:27:26 -08:00			`@DeveloperAPI`
[rllib] Pull out shared models for evolution strategies and policy gradient. (#719) * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * wip * works with cartpole * lint * fix pg * comment * action dist rename * preprocessor * fix test * typo * fix the action[0] nonsense * revert * satisfy the lint * Minor indentation changes. * fix merge * add humanoid * fix linting * more 4 space * fix * fix linT * oops * es parity 2017-07-17 01:58:54 -07:00			`def entropy(self):`
[rllib] Basic infrastructure for off-policy estimation (IS, WIS) (#3941) 2019-02-13 16:25:05 -08:00			`"""The entropy of the action distribution."""`
			`raise NotImplementedError`

[rllib] MultiCategorical shouldn't return array for kl or entropy (#5215) * wip * fix 2019-07-19 12:12:04 -07:00			`def multi_kl(self, other):`
			`"""The KL-divergence between two action distributions.`

			`This differs from kl() in that it can return an array for`
			`MultiDiscrete. TODO(ekl) consider removing this.`
			`"""`
			`return self.kl(other)`

			`def multi_entropy(self):`
			`"""The entropy of the action distribution.`

			`This differs from entropy() in that it can return an array for`
			`MultiDiscrete. TODO(ekl) consider removing this.`
			`"""`
			`return self.entropy()`