ray/rllib/utils/exploration/soft_q.py

from gym.spaces import Discrete

from ray.rllib.utils.exploration.stochastic_sampling import StochasticSampling


class SoftQ(StochasticSampling):
    """Special case of StochasticSampling w/ Categorical and temperature param.

    Returns a stochastic sample from a Categorical parameterized by the model
    output divided by the temperature. Returns the argmax iff explore=False.
    """

    def __init__(self,
                 action_space,
                 *,
                 temperature=1.0,
                 framework="tf",
                 **kwargs):
        """Initializes a SoftQ Exploration object.

        Args:
            action_space (Discrete): The gym action space used by the
                environment. Must be a Discrete space.
            temperature (float): The temperature to divide model outputs by
                before creating the Categorical distribution to sample from.
            framework (Optional[str]): One of None, "tf", or "torch".
        """
        assert isinstance(action_space, Discrete)
        super().__init__(
            action_space,
            static_params=dict(temperature=temperature),
            framework=framework,
            **kwargs)
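

# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original module): what sampling from a
# "Categorical parameterized by the model output divided by the temperature"
# boils down to, in plain numpy, including the greedy argmax behavior when
# explore=False. The helper name `soft_q_sample` is made up for illustration
# and is not an RLlib API.
import numpy as np


def soft_q_sample(q_values, temperature=1.0, explore=True):
    """Sample an action index from softmax(q_values / temperature)."""
    if not explore:
        # Deterministic (greedy) action when exploration is turned off.
        return int(np.argmax(q_values))
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))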