ray/rllib/examples/env/repeat_after_me_env.py

import gym
from gym.spaces import Box, Discrete
import numpy as np


class RepeatAfterMeEnv(gym.Env):
    """Env in which the agent must repeat the observation seen `repeat_delay` timesteps ago."""

    def __init__(self, config=None):
        config = config or {}
        if config.get("continuous"):
            self.observation_space = Box(-1.0, 1.0, (2,))
        else:
            self.observation_space = Discrete(2)

        self.action_space = self.observation_space
        # Note: Set `repeat_delay` to 0 for simply repeating the seen
        # observation (no delay).
        self.delay = config.get("repeat_delay", 1)
        self.episode_len = config.get("episode_len", 100)
        self.history = []

    def reset(self):
        self.history = [0] * self.delay
        return self._next_obs()

    def step(self, action):
        obs = self.history[-(1 + self.delay)]

        # Box: negative L1 distance between the action and the target obs.
        if isinstance(self.action_space, Box):
            reward = -np.sum(np.abs(action - obs))
        # Discrete: +1.0 if exact match, -1.0 otherwise.
        else:
            reward = 1.0 if action == obs else -1.0

        done = len(self.history) > self.episode_len
        return self._next_obs(), reward, done, {}

    def _next_obs(self):
        if isinstance(self.observation_space, Box):
            token = np.random.random(size=(2,))
        else:
            token = np.random.choice([0, 1])
        self.history.append(token)
        return token
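
The history indexing in `reset()` and `step()` is easy to misread, so here is a minimal standalone sketch of the same bookkeeping for the `Discrete(2)` case. It is not part of the RLlib file: `simulate` is a hypothetical helper name, and it uses Python's `random` module instead of NumPy so that it runs without `gym` installed. It assumes `episode_len >= delay`.

```python
import random


def simulate(delay, episode_len, seed=0):
    """Mirror the env's history bookkeeping (hypothetical helper, Discrete case).

    Returns (observations, targets), where targets[t] is the value that the
    action taken at step t would be compared against.
    """
    rng = random.Random(seed)
    history = [0] * delay  # reset(): pad the history with `delay` zeros
    observations, targets = [], []

    def next_obs():
        # Same role as _next_obs(): sample a token and record it.
        token = rng.choice([0, 1])
        history.append(token)
        observations.append(token)

    next_obs()  # reset() returns the first observation
    done = False
    while not done:
        # step(): look back `delay` entries from the most recent observation.
        targets.append(history[-(1 + delay)])
        # `done` is evaluated before the next observation is appended.
        done = len(history) > episode_len
        next_obs()
    return observations, targets


if __name__ == "__main__":
    obs, tgt = simulate(delay=2, episode_len=10)
    print("observations:", obs)
    print("targets:     ", tgt)
```

Under these assumptions, the first `delay` targets are the padding zeros, every later target equals the observation emitted `delay` steps earlier, and an episode runs for `episode_len - delay + 1` steps (100 steps with the default settings).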