import logging

import gym
import numpy as np
from typing import Callable, List, Optional, Tuple

from ray.rllib.utils.annotations import Deprecated, override, PublicAPI
from ray.rllib.utils.typing import EnvActionType, EnvInfoDict, \
    EnvObsType, EnvType

logger = logging.getLogger(__name__)


@PublicAPI
class VectorEnv:
    """An environment that supports batch evaluation using clones of sub-envs.
    """

    def __init__(self, observation_space: gym.Space, action_space: gym.Space,
                 num_envs: int):
        """Initializes a VectorEnv instance.

        Args:
            observation_space: The observation Space of a single
                sub-env.
            action_space: The action Space of a single sub-env.
            num_envs: The number of clones to make of the given sub-env.
        """
        self.observation_space = observation_space
        self.action_space = action_space
        self.num_envs = num_envs

    @staticmethod
    def vectorize_gym_envs(
            make_env: Optional[Callable[[int], EnvType]] = None,
            existing_envs: Optional[List[gym.Env]] = None,
            num_envs: int = 1,
            action_space: Optional[gym.Space] = None,
            observation_space: Optional[gym.Space] = None,
            # Deprecated. These seem to have never been used.
            env_config=None,
            policy_config=None) -> "_VectorizedGymEnv":
        """Translates any given gym.Env(s) into a _VectorizedGymEnv object.

        Args:
            make_env: Factory that produces a new gym.Env taking the sub-env's
                vector index as only arg. Must be defined if the
                number of `existing_envs` is less than `num_envs`.
            existing_envs: Optional list of already instantiated sub
                environments.
            num_envs: Total number of sub environments in this VectorEnv.
            action_space: The action space. If None, use existing_envs[0]'s
                action space.
            observation_space: The observation space. If None, use
                existing_envs[0]'s observation space.

        Returns:
            The resulting _VectorizedGymEnv object (subclass of VectorEnv).
        """
        return _VectorizedGymEnv(
            make_env=make_env,
            existing_envs=existing_envs or [],
            num_envs=num_envs,
            observation_space=observation_space,
            action_space=action_space,
        )
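
    # Usage sketch for `vectorize_gym_envs` (illustrative only; assumes an
    # installed copy of gym providing "CartPole-v0"):
    #   vec_env = VectorEnv.vectorize_gym_envs(
    #       make_env=lambda idx: gym.make("CartPole-v0"), num_envs=4)
    #   obs_batch = vec_env.vector_reset()  # -> list of 4 observations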

    @PublicAPI
    def vector_reset(self) -> List[EnvObsType]:
        """Resets all sub-environments.

        Returns:
            List of observations from each environment.
        """
        raise NotImplementedError

    @PublicAPI
    def reset_at(self, index: Optional[int] = None) -> EnvObsType:
        """Resets a single environment.

        Args:
            index: An optional sub-env index to reset.

        Returns:
            Observations from the reset sub environment.
        """
        raise NotImplementedError

    @PublicAPI
    def vector_step(
            self, actions: List[EnvActionType]
    ) -> Tuple[List[EnvObsType], List[float], List[bool], List[EnvInfoDict]]:
        """Performs a vectorized step on all sub environments using `actions`.

        Args:
            actions: List of actions (one for each sub-env).

        Returns:
            A tuple consisting of
            1) New observations for each sub-env.
            2) Reward values for each sub-env.
            3) Done values for each sub-env.
            4) Info values for each sub-env.
        """
        raise NotImplementedError
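
    # Rollout sketch against this interface (illustrative only; assumes some
    # concrete VectorEnv subclass `vec_env` and a `compute_actions` function):
    #   obs_batch = vec_env.vector_reset()
    #   while True:
    #       actions = compute_actions(obs_batch)
    #       obs_batch, rewards, dones, infos = vec_env.vector_step(actions)
    #       for i, done in enumerate(dones):
    #           if done:
    #               obs_batch[i] = vec_env.reset_at(i)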

    @PublicAPI
    def get_sub_environments(self) -> List[EnvType]:
        """Returns the underlying sub environments.

        Returns:
            List of all underlying sub environments.
        """
        return []

    # TODO: (sven) Experimental method. Make @PublicAPI at some point.
    def try_render_at(self, index: Optional[int] = None) -> \
            Optional[np.ndarray]:
        """Renders a single environment.

        Args:
            index: An optional sub-env index to render.

        Returns:
            Either a numpy RGB image (shape=(w x h x 3) dtype=uint8) or
            None in case rendering is handled directly by this method.
        """
        pass

    @Deprecated(new="vectorize_gym_envs", error=False)
    def wrap(self, *args, **kwargs) -> "_VectorizedGymEnv":
        return self.vectorize_gym_envs(*args, **kwargs)

    @Deprecated(new="get_sub_environments", error=False)
    def get_unwrapped(self) -> List[EnvType]:
        return self.get_sub_environments()
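
    # Subclassing sketch (illustrative only): a custom vectorized env would
    # implement `vector_reset`, `reset_at` and `vector_step` (and usually
    # `get_sub_environments`), and call
    # `super().__init__(observation_space=..., action_space=..., num_envs=...)`
    # from its own constructor.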


class _VectorizedGymEnv(VectorEnv):
    """Internal wrapper to translate any gym.Envs into a VectorEnv object.
    """

    def __init__(
            self,
            make_env: Optional[Callable[[int], EnvType]] = None,
            existing_envs: Optional[List[gym.Env]] = None,
            num_envs: int = 1,
            *,
            observation_space: Optional[gym.Space] = None,
            action_space: Optional[gym.Space] = None,
            # Deprecated. These seem to have never been used.
            env_config=None,
            policy_config=None,
    ):
        """Initializes a _VectorizedGymEnv object.

        Args:
            make_env: Factory that produces a new gym.Env taking the sub-env's
                vector index as only arg. Must be defined if the
                number of `existing_envs` is less than `num_envs`.
            existing_envs: Optional list of already instantiated sub
                environments.
            num_envs: Total number of sub environments in this VectorEnv.
            action_space: The action space. If None, use existing_envs[0]'s
                action space.
            observation_space: The observation space. If None, use
                existing_envs[0]'s observation space.
        """
        self.envs = existing_envs or []

        # Fill up missing envs (so we have exactly num_envs sub-envs in this
        # VectorEnv).
        while len(self.envs) < num_envs:
            self.envs.append(make_env(len(self.envs)))

        super().__init__(
            observation_space=observation_space
            or self.envs[0].observation_space,
            action_space=action_space or self.envs[0].action_space,
            num_envs=num_envs)

    @override(VectorEnv)
    def vector_reset(self):
        return [e.reset() for e in self.envs]

    @override(VectorEnv)
    def reset_at(self, index: Optional[int] = None) -> EnvObsType:
        if index is None:
            index = 0
        return self.envs[index].reset()

    @override(VectorEnv)
    def vector_step(self, actions):
        obs_batch, rew_batch, done_batch, info_batch = [], [], [], []
        for i in range(self.num_envs):
            obs, r, done, info = self.envs[i].step(actions[i])
            if not np.isscalar(r) or not np.isreal(r) or not np.isfinite(r):
                raise ValueError(
                    "Reward should be finite scalar, got {} ({}). "
                    "Actions={}.".format(r, type(r), actions[i]))
            if not isinstance(info, dict):
                raise ValueError("Info should be a dict, got {} ({})".format(
                    info, type(info)))
            obs_batch.append(obs)
            rew_batch.append(r)
            done_batch.append(done)
            info_batch.append(info)
        return obs_batch, rew_batch, done_batch, info_batch
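
    # Note: this wrapper does not auto-reset sub-envs whose episode is done;
    # callers are expected to call `reset_at(i)` for any sub-env i whose done
    # flag comes back True (see the rollout sketch in VectorEnv above).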

    @override(VectorEnv)
    def get_sub_environments(self):
        return self.envs

    @override(VectorEnv)
    def try_render_at(self, index: Optional[int] = None):
        if index is None:
            index = 0
        return self.envs[index].render()
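

if __name__ == "__main__":
    # Minimal smoke-test sketch, not part of the RLlib API: auto-vectorize a
    # gym env and take one vectorized step. Assumes an installed copy of gym
    # providing "CartPole-v0".
    vec_env = VectorEnv.vectorize_gym_envs(
        make_env=lambda idx: gym.make("CartPole-v0"), num_envs=2)
    obs_batch = vec_env.vector_reset()
    actions = [
        e.action_space.sample() for e in vec_env.get_sub_environments()
    ]
    obs_batch, rewards, dones, infos = vec_env.vector_step(actions)
    print("rewards={} dones={}".format(rewards, dones))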