This refactors the RLlib sampler to support multi-agent environments. The main changes were:
* `AsyncVectorEnv` now produces dicts of `env_id -> agent_id -> value` rather than `env_id -> value`. This lets it model vectorized envs, multi-agent envs, or both at once.
* The sampler class operates over this nested dict structure for all envs. Single-agent envs return a dict with exactly one agent, whose ID is `single_agent`.
* When `sample()` is called on a policy evaluator, it returns a `SampleBatch` in the single-agent case and a `MultiAgentBatch` (which holds one sample batch per policy) otherwise.
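The nesting and the single-agent special case can be sketched in plain Python. This is a minimal illustration, not RLlib code: `wrap_single_agent`, `unwrap_single_agent`, and `package_samples` are hypothetical helpers, and plain lists and dicts stand in for `SampleBatch` and `MultiAgentBatch`. Only the `single_agent` ID comes from the description above.

```python
from typing import Any, Dict, List

# Agent ID used for single-agent envs, as described above.
SINGLE_AGENT_ID = "single_agent"

# env_id -> agent_id -> value
MultiEnvDict = Dict[int, Dict[Any, Any]]


def wrap_single_agent(per_env: Dict[int, Any]) -> MultiEnvDict:
    """Hypothetical helper: lift a flat env_id -> value dict into the nested form."""
    return {env_id: {SINGLE_AGENT_ID: v} for env_id, v in per_env.items()}


def unwrap_single_agent(nested: MultiEnvDict) -> Dict[int, Any]:
    """Hypothetical inverse of wrap_single_agent; rejects multi-agent entries."""
    flat = {}
    for env_id, agents in nested.items():
        if set(agents) != {SINGLE_AGENT_ID}:
            raise ValueError(f"env {env_id} is not single-agent: {list(agents)}")
        flat[env_id] = agents[SINGLE_AGENT_ID]
    return flat


def package_samples(samples_by_policy: Dict[str, List[Any]], single_agent: bool):
    """Return one batch for single-agent envs, else the per-policy mapping.

    A list plays the role of a SampleBatch; the dict plays MultiAgentBatch.
    """
    if single_agent:
        (batch,) = samples_by_policy.values()  # exactly one policy expected
        return batch
    return dict(samples_by_policy)


print(wrap_single_agent({0: "obs_a", 1: "obs_b"}))
# {0: {'single_agent': 'obs_a'}, 1: {'single_agent': 'obs_b'}}
```

Because single-agent envs are just the one-agent case of the nested form, the sampler needs only one code path; the single-agent distinction resurfaces only when results are packaged.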
Left for another PR:
* Exposing multi-agent support in the public interfaces.
* Optimizations such as evaluating multiple policies in one TF run.
* Fix documentation indentation.
* Add error table to GCS and push error messages through node manager.
* Add type to error data.
* Linting
* Fix failure_test bug.
* Linting.
* Enable one more test.
* Attempt to fix doc building.
* Restructuring
* Fixes
* More fixes.
* Move current_time_ms function into util.h.
* build_credis.sh: use an up-to-date credis commit.
* build_credis.sh: leveldb is updated, so update build cmds for it
* WIP: make monitor.py issue flush; switch gcs client to use credis
* Experimental: enable automatic GCS flushing with configurable policy.
* Fix Linux compilation error
* Fix leveldb build
* Use optimized build for credis
* Address comments
* Attempt to fix tests
* AWS: support multiple availability zones (fix #2177)
* Bugfix: [] rather than ()
* Test config
* Test config tweaks
* Remove test config
* Formatting fixes
* Update YAML config
## What do these changes do?
**Vectorized envs**: Users can either implement `VectorEnv` or set `num_envs=N` to auto-vectorize gym envs. Auto-vectorization batches only the action computation; the underlying gym envs are still stepped one at a time.
```
# CartPole-v0 on single core with 64x64 MLP:
# vector_width=1:
Actions per second 2720.1284458322966
# vector_width=8:
Actions per second 13773.035334888269
# vector_width=64:
Actions per second 37903.20472563333
```
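The throughput gain above is consistent with batching the policy call: with `num_envs=N`, one forward pass serves all N sub-envs while each sub-env is still stepped individually. The sketch below shows that split using a call counter instead of a real TF policy. `ToyEnv`, `AutoVectorEnv`, `vector_reset`, `vector_step`, and `BatchedPolicy` are hypothetical stand-ins, not RLlib or gym APIs.

```python
import random
from typing import Callable, List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a gym-style env with reset()/step()."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return self.rng.random()

    def step(self, action: int) -> Tuple[float, float, bool, dict]:
        self.t += 1
        return self.rng.random(), float(action), self.t >= 10, {}


class AutoVectorEnv:
    """Hypothetical num_envs=N wrapper: sub-envs are stepped one at a time."""

    def __init__(self, make_env: Callable[[int], ToyEnv], num_envs: int):
        self.envs = [make_env(i) for i in range(num_envs)]

    def vector_reset(self) -> List[float]:
        return [env.reset() for env in self.envs]

    def vector_step(self, actions: List[int]):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, done, _info = env.step(action)
            obs.append(env.reset() if done else o)  # restart finished sub-envs
            rewards.append(r)
            dones.append(done)
        return obs, rewards, dones


class BatchedPolicy:
    """Computes actions for every sub-env in one call (the vectorized part)."""

    def __init__(self):
        self.calls = 0

    def compute_actions(self, obs_batch: List[float]) -> List[int]:
        self.calls += 1
        return [1 if o > 0.5 else 0 for o in obs_batch]


vec_env = AutoVectorEnv(ToyEnv, num_envs=8)
policy = BatchedPolicy()
obs = vec_env.vector_reset()
for _ in range(5):
    obs, rewards, dones = vec_env.vector_step(policy.compute_actions(obs))
# 5 policy calls produced 5 * 8 = 40 env transitions.
```

With a neural-network policy, each of those calls is a single batched forward pass, which is why wider vectors amortize per-call overhead.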
**Async envs**: The more general form of `VectorEnv` is `AsyncVectorEnv`, which allows agents to execute out of lockstep. We use it as an adapter to support `ServingEnv`. Since any other form of env can be converted to `AsyncVectorEnv`, `utils.sampler` has been rewritten to run against this interface only.
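To illustrate "out of lockstep", here is a simplified poll-then-act loop. It assumes an async env exposes something like `poll()` (returning nested dicts for whichever envs and agents are ready) and `send_actions()`. The signatures here are trimmed-down hypotheticals, not the exact `AsyncVectorEnv` interface, and `StaggeredAsyncEnv` and `run_sampler` are invented for the example.

```python
from typing import Any, Callable, Dict, Tuple

# env_id -> agent_id -> value
Nested = Dict[int, Dict[str, Any]]
AGENT = "single_agent"


class StaggeredAsyncEnv:
    """Hypothetical toy env: sub-env i asks for an action only every
    (i + 1) polls, so the sub-envs run out of lockstep."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.tick = 0
        self.obs = {i: 0 for i in range(num_envs)}
        self.waiting = set()  # envs polled but not yet acted on

    def poll(self) -> Tuple[Nested, Nested]:
        self.tick += 1
        ready = [i for i in range(self.num_envs)
                 if self.tick % (i + 1) == 0 and i not in self.waiting]
        self.waiting.update(ready)
        obs = {i: {AGENT: self.obs[i]} for i in ready}
        rewards = {i: {AGENT: 1.0} for i in ready}
        return obs, rewards

    def send_actions(self, actions: Nested) -> None:
        for env_id, per_agent in actions.items():
            self.obs[env_id] += per_agent[AGENT]
            self.waiting.discard(env_id)


def run_sampler(env, compute_action: Callable[[Any], Any],
                num_polls: int) -> Dict[int, int]:
    """Generic loop relying only on poll()/send_actions(): it acts on
    whichever (env, agent) pairs are ready and counts steps per env."""
    steps: Dict[int, int] = {}
    for _ in range(num_polls):
        obs, _rewards = env.poll()
        actions = {env_id: {agent_id: compute_action(o)
                            for agent_id, o in agents.items()}
                   for env_id, agents in obs.items()}
        for env_id in actions:
            steps[env_id] = steps.get(env_id, 0) + 1
        env.send_actions(actions)
    return steps


env = StaggeredAsyncEnv(num_envs=3)
print(run_sampler(env, compute_action=lambda o: 1, num_polls=6))
# {0: 6, 1: 3, 2: 2}
```

The loop never assumes every env is ready on each poll, which is what lets a lockstep `VectorEnv` and a free-running `ServingEnv` share the same sampler once both are adapted to this shape.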
**Policy serving**: `ServingEnv` provides an env that is not stepped. Instead, the env executes in its own thread, querying the policy for actions via `self.get_action(obs)` and reporting results via `self.log_returns(rewards)`. It also supports logging off-policy actions via `self.log_action(obs, action)`. This API is more convenient for some use cases, and also provides parallelizable support for policy serving (for example, if you start an HTTP server in the env) and for ingesting offline logs (if the env reads from serving logs).
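The inverted control flow of a serving env can be sketched with a thread and a request queue: the env's own loop blocks in `get_action()` until the sampler answers. This is a minimal sketch of the pattern under stated assumptions, not RLlib's `ServingEnv`. The method names `get_action`, `log_action`, and `log_returns` follow the description above, while `ToyServingEnv`, `episode_loop`, `requests`, and `serve` are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, List


class ToyServingEnv(threading.Thread):
    """Hypothetical sketch: the env drives itself in its own thread and
    talks to the sampler only through a request queue."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests: "queue.Queue" = queue.Queue()

    def get_action(self, obs: Any) -> Any:
        reply: "queue.Queue" = queue.Queue(maxsize=1)
        self.requests.put(("get_action", obs, reply))
        return reply.get()  # block until the sampler answers

    def log_action(self, obs: Any, action: Any) -> None:
        self.requests.put(("log_action", (obs, action), None))  # off-policy

    def log_returns(self, reward: float) -> None:
        self.requests.put(("log_returns", reward, None))

    def run(self) -> None:
        try:
            self.episode_loop()
        finally:
            self.requests.put(("end", None, None))  # tell the sampler we're done

    def episode_loop(self) -> None:  # user-defined
        raise NotImplementedError


class CountingEnv(ToyServingEnv):
    def episode_loop(self) -> None:
        obs = 0
        for _ in range(3):
            action = self.get_action(obs)  # query the policy
            self.log_returns(float(action))
            obs += 1
        self.log_action(obs, 42)  # e.g. an action chosen outside the policy
        self.log_returns(0.0)


def serve(env: ToyServingEnv, policy: Callable[[Any], Any]) -> List[tuple]:
    """Sampler side: answer action queries and record everything logged."""
    env.start()
    log: List[tuple] = []
    while True:
        kind, payload, reply = env.requests.get()
        if kind == "get_action":
            action = policy(payload)
            log.append(("on_policy", payload, action))
            reply.put(action)
        elif kind == "log_action":
            log.append(("off_policy",) + payload)
        elif kind == "log_returns":
            log.append(("reward", payload))
        else:  # "end": the env thread has finished
            break
    env.join()
    return log


print(serve(CountingEnv(), policy=lambda obs: obs * 2))
```

Since the env thread is the only producer and the queue is FIFO, the sampler sees requests in exactly the order the env issued them; several such env threads could feed separate queues to serve requests in parallel.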
Any of these env types can be passed to RLlib agents. RLlib handles the conversions internally in `CommonPolicyEvaluator`, for example:
```
gym.Env => rllib.VectorEnv => rllib.AsyncVectorEnv
rllib.ServingEnv => rllib.AsyncVectorEnv
```
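The conversion chain above amounts to a dispatch on env type. The sketch below uses empty stand-in classes and a hypothetical `to_async_env` function. It does not reproduce `CommonPolicyEvaluator`'s code; it only shows how every input type can end up behind the single interface the sampler consumes.

```python
from typing import Any, Callable, Optional


# Hypothetical stand-ins for the interfaces named above.
class AsyncVectorEnv:
    pass


class VectorEnv:
    pass


class ServingEnv:
    pass


class VectorEnvToAsync(AsyncVectorEnv):
    def __init__(self, vector_env: VectorEnv):
        self.vector_env = vector_env


class ServingEnvToAsync(AsyncVectorEnv):
    def __init__(self, serving_env: ServingEnv):
        self.serving_env = serving_env


class AutoVectorEnv(VectorEnv):
    """Holds N gym-style envs; only action computation would be batched."""

    def __init__(self, envs: list):
        self.envs = envs


def to_async_env(env: Any, make_env: Optional[Callable[[], Any]] = None,
                 num_envs: int = 1) -> AsyncVectorEnv:
    """Route every supported env type to the one interface the sampler uses."""
    if isinstance(env, AsyncVectorEnv):
        return env  # already in the target form
    if isinstance(env, ServingEnv):
        return ServingEnvToAsync(env)  # ServingEnv => AsyncVectorEnv
    if not isinstance(env, VectorEnv):
        # gym.Env => VectorEnv: build num_envs - 1 extra copies.
        if num_envs > 1 and make_env is None:
            raise ValueError("make_env is required when num_envs > 1")
        extra = [make_env() for _ in range(num_envs - 1)]
        env = AutoVectorEnv([env] + extra)
    return VectorEnvToAsync(env)  # VectorEnv => AsyncVectorEnv


class CartPoleLike:  # plays the gym.Env role in this sketch
    pass


a = to_async_env(CartPoleLike(), make_env=CartPoleLike, num_envs=4)
print(type(a).__name__, len(a.vector_env.envs))  # VectorEnvToAsync 4
```

Checking for `AsyncVectorEnv` first lets users who implement the most general interface directly bypass all adapters.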
* Print warning when defining very large remote function or actor.
* Add weak test.
* Check that warnings appear in test.
* Make wait_for_errors actually fail in failure_test.py.
* Use constants for error types.
* Fix
* removed ddpg2
* removed ddpg2 from codebase
* added tests used in ddpg vs ddpg2 comparison
* added notes about training timesteps to yaml files
* removed ddpg2 yaml files
* removed unnecessary configs from yaml files
* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples
* added more configuration details to yaml files
* removed random starts from halfcheetah
* Add Java code lint check and fix the Java code lint errors
* Add Java doc lint check and fix the Java doc lint errors
* Add Java code and doc lint to the CI
* Enable --scoped-enums in flatbuffer compiler.
* Change enum to c++11 style (enum class).
* Resolve conflicts.
* Fix build failure when RAY_USE_NEW_GCS=on and remove ERROR_INDEX suffix.
* Merge with master and fix CI failure.