IMPALA support for multi-agent was broken, since IMPALA requires batches of a fixed length but multi-agent envs can produce variable-length batches.
Fix this by zero-padding batches as needed (similar to the RNN case).
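The padding idea can be sketched as follows. This is a minimal illustration, not the actual rllib code: `pad_batch` is a hypothetical helper, and the real sample batches carry more fields.

```python
import numpy as np

def pad_batch(batch, target_len):
    """Zero-pad a variable-length batch up to target_len rows.

    batch: dict of arrays whose first dimension is the batch size.
    Rows past the original length are zeros, mirroring how RNN
    sequences are padded to a fixed length. (Hypothetical helper.)
    """
    padded = {}
    for key, col in batch.items():
        col = np.asarray(col)
        pad = target_len - col.shape[0]
        if pad < 0:
            raise ValueError("batch longer than target_len")
        # Pad only along the batch/time axis; leave feature dims alone.
        width = [(0, pad)] + [(0, 0)] * (col.ndim - 1)
        padded[key] = np.pad(col, width, mode="constant")
    return padded

# A multi-agent rollout may yield, say, 3 rows when the learner wants 5.
short = {"obs": np.ones((3, 4)), "rewards": np.ones(3)}
full = pad_batch(short, 5)
```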
## What do these changes do?
Don't create an excessive number of workers for rollout.py, and fix up the env wrapping to be consistent with the internal agent wrapper.
## Related issue number
Closes #3260.
A bunch of minor rllib fixes:

* pull in the latest baselines Atari wrapper changes (and use the DeepMind wrapper by default)
* move reward clipping to the policy evaluator
* add an A2C variant of A3C
* reduce the vision network's fc layer size to 256 units
* switch to 84x84 images
* doc tweaks
* print timesteps in the Tune status
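The reward clipping mentioned above is the standard Atari-style sign clipping; a minimal sketch (illustrative only, not the evaluator code):

```python
import numpy as np

def clip_rewards(rewards):
    # Standard Atari-style clipping: map each reward to its sign,
    # so the learner only ever sees -1, 0, or +1.
    return np.sign(rewards)

clipped = clip_rewards(np.array([-3.5, 0.0, 0.2, 10.0]))
```

Doing this in the policy evaluator (rather than in an env wrapper) keeps the raw rewards available for reporting.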
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access not only to the policies of other agents, but also to their current observations.
It can also be useful to return the model-based trajectories as part of the rollout for efficiency.
* compute_actions() now takes a new keyword arg episodes
* pull out the internal episode class into a top-level file
* add a function to return extra trajectories from an episode, which will be appended to the sample batch
* documentation
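A sketch of how a custom policy might use the new `episodes` kwarg. The class and attribute names here (`ModelBasedPolicy`, `last_obs`) are illustrative stand-ins, not rllib's actual API:

```python
class ModelBasedPolicy:
    """Hypothetical policy: compute_actions() receives the current
    episodes, giving access to other agents' latest observations
    for model-based rollouts."""

    def compute_actions(self, obs_batch, state_batches=None, episodes=None):
        actions = []
        episodes = episodes or [None] * len(obs_batch)
        for obs, episode in zip(obs_batch, episodes):
            # Peek at the other agents' most recent observations
            # (a real policy would feed these into its model).
            others = episode.last_obs if episode else {}
            actions.append(1 if others else 0)
        return actions, [], {}


class FakeEpisode:
    """Stand-in for the episode object passed by the sampler."""
    def __init__(self, last_obs):
        self.last_obs = last_obs


policy = ModelBasedPolicy()
acts, _, _ = policy.compute_actions(
    [1, 2], episodes=[FakeEpisode({"agent_2": 5}), FakeEpisode({})])
```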
```
ray exec CLUSTER CMD [--screen] [--start] [--stop]
ray attach CLUSTER [--start]
```

Example:

```
ray exec sgd.yaml 'source activate tensorflow_p27 && cd ~/ray/python/ray/rllib && ./train.py --run=PPO --env=CartPole-v0' --screen --start --stop
```
This single command creates a cluster and runs the given command on it in a screen session. The screen can later be attached to via `ray attach`. After the command finishes, the cluster workers are terminated and the head node is stopped.
* Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
* Add an AsyncSamplesOptimizer that implements the IMPALA architecture
* integrate V-trace with the A3C policy graph
* audit the V-trace integration
* benchmark against A3C, with V-trace on/off
PongNoFrameskip-v4 with IMPALA, scaling from 16 to 128 workers, solves Pong in <10 minutes. For reference, solving this env takes ~40 minutes with Ape-X and several hours with A3C.
* removed ddpg2
* removed ddpg2 from codebase
* added tests used in ddpg vs ddpg2 comparison
* added notes about training timesteps to yaml files
* removed ddpg2 yaml files
* removed unnecessary configs from yaml files
* moved pendulum, mountaincarcontinuous, and halfcheetah tests to tuned_examples
* added more configuration details to yaml files
* removed random starts from halfcheetah
* patch up pbt
* add pbt
* clean up test
* review
* try out a ppo example
* some tweaks to ppo example
* add postprocess hook
* clean up custom explore fn
* improve tune doc
* concepts
* update humanoid
* fix example
* show error file
Remove the rllib dependency: Trainable is now a standalone abstract class that can be easily subclassed.
Clean up HyperBand: fix the debug string and add an example.
Remove the YAML API / ScriptRunner: this was never really used.
Move ray.init() out of run_experiments(): this provides greater flexibility and should be less confusing, since there is no longer an implicit init() done there. Note that this is a breaking API change for Tune.
* wip
* log video
* video doesn't work well
* scenario integration
* fix train
* wip models
* reward bonus
* test prep
* fix train
* kill
* add tuple preprocessor
* switch to euclidean dists
* fix env path
* merge richards fix
* fix hash
* simplified reward function
* add framestack
* add env configs
* simplify speed reward
* add lane keeping simple mode
* ppo lane keep
* simplify discrete actions
* update dqn reward
* Update train_dqn.py
* fix
* docs
* Update README.rst
* comments
* fix yaml bug
* add ext agent
* gpus
* update
* tuning
* docs
* lint
* update
* fix
* Update tune_mnist_ray.py
* remove script trial
* fix
* reorder
* fix ex
* py2 support
* upd
* comments
* comments
* cleanup readme
* fix trial
* annotate
* Update rllib.rst