Eric Liang
5f430da180
[rllib] Provide internal access to episode state in compute_actions() and allow returning extra batches ( #2559 )
...
The goal of this PR is to allow custom policies to perform model-based rollouts. In the multi-agent setting, this requires access not only to the policies of other agents but also to their current observations.
For efficiency, a policy may also want to return the model-based trajectories as part of the rollout.
* compute_actions() now takes a new keyword arg, episodes
* pull the internal episode class out into a top-level file
* add a function to return extra trajectories from an episode, which are appended to the sample batch
* add documentation
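
A minimal sketch of the idea described in this commit, assuming a policy that receives one episode object per observation through the episodes keyword and hands imagined trajectories back to it. ToyEpisode, ModelBasedPolicy, last_obs, extra_batches, add_extra_batch, and the trivial "model" are hypothetical stand-ins for illustration, not RLlib's actual classes or signatures.

```python
import random


class ToyEpisode:
    """Hypothetical stand-in for the internal episode object (not RLlib's class)."""

    def __init__(self, last_obs):
        # agent_id -> most recent observation for every agent in the episode
        self.last_obs = dict(last_obs)
        # extra trajectories that a sampler could append to the sample batch
        self.extra_batches = []

    def add_extra_batch(self, batch):
        # Assumed helper name: records a trajectory produced by a policy.
        self.extra_batches.append(batch)


class ModelBasedPolicy:
    """Toy policy that plans with a made-up model over all agents' observations."""

    def __init__(self, horizon=3, seed=0):
        self.horizon = horizon
        self.rng = random.Random(seed)

    def _imagine(self, episode):
        # Made-up dynamics: the joint state is the sum of all agents'
        # observations, and each action is added to it.
        obs = sum(episode.last_obs.values())
        trajectory = []
        for _ in range(self.horizon):
            action = self.rng.choice([0, 1])
            trajectory.append({"obs": obs, "action": action})
            obs += action
        return trajectory

    def compute_actions(self, obs_batch, state_batches=None, episodes=None):
        actions = []
        for i, _obs in enumerate(obs_batch):
            if episodes is None:
                # No episode state available: fall back to a default action.
                actions.append(0)
                continue
            trajectory = self._imagine(episodes[i])
            # Return the imagined rollout alongside the real one.
            episodes[i].add_extra_batch(trajectory)
            actions.append(trajectory[0]["action"])
        # (actions, recurrent state outputs, extra info)
        return actions, [], {}
```

In this sketch the episode is the only channel through which the policy sees other agents' observations, which is why it is passed into compute_actions() rather than being looked up globally.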
2018-08-16 14:37:21 -07:00
Sergey Kolesnikov
05490b8cb9
[rllib] dqn/ddpg policy customization ( #2445 )
...
* dqn policy update - more customization
* docs for custom DQN graph
* Update rllib-training.rst
* Update rllib-models.rst
* Update rllib.rst
* Update rllib-training.rst
* Update rllib-concepts.rst
* yapf codestyle
2018-07-22 14:47:14 -07:00
Eric Liang
d24f19fd1e
[rllib] Fix stats collection and some docs bugs since the refactoring ( #2361 )
...
* fix
* fix pbt example
* fix
* fix
* single thread by default
* vec
* fix
* fix
2018-07-07 13:29:20 -07:00
Eric Liang
8aa56c12e6
[rllib] Document "v2" APIs ( #2316 )
...
* re
* wip
* wip
* a3c working
* torch support
* pg works
* lint
* rm v2
* consumer id
* clean up pg
* clean up more
* fix python 2.7
* tf session management
* docs
* dqn wip
* fix compile
* dqn
* apex runs
* up
* imports
* ddpg
* quotes
* fix tests
* fix last r
* fix tests
* lint
* pass checkpoint restore
* kwar
* nits
* policy graph
* fix yapf
* com
* class
* pyt
* vectorization
* update
* test cpe
* unit test
* fix ddpg2
* changes
* wip
* args
* faster test
* common
* fix
* add alg option
* batch mode and policy serving
* multi serving test
* todo
* wip
* serving test
* doc async env
* num envs
* comments
* thread
* remove init hook
* update
* fix ppo
* comments1
* fix
* updates
* add jenkins tests
* fix
* fix pytorch
* fix
* fixes
* fix a3c policy
* fix squeeze
* fix trunc on apex
* fix squeezing for real
* update
* remove horizon test for now
* multiagent wip
* update
* fix race condition
* fix ma
* t
* doc
* st
* wip
* example
* wip
* working
* cartpole
* wip
* batch wip
* fix bug
* make other_batches None default
* working
* debug
* nit
* warn
* comments
* fix ppo
* fix obs filter
* update
* wip
* tf
* update
* fix
* cleanup
* cleanup
* spacing
* model
* fix
* dqn
* fix ddpg
* doc
* keep names
* update
* fix
* com
* docs
* clarify model outputs
* Update torch_policy_graph.py
* fix obs filter
* pass thru worker index
* fix
* rename
* vlad torch comments
* fix log action
* debug name
* fix lstm
* remove unused ddpg net
* remove conv net
* revert lstm
* wip
* wip
* cast
* wip
* works
* fix a3c
* works
* lstm util test
* doc
* clean up
* update
* fix lstm check
* move to end
* fix sphinx
* fix cmd
* remove bad doc
* envs
* vec
* doc prep
* models
* rl
* alg
* up
* clarify
* copy
* async sa
* fix
* comments
* fix a3c conf
* tune lstm
* fix reshape
* fix
* back to 16
* tuned a3c update
* update
* tuned
* optional
* merge
* wip
* fix up
* move pg class
* rename env
* wip
* update
* tip
* alg
* readme
* fix catalog
* readme
* doc
* context
* remove prep
* comma
* add env
* link to paper
* paper
* update
* rnn
* update
* wip
* clean up ev creation
* fix
* fix
* fix
* fix lint
* up
* no comma
* ma
* Update run_multi_node_tests.sh
* fix
* sphinx is stupid
* sphinx is stupid
* clarify torch graph
* no horizon
* fix config
* sb
* Update test_optimizers.py
2018-07-01 00:05:08 -07:00