Eric Liang
db0dee573e
[rllib] Q-Mix implementation (Q-Mix, VDN, IQN, and Ape-X variants) (#3548)
2018-12-18 10:40:01 -08:00
Eric Liang
f0df97db6f
[rllib] example and docs on how to use parametric actions with DQN / PG algorithms (#3384)
2018-11-27 23:35:19 -08:00
Eric Liang
8b76bab25c
[rllib] docs for td3 (#3381)
* td3 doc
* Update rllib-env.rst
2018-11-22 13:36:47 -08:00
Eric Liang
abdc3b592e
[rllib] Update multi-gpu impala numbers (#3327)
2018-11-19 20:55:27 -08:00
Eric Liang
a9e454f6fd
[rllib] Include config dicts in the sphinx docs (#3064)
2018-10-16 15:55:11 -07:00
Eric Liang
3c891c6ece
[rllib] Parallel-data loading and multi-gpu support for IMPALA (#2766)
2018-10-15 11:02:50 -07:00
Eric Liang
b06c604a51
[rllib] Add some more tuned atari results to documentation (#2991)
* dqn results ++
* add scale
* hour
* fix
* small dqn table
* update
* steps
* upd
* apex
* up
* add apex results
* tip
2018-09-29 23:13:36 -07:00
Eric Liang
69d1354016
[rllib] Document ARS & rainbow (#2744)
* wip
* rainbow doc too
* e not used
* fix ppo doc
* clean list
* use same title
2018-08-28 18:13:36 -07:00
Eric Liang
aa014af85b
[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala (#2700)
Changes needed to reproduce Atari plots in IMPALA / A2C: https://github.com/ray-project/rl-experiments
2018-08-23 17:49:10 -07:00
Eric Liang
fbe6c59f72
[rllib] Misc fixes, A2C (#2679)
A bunch of minor rllib fixes:
* pull in latest baselines atari wrapper changes (and use deepmind wrapper by default)
* move reward clipping to policy evaluator
* add a2c variant of a3c
* reduce vision network fc layer size to 256 units
* switch to 84x84 images
* doc tweaks
* print timesteps in tune status
2018-08-20 15:28:03 -07:00
Eric Liang
f7ec292360
[rllib] Support agent.get_action in multiagent (#2543)
* support get action on policy id
* comment
* grammar fixes
* Update rllib-algorithms.rst
2018-08-02 13:35:53 -07:00
Eric Liang
9ea57c2a93
[rllib] Basic IMPALA implementation (using deepmind's reference vtrace.py) (#2504)
* Rename AsyncSamplesOptimizer -> AsyncReplayOptimizer
* Add AsyncSamplesOptimizer that implements the IMPALA architecture
* Integrate V-trace with a3c policy graph
* Audit V-trace integration
* Benchmark comparison vs A3C and with V-trace on/off
Results: IMPALA on PongNoFrameskip-v4 scales from 16 to 128 workers, solving Pong in <10 min. For reference, solving this env takes ~40 minutes for Ape-X and several hours for A3C.
2018-08-01 20:53:53 -07:00
Eric Liang
8aa56c12e6
[rllib] Document "v2" APIs (#2316)
* re
* wip
* wip
* a3c working
* torch support
* pg works
* lint
* rm v2
* consumer id
* clean up pg
* clean up more
* fix python 2.7
* tf session management
* docs
* dqn wip
* fix compile
* dqn
* apex runs
* up
* imports
* ddpg
* quotes
* fix tests
* fix last r
* fix tests
* lint
* pass checkpoint restore
* kwar
* nits
* policy graph
* fix yapf
* com
* class
* pyt
* vectorization
* update
* test cpe
* unit test
* fix ddpg2
* changes
* wip
* args
* faster test
* common
* fix
* add alg option
* batch mode and policy serving
* multi serving test
* todo
* wip
* serving test
* doc async env
* num envs
* comments
* thread
* remove init hook
* update
* fix ppo
* comments1
* fix
* updates
* add jenkins tests
* fix
* fix pytorch
* fix
* fixes
* fix a3c policy
* fix squeeze
* fix trunc on apex
* fix squeezing for real
* update
* remove horizon test for now
* multiagent wip
* update
* fix race condition
* fix ma
* t
* doc
* st
* wip
* example
* wip
* working
* cartpole
* wip
* batch wip
* fix bug
* make other_batches None default
* working
* debug
* nit
* warn
* comments
* fix ppo
* fix obs filter
* update
* wip
* tf
* update
* fix
* cleanup
* cleanup
* spacing
* model
* fix
* dqn
* fix ddpg
* doc
* keep names
* update
* fix
* com
* docs
* clarify model outputs
* Update torch_policy_graph.py
* fix obs filter
* pass thru worker index
* fix
* rename
* vlad torch comments
* fix log action
* debug name
* fix lstm
* remove unused ddpg net
* remove conv net
* revert lstm
* wip
* wip
* cast
* wip
* works
* fix a3c
* works
* lstm util test
* doc
* clean up
* update
* fix lstm check
* move to end
* fix sphinx
* fix cmd
* remove bad doc
* envs
* vec
* doc prep
* models
* rl
* alg
* up
* clarify
* copy
* async sa
* fix
* comments
* fix a3c conf
* tune lstm
* fix reshape
* fix
* back to 16
* tuned a3c update
* update
* tuned
* optional
* merge
* wip
* fix up
* move pg class
* rename env
* wip
* update
* tip
* alg
* readme
* fix catalog
* readme
* doc
* context
* remove prep
* comma
* add env
* link to paper
* paper
* update
* rnn
* update
* wip
* clean up ev creation
* fix
* fix
* fix
* fix lint
* up
* no comma
* ma
* Update run_multi_node_tests.sh
* fix
* sphinx is stupid
* sphinx is stupid
* clarify torch graph
* no horizon
* fix config
* sb
* Update test_optimizers.py
2018-07-01 00:05:08 -07:00