Eric Liang
|
9a83908c46
|
[rllib] Deprecate policy optimizers (#8345)
|
2020-05-21 10:16:18 -07:00 |
|
Eric Liang
|
77689d1116
|
[rllib] Port remainder of algorithms to build_trainer() pattern (#4920)
|
2019-06-07 16:45:36 -07:00 |
|
Eric Liang
|
37208216ae
|
[rllib] Rename Agent to Trainer (#4556)
|
2019-04-07 00:36:18 -07:00 |
|
Eric Liang
|
aa014af85b
|
[rllib] Fix atari reward calculations, add LR annealing, explained var stat for A2C / impala (#2700)
Changes needed to reproduce Atari plots in IMPALA / A2C: https://github.com/ray-project/rl-experiments
|
2018-08-23 17:49:10 -07:00 |
|
Eric Liang
|
981d9818c1
|
[rllib] Support the timesteps_per_batch in simple optimizer PPO mode (#2558)
* support ts
* doc
* Update sync_samples_optimizer.py
|
2018-08-06 12:10:59 -07:00 |
|
Eric Liang
|
8aa56c12e6
|
[rllib] Document "v2" APIs (#2316)
* re
* wip
* wip
* a3c working
* torch support
* pg works
* lint
* rm v2
* consumer id
* clean up pg
* clean up more
* fix python 2.7
* tf session management
* docs
* dqn wip
* fix compile
* dqn
* apex runs
* up
* imports
* ddpg
* quotes
* fix tests
* fix last r
* fix tests
* lint
* pass checkpoint restore
* kwar
* nits
* policy graph
* fix yapf
* com
* class
* pyt
* vectorization
* update
* test cpe
* unit test
* fix ddpg2
* changes
* wip
* args
* faster test
* common
* fix
* add alg option
* batch mode and policy serving
* multi serving test
* todo
* wip
* serving test
* doc async env
* num envs
* comments
* thread
* remove init hook
* update
* fix ppo
* comments1
* fix
* updates
* add jenkins tests
* fix
* fix pytorch
* fix
* fixes
* fix a3c policy
* fix squeeze
* fix trunc on apex
* fix squeezing for real
* update
* remove horizon test for now
* multiagent wip
* update
* fix race condition
* fix ma
* t
* doc
* st
* wip
* example
* wip
* working
* cartpole
* wip
* batch wip
* fix bug
* make other_batches None default
* working
* debug
* nit
* warn
* comments
* fix ppo
* fix obs filter
* update
* wip
* tf
* update
* fix
* cleanup
* cleanup
* spacing
* model
* fix
* dqn
* fix ddpg
* doc
* keep names
* update
* fix
* com
* docs
* clarify model outputs
* Update torch_policy_graph.py
* fix obs filter
* pass thru worker index
* fix
* rename
* vlad torch comments
* fix log action
* debug name
* fix lstm
* remove unused ddpg net
* remove conv net
* revert lstm
* wip
* wip
* cast
* wip
* works
* fix a3c
* works
* lstm util test
* doc
* clean up
* update
* fix lstm check
* move to end
* fix sphinx
* fix cmd
* remove bad doc
* envs
* vec
* doc prep
* models
* rl
* alg
* up
* clarify
* copy
* async sa
* fix
* comments
* fix a3c conf
* tune lstm
* fix reshape
* fix
* back to 16
* tuned a3c update
* update
* tuned
* optional
* merge
* wip
* fix up
* move pg class
* rename env
* wip
* update
* tip
* alg
* readme
* fix catalog
* readme
* doc
* context
* remove prep
* comma
* add env
* link to paper
* paper
* update
* rnn
* update
* wip
* clean up ev creation
* fix
* fix
* fix
* fix lint
* up
* no comma
* ma
* Update run_multi_node_tests.sh
* fix
* sphinx is stupid
* sphinx is stupid
* clarify torch graph
* no horizon
* fix config
* sb
* Update test_optimizers.py
|
2018-07-01 00:05:08 -07:00 |
|