52 commits

Author SHA1 Message Date
Matthew A. Wright
3131e1742d [rllib] Qmix off by 1 in double Q calculation (#5731)
* Qmix fix.

- Current version of double Q learning is incorrect; it selects actions
  at timestep t instead of t+1 when computing the t+1 Q value.

* Allow extra obs dict keys

* Move Q-value-computing replay code to own function

* Run the autoformatter

* use better terms in comments ("policy" network instead of "live" network)
2019-09-18 18:12:30 -07:00
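
The off-by-one described in the Qmix commit message above can be illustrated with a minimal sketch. This is not the RLlib code: the function names, the list-of-lists Q-value layout, and the default `gamma` are assumptions made for illustration only. In double Q learning, the policy network picks the greedy action at t+1 and the target network evaluates that action at t+1; the bug described picks the action at t instead.

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_targets(policy_q, target_q, rewards, dones, gamma=0.99):
    """Hypothetical helper: double-Q targets for one trajectory.

    policy_q, target_q: per-timestep lists of Q-values, length T + 1.
    rewards, dones: lists of length T.
    """
    targets = []
    for t in range(len(rewards)):
        # Select the greedy action at t+1 with the policy network ...
        a_next = argmax(policy_q[t + 1])
        # ... and evaluate that action at t+1 with the target network.
        next_q = target_q[t + 1][a_next]
        not_done = 0.0 if dones[t] else 1.0
        targets.append(rewards[t] + gamma * not_done * next_q)
    return targets


def off_by_one_targets(policy_q, target_q, rewards, dones, gamma=0.99):
    """Same computation with the bug the commit describes: the action is
    selected at timestep t but evaluated at t+1."""
    targets = []
    for t in range(len(rewards)):
        a_wrong = argmax(policy_q[t])  # selected at t, not t+1
        next_q = target_q[t + 1][a_wrong]
        not_done = 0.0 if dones[t] else 1.0
        targets.append(rewards[t] + gamma * not_done * next_q)
    return targets
```

With Q-values whose greedy action changes between timesteps, the two functions return different targets, which is how the error would show up in training.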
Eric Liang
5d7afe8092 [rllib] Try moving RLlib to top level dir (#5324)
2019-08-05 23:25:49 -07:00