Mirror of https://github.com/vale981/ray, synced 2025-03-07 02:51:39 -05:00
Commit notes:

* Qmix fix: the current version of double Q-learning is incorrect; it selects actions at timestep t instead of t+1 when computing the t+1 Q-value.
* Allow extra obs dict keys.
* Move the Q-value-computing replay code to its own function.
* Run the autoformatter.
* Use better terms in comments ("policy" network instead of "live" network).

| Name |
|---|
| `__init__.py` |
| `apex.py` |
| `mixers.py` |
| `model.py` |
| `qmix.py` |
| `qmix_policy.py` |
| `README.md` |

Code in this package is adapted from https://github.com/oxwhirl/pymarl.
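
The double Q-learning fix mentioned in the commit notes can be illustrated with a small sketch. This is not the code from `qmix_policy.py`; it is a single-agent, pure-Python simplification that leaves out the QMIX mixer and batching. The function name `double_q_targets` and its parameters are hypothetical, chosen only for illustration. The point it shows is that the action used to index the target network's Q-values at t+1 should be the policy network's argmax at t+1, not at t.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_targets(
    policy_q: List[List[float]],
    target_q: List[List[float]],
    rewards: List[float],
    dones: List[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Compute one-step double Q-learning targets for one trajectory.

    Hypothetical helper for illustration only.
    policy_q[t] and target_q[t] hold the per-action Q-values for the
    observation at timestep t, so both need len(rewards) + 1 rows.
    """
    targets = []
    for t, (reward, done) in enumerate(zip(rewards, dones)):
        if done:
            # No bootstrapping past a terminal transition.
            targets.append(reward)
            continue
        # Select the action with the policy network at t+1.
        # Using policy_q[t] here is the kind of off-by-one the commit describes.
        best_next_action = argmax(policy_q[t + 1])
        # Evaluate that action with the target network, also at t+1.
        targets.append(reward + gamma * target_q[t + 1][best_next_action])
    return targets


# Toy data: two transitions, two actions, the second transition is terminal.
policy_q = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
target_q = [[0.5, 0.5], [1.0, 4.0], [5.0, 6.0]]
example_targets = double_q_targets(
    policy_q, target_q, rewards=[1.0, 1.0], dones=[False, True], gamma=0.5
)
print(example_targets)  # [3.0, 1.0]
```

In the toy data the policy network prefers action 0 at t=0 but action 1 at t=1. Selecting with the t=0 row would bootstrap from `target_q[1][0] = 1.0` and give a first target of 1.5, whereas selecting at t+1 bootstraps from `target_q[1][1] = 4.0` and gives 3.0.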