hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-10 05:16:49 -04:00

Commit graph

Author	SHA1	Message	Date
Matthew A. Wright	3131e1742d	[rllib] Qmix off by 1 in double Q calculation (#5731) * Qmix fix: the current version of double Q learning is incorrect; it selects actions at timestep t instead of t+1 when computing the t+1 Q value. * Allow extra obs dict keys * Move Q-value-computing replay code to own function * Run the autoformatter * Use better terms in comments ("policy" network instead of "live" network)	2019-09-18 18:12:30 -07:00
Eric Liang	5d7afe8092	[rllib] Try moving RLlib to top level dir (#5324)	2019-08-05 23:25:49 -07:00
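
The off-by-one described in commit 3131e1742d can be illustrated with a minimal sketch. This is not the RLlib QMIX code: it is a single-agent, pure-Python illustration of a double-Q target, and the function names (`argmax`, `double_q_targets`, `buggy_double_q_targets`), the list-of-lists layout, and the sample numbers are all hypothetical. The only thing taken from the commit message is the idea that the greedy action must be selected at timestep t+1, not t, when computing the t+1 Q value.

```python
def argmax(values):
    """Return the index of the largest value (first index on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_q_targets(policy_q, target_q, rewards, dones, gamma=0.99):
    """Double-Q targets for T transitions.

    policy_q and target_q each hold T + 1 rows of per-action Q-values
    (timesteps 0..T); rewards and dones each hold T entries.
    """
    targets = []
    for t in range(len(rewards)):
        # Select the greedy action with the policy network at t + 1 ...
        next_action = argmax(policy_q[t + 1])
        # ... and evaluate that action with the target network at t + 1.
        next_value = target_q[t + 1][next_action]
        bootstrap = 0.0 if dones[t] else gamma * next_value
        targets.append(rewards[t] + bootstrap)
    return targets


def buggy_double_q_targets(policy_q, target_q, rewards, dones, gamma=0.99):
    """Same computation with the off-by-one: the action is selected at t."""
    targets = []
    for t in range(len(rewards)):
        stale_action = argmax(policy_q[t])  # wrong timestep
        next_value = target_q[t + 1][stale_action]
        bootstrap = 0.0 if dones[t] else gamma * next_value
        targets.append(rewards[t] + bootstrap)
    return targets


# Hypothetical two-step trajectory with two actions per step.
policy_q = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
target_q = [[0.5, 0.125], [0.25, 0.75], [1.0, 0.5]]
rewards = [1.0, 1.0]
dones = [False, True]

print(double_q_targets(policy_q, target_q, rewards, dones, gamma=0.5))
print(buggy_double_q_targets(policy_q, target_q, rewards, dones, gamma=0.5))
```

With these sample values the two functions disagree on the first transition, because the greedy action at t=0 differs from the greedy action at t=1. On the terminal transition both give the bare reward, since no bootstrap term is added.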

52 commits