Commit graph

7 commits

Author / SHA1 / Message / Date
Sven Mika
d537e9f0d8 [RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155) 2020-02-19 12:18:45 -08:00
Sven Mika
303547f119 [RLlib] Policy-classes cleanup and torch/tf unification. (#6770) 2020-01-17 22:26:28 -08:00
Sven
60d4d5e1aa Remove future imports (#6724)
* Remove all __future__ imports from RLlib.

* Remove (object) again from tf_run_builder.py::TFRunBuilder.

* Fix 2x LINT warnings.

* Fix broken appo_policy import (must be appo_tf_policy)

* Remove future imports from all other ray files (not just RLlib).

* Remove future import blocks that contain `unicode_literals` as well.
Revert appo_tf_policy.py to appo_policy.py (belongs to another PR).

* Add two empty lines before Schedule class.

* Put back __future__ imports into determine_tests_to_run.py. Fails otherwise on a py2/print related error.
2020-01-09 00:15:48 -08:00
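
The `__future__` blocks removed here were Python 2/3 compatibility shims of roughly the following form (a representative sketch; exact contents varied per file, and under Python 3 these imports are no-ops):

    # Representative compatibility block of the kind this commit removes.
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function
    from __future__ import unicode_literals  # blocks containing this were dropped too
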
Matthew A. Wright
0110941de5 rllib: use pytorch's fn to see if gpu is available (#5890) 2019-10-12 00:13:00 -07:00
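
The check referred to is presumably PyTorch's built-in CUDA probe, along these lines (variable names are illustrative):

    import torch

    # Use PyTorch's own check instead of a hand-rolled GPU detection.
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")
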
Matthew A. Wright
4aa06918ae Qmix on gpu and with non-stacked-obs environment state support (#5751) 2019-10-08 13:18:07 -07:00
Matthew A. Wright
3131e1742d [rllib] Qmix off by 1 in double Q calculation (#5731)
* Qmix fix.

Current version of double Q learning is incorrect; it selects actions at timestep t instead of t+1 when computing the t+1 Q value.

* Allow extra obs dict keys

* Move Q-value-computing replay code to own function

* Run the autoformatter

* use better terms in comments ("policy" network instead of "live" network)
2019-09-18 18:12:30 -07:00
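
To see the off-by-one being fixed: double Q-learning selects the bootstrap action with the policy network from the observation at t+1 and evaluates it with the target network; the buggy version selected actions at t. A minimal PyTorch sketch with hypothetical names (not RLlib's actual code):

    import torch

    # Illustrative double Q-learning target computation. The fix: bootstrap
    # actions are picked from the observation at t+1, not t.
    def double_q_targets(rewards, dones, gamma, q_next_policy, q_next_target):
        # rewards:       [B]    reward received at timestep t
        # dones:         [B]    1.0 where the episode ended at t, else 0.0
        # q_next_policy: [B, A] policy-network Q-values for the obs at t+1
        # q_next_target: [B, A] target-network Q-values for the obs at t+1
        next_actions = q_next_policy.argmax(dim=1, keepdim=True)   # choose with policy net
        next_q = q_next_target.gather(1, next_actions).squeeze(1)  # evaluate with target net
        return rewards + gamma * (1.0 - dones) * next_q
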
Eric Liang
5d7afe8092 [rllib] Try moving RLlib to top level dir (#5324) 2019-08-05 23:25:49 -07:00
Renamed from python/ray/rllib/agents/qmix/qmix_policy.py