hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Avnish Narayan 026bf01071 [RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535 ) * Fix QMix, SAC, and MADDPA too. * Unpin gym and deprecate pendulum v0 Many tests in rllib depended on pendulum v0, however in gym 0.21, pendulum v0 was deprecated in favor of pendulum v1. This may change reward thresholds, so will have to potentially rerun all of the pendulum v1 benchmarks, or use another environment in favor. The same applies to frozen lake v0 and frozen lake v1 Lastly, all of the RLlib tests and have been moved to python 3.7 * Add gym installation based on python version. Pin python<= 3.6 to gym 0.19 due to install issues with atari roms in gym 0.20 * Reformatting * Fixing tests * Move atari-py install conditional to req.txt * migrate to new ale install method Make parametric_actions_cartpole return float32 actions/obs Adding type conversions if obs/actions don't match space Add utils to make elements match gym space dtypes Co-authored-by: Jun Gong <jungong@anyscale.com> Co-authored-by: sven1977 <svenmika1977@gmail.com>		2021-11-03 16:24:00 +01:00
tests	[RLlib] Upgrade gym version to 0.21 and deprecate pendulum-v0. (#19535 )	2021-11-03 16:24:00 +01:00
__init__.py	[RLlib] R2D2 Implementation. (#13933 )	2021-02-25 12:18:11 +01:00
apex.py	[RLlib] Unify the way we create local replay buffer for all agents (#19627 )	2021-10-26 20:56:02 +02:00
distributional_q_tf_model.py	[RLlib] Redo: Make TFModelV2 fully modular like TorchModelV2 (soft-deprecate register_variables, unify var names wrt torch). (#13363 )	2021-01-14 14:44:33 +01:00
dqn.py	[RLlib] Fix all the CI tests that were broken by is_training and replay buffer changes; re-comment-in the failing RLlib tests (#19809 )	2021-10-28 18:06:47 +02:00
dqn_tf_policy.py	[RLlib] Add all simple learning tests as `framework=tf2`. (#19273 )	2021-11-02 12:10:17 +01:00
dqn_torch_model.py	[RLlib] DQN (Rainbow): Fix torch noisy layer support and loss (#16716 )	2021-07-13 16:48:06 -04:00
dqn_torch_policy.py	[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982 )	2021-11-03 10:00:46 +01:00
learner_thread.py	[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829 )	2021-11-01 21:46:02 +01:00
r2d2.py	[RLlib] Fix flakey test_a3c, test_maml, test_apex_dqn. (#19035 )	2021-10-04 13:23:51 +02:00
r2d2_tf_policy.py	[RLlib] Add all simple learning tests as `framework=tf2`. (#19273 )	2021-11-02 12:10:17 +01:00
r2d2_torch_policy.py	[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982 )	2021-11-03 10:00:46 +01:00
README.md	[RLLib] Readme.md Documentation for Almost All Algorithms in rllib/agents (#13035 )	2020-12-29 18:45:55 -05:00
simple_q.py	[RLlib] Unify the way we create local replay buffer for all agents (#19627 )	2021-10-26 20:56:02 +02:00
simple_q_tf_policy.py	[RLlib; Docs overhaul] Docstring cleanup: rllib/utils (#19829 )	2021-11-01 21:46:02 +01:00
simple_q_torch_policy.py	[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982 )	2021-11-03 10:00:46 +01:00

README.md

Deep Q Networks (DQN)

Code in this package is adapted from https://github.com/openai/baselines/tree/master/baselines/deepq.

Overview

DQN is a model-free, off-policy RL algorithm and one of the first deep RL algorithms developed. DQN uses a neural network as a function approximator for the Q-function in Q-learning. The agent minimizes the L2 distance between its Q-value predictions and the Q-value targets, which are computed as 1-step TD targets. The paper introduces two important concepts: a target network and an experience replay buffer. The target network is a copy of the main Q-network and is used to compute the Q-value targets for the loss. To stabilize training, the target network lags slightly behind the main Q-network. The experience replay buffer stores the data the agent encounters during training, and batches are sampled uniformly from it to compute gradient updates for the Q-network.
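The pieces described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the RLlib implementation: the names `ReplayBuffer`, `td_target`, and `squared_td_loss` are hypothetical, Q-values are passed in as plain lists rather than produced by a neural network, and the target network is assumed to have already produced `next_q_target`.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer that is sampled uniformly at random (illustrative)."""

    def __init__(self, capacity):
        # Once capacity is reached, the oldest transitions are dropped first.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size, rng=random):
        # Every stored transition is equally likely to be drawn.
        return rng.sample(list(self.storage), batch_size)


def td_target(reward, done, next_q_target, gamma=0.99):
    """1-step TD target: r + gamma * max_a' Q_target(s', a').

    `next_q_target` holds the target network's Q-values for the next state.
    """
    if done:
        # No bootstrapping past the end of an episode.
        return reward
    return reward + gamma * max(next_q_target)


def squared_td_loss(q_pred, target):
    """Squared error between the predicted Q-value and its TD target."""
    return (q_pred - target) ** 2
```

In a training loop, sampled transitions would be turned into targets with `td_target`, and the resulting loss would be minimized with respect to the main Q-network only; the target network would be refreshed from the main network at a slower cadence.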

Supported DQN Algorithms

Double DQN - Vanilla DQN uses the same network to both select and evaluate the next action when computing Q-value targets, which leads to overly optimistic Q-values and limits performance. Akin to double Q-learning, Double DQN uses two Q networks to address this: the main Q network selects the next action and the target network evaluates it.
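The difference between the vanilla target and the Double DQN target can be shown with a small sketch. It assumes the standard Double DQN formulation, in which the main (online) network picks the next action and the target network scores it; the function names are hypothetical and Q-values are plain lists.

```python
def vanilla_target(reward, next_q_target, gamma=0.99):
    """Vanilla DQN: the target network both selects and evaluates the action."""
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    # Index of the action the online network considers best in the next state.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]
```

Because the evaluated action is not necessarily the one with the largest target-network value, the Double DQN target is never larger than the vanilla target for the same inputs, which is what dampens the optimism.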

Dueling DQN - Dueling DQN splits the Q-value function approximator into two streams, a state-value approximator and an advantage approximator, whose outputs are combined to produce the Q-values.
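One common way to recombine the value and advantage outputs is to subtract the mean advantage, so that the value estimate stays identifiable. The sketch below assumes that mean-subtraction rule; `dueling_q_values` is a hypothetical helper operating on plain numbers, not an RLlib function.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```

With this rule the average of the resulting Q-values equals the state value, and the ranking of actions is determined entirely by the advantages.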

Distributional DQN - Usually, the Q network outputs a single predicted Q-value for each state-action pair. Distributional DQN instead predicts a distribution over Q-values for each state-action pair (e.g. the mean and standard deviation of a normal distribution). This captures uncertainty in the Q-value and can improve the performance of DQN algorithms.
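As a rough illustration of predicting a distribution rather than a single number, the sketch below uses a categorical distribution over a fixed grid of possible values. This parameterization is an assumption made for the example (the text above mentions a normal distribution as one option), and the names `make_atoms`, `softmax`, and `expected_q` are hypothetical.

```python
import math


def make_atoms(v_min, v_max, num_atoms):
    """Evenly spaced support values between v_min and v_max (num_atoms >= 2)."""
    step = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * step for i in range(num_atoms)]


def softmax(logits):
    """Turn raw network outputs into probabilities over the atoms."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_q(logits, atoms):
    """Scalar Q-value recovered as the mean of the predicted distribution."""
    probs = softmax(logits)
    return sum(p * z for p, z in zip(probs, atoms))
```

Action selection can still use the scalar returned by `expected_q`, while the loss would compare whole distributions instead of single values.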

APEX-DQN - Standard DQN algorithms sample data uniformly from an experience replay buffer and compute gradients from the sampled data. APEX uses weighted (prioritized) replay, where elements in the replay buffer are more or less likely to be sampled depending on their TD-error.
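TD-error-weighted sampling can be sketched as follows. The example assumes the usual proportional prioritization scheme, with sampling probability proportional to `(|td_error| + eps) ** alpha` and importance weights `(N * P(i)) ** -beta` normalized by the largest weight in the batch; the function names and default values are hypothetical, and the distributed data collection that APEX adds is not shown.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Probability of drawing each transition, growing with |TD-error|."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(probs, batch_size, rng=random):
    """Draw transition indices (with replacement) according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


def importance_weights(probs, indices, beta=0.4):
    """Weights that correct for the non-uniform sampling, scaled to max 1."""
    n = len(probs)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_weight = max(weights)
    return [w / max_weight for w in weights]
```

Setting `alpha=0` recovers uniform sampling, so the same code path covers the standard DQN case.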

Rainbow - Rainbow DQN, as the name suggests, combines many of the improvements to DQN discovered in research. These include a multi-step distributional loss (extended from Distributional DQN), prioritized replay (as in APEX-DQN), double Q-networks (from Double DQN), and dueling networks (from Dueling DQN).
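The multi-step part of the loss replaces the 1-step TD target with an n-step return. A minimal sketch, assuming the standard definition (discounted sum of the next n rewards plus a discounted bootstrap value) and a hypothetical helper name:

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Sum of gamma^k * r_k over the n rewards, plus gamma^n * bootstrap_value."""
    ret = bootstrap_value
    # Fold the rewards in from last to first so each step applies one discount.
    for reward in reversed(rewards):
        ret = reward + gamma * ret
    return ret
```

With a single reward this reduces to the 1-step target `r + gamma * bootstrap_value`; in a Rainbow-style setup the bootstrap value would come from the target network, combined with the double, dueling, and distributional components.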

Documentation & Implementation:

Vanilla DQN (DQN): Detailed Documentation, Implementation

Double DQN: Detailed Documentation, Implementation

Dueling DQN: Detailed Documentation, Implementation

Distributional DQN: Detailed Documentation, Implementation

APEX DQN: Detailed Documentation, Implementation

Rainbow DQN: Detailed Documentation, Implementation