Eric Liang
55d039af32
Annotate datasources and add API annotation check script ( #24999 )
Why are these changes needed?
Add API stability annotations for datasource classes, and add a linter to check all data classes have appropriate annotations.
2022-05-21 15:05:07 -07:00
kourosh hakhamaneshi
3815e52a61
[RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits ( #24896 )
2022-05-19 18:30:42 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. ( #24339 )
2022-05-12 16:12:42 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. ( #24501 )
2022-05-06 12:29:53 +02:00
Sven Mika
5b61a00792
[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). ( #24433 )
2022-05-04 16:28:12 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. ( #24165 )
2022-05-03 07:59:26 +02:00
Sven Mika
7cca7782f1
[RLlib] OPE (off policy estimator) API. ( #24384 )
2022-05-02 21:15:50 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting. ( #24372 )
2022-05-02 12:51:14 +02:00
Sven Mika
b2b1c95aa5
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). ( #24332 )
2022-04-30 09:51:09 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. ( #24164 )
2022-04-27 14:24:20 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. ( #24118 )
We had unreported merge conflicts with DDPPO. This PR closes and combines #24092, #24035, #24030 and #23096
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )" ( #24103 )
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert #23906 [RLlib] DD-PPO training iteration function implementation. ( #24035 )
2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion ( #22937 )
2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn ( #23906 )" ( #24030 )
The DDPPO LR scheduler test is broken because the learner_info_dictionary that is returned by the training iteration function does not consistently return a learner info for every training iteration, but the test expects that it does.
We'll need to fix the test then re-merge
Reverts #23906
2022-04-19 16:43:57 -07:00
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn ( #23906 )
2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True ( #23916 )
2022-04-15 13:51:12 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. ( #23673 )
2022-04-11 08:39:10 +02:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. ( #23739 )
2022-04-08 15:33:28 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). ( #23491 )
2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q ( #22842 )
2022-03-29 14:44:40 +02:00
Sven Mika
7cb86acce2
[RLlib] trainer_template.py: hard deprecation (error when used). ( #23488 )
2022-03-25 18:25:51 +01:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI ( #23418 )
2022-03-24 17:04:02 -07:00
Sven Mika
b1cda46681
[RLlib] SlateQ (tf GPU + multi-GPU) + Bandit fixes ( #23276 )
2022-03-18 13:45:16 +01:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). ( #23128 )
2022-03-15 17:34:21 +01:00
simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes ( #22152 )
2022-03-09 16:51:42 +01:00
Jun Gong
e8be45065e
[RLlib] Restore policies on eval_workers as well. ( #22641 )
2022-03-01 08:38:14 +01:00
Jun Gong
2b6a0c71d7
[RLlib] Add a callback for when trainer finishes initialization: on_trainer_init. ( #22493 )
2022-02-22 08:18:32 +01:00
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
5ca6a56e16
[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. ( #22350 )
2022-02-15 09:32:43 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." ( #22250 )
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. ( #22078 )
2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano
a258f9c692
[RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neuro-MMO RLlib's default massive-multi-agent learning test environment). ( #22042 )
2022-02-02 17:28:42 +01:00
Jun Gong
87fe033f7b
[RLlib] Request CPU resources in Trainer.default_resource_request() if using dataset input. ( #21948 )
2022-02-02 10:20:37 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
Jun Gong
099c170ab4
[RLlib] Dataset Reader/Writer for RLlib ( #21808 )
2022-01-26 16:00:46 +01:00
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
c4636c7c05
[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. ( #21665 )
2022-01-20 11:46:25 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and Marwil learning tests. ( #21574 )
2022-01-14 14:35:32 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 ( #21421 )
2022-01-13 10:52:55 +01:00
Sven Mika
188324c5c7
[RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. ( #21553 )
2022-01-12 18:56:51 +01:00