mirror of https://github.com/vale981/ray synced 2025-03-13 06:36:39 -04:00
Commit graph

621 commits

Author SHA1 Message Date
kourosh hakhamaneshi
f48f1b252c
[RLlib] Moved agents.es to algorithms.es () 2022-05-06 14:54:22 +02:00
Sven Mika
7ab19ddc32
[RLlib] MADDPG: Move into agents folder (from contrib) and use training_iteration method. () 2022-05-06 12:35:21 +02:00
Sven Mika
f54557073e
[RLlib] Remove execution_plan API code no longer needed. () 2022-05-06 12:29:53 +02:00
Sven Mika
f891a2b6f1
[RLlib] SlateQ + tf; release test fixes, related to TD-error not properly being formatted. () 2022-05-06 08:50:30 +02:00
Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn () 2022-05-05 16:11:08 +02:00
Artur Niederfahrenhorst
86bc9ecce2
[RLlib] DDPG Training iteration fn & Replay Buffer API () 2022-05-05 09:41:38 +02:00
Sven Mika
5b61a00792
[RLlib] Feed all values in COMMON_CONFIG directly from TrainerConfig() (removes duplicate values and comments). () 2022-05-04 16:28:12 +02:00
Sven Mika
b48f63113b
[RLlib] SlateQ fixes: Release learning tests wrong yaml structure + TD-error torch issue () 2022-05-04 13:37:14 +02:00
Sven Mika
1bc6419e0e
[RLlib] R2D2 training iteration fn AND switch off execution_plan API by default. () 2022-05-03 07:59:26 +02:00
Sven Mika
7cca7782f1
[RLlib] OPE (off policy estimator) API. () 2022-05-02 21:15:50 +02:00
Sven Mika
0c5ac3b9e8
[RLlib] Issue 24075: Better error message for Bandit MultiDiscrete (suggest using our wrapper). () 2022-05-02 21:14:08 +02:00
Sven Mika
f53ca1cacb
[RLlib] ES + ARS TrainerConfig objects. () 2022-05-02 16:55:28 +02:00
Sven Mika
026849cd27
[RLlib] APPO TrainerConfig objects. () 2022-05-02 15:06:23 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). () 2022-05-02 12:51:14 +02:00
Sven Mika
950bd3fc3f
[RLlib] IMPALA TrainerConfig objects. () 2022-05-02 12:05:30 +02:00
Sven Mika
b2b1c95aa5
[RLlib] A2/3C Config objects (A2CConfig and A3CConfig). () 2022-04-30 09:51:09 +02:00
Sven Mika
3052193c9e
[RLlib] Fix CQL getting stuck when deprecated timesteps_per_iteration is used (use min_train_timesteps_per_reporting instead). ()
CQL does not collect any sample timesteps, and the deprecated timesteps_per_iteration is automatically translated into the new min_sample_timesteps_per_reporting; for CQL (and other purely offline RL algos) it should instead be translated into min_train_timesteps_per_reporting.

If timesteps_per_iteration is set, CQL never leaves the first iteration, as it thinks it is not done yet (its sample timesteps always remain at 0).
2022-04-29 21:02:34 +01:00
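
A minimal sketch of the config change this commit describes, assuming a plain RLlib config dict (the key names come from the commit message; the values and the offline-data path are illustrative only):

    # Hedged sketch, not the actual RLlib code: for a purely offline algo such as CQL,
    # report by *trained* timesteps instead of the deprecated timesteps_per_iteration.
    cql_config = {
        "input": "/path/to/offline/dataset",  # hypothetical offline data location
        # Deprecated key -- auto-translated to min_sample_timesteps_per_reporting,
        # which never advances for CQL because it performs no env sampling:
        # "timesteps_per_iteration": 1000,
        # Use the train-timestep counterpart instead:
        "min_train_timesteps_per_reporting": 1000,
    }
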
Sven Mika
539832f2c5
[RLlib] SlateQ training iteration function. () 2022-04-29 18:38:17 +02:00
Xuehai Pan
3c3dd5051f
[RLlib] Fix type hints for original_batches in callbacks. () 2022-04-29 10:33:53 +02:00
Xuehai Pan
9c76e21a5e
[RLlib] Ensure MultiCallbacks always implements all callback methods () 2022-04-29 10:30:24 +02:00
Sven Mika
ba14f0a41b
[RLlib] PGTrainer config object class (PGConfig). () 2022-04-28 22:25:16 +02:00
Sven Mika
6551922c21
[RLlib] Fix AlphaStar for tf2+tracing; smaller cleanups to avoid wrapping a TFPolicy as_eager() or with_tracing more than once. () 2022-04-28 13:43:21 +02:00
Sven Mika
c95dd79953
[RLlib] APPO eager fix (APPOTFPolicy gets wrapped as_eager() twice by mistake). () 2022-04-27 21:27:34 +02:00
Sven Mika
627b9f2e88
[RLlib] QMIX training iteration function and new replay buffer API. () 2022-04-27 14:24:20 +02:00
Sven Mika
29388fb25b
[RLlib] Reinstate flaky AlphaStar learning CI test (flaky due to 2 changed, bad config default values). () 2022-04-27 14:01:52 +02:00
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. () 2022-04-26 14:28:39 +02:00
Artur Niederfahrenhorst
f7be409462
[RLlib] Training Iteration Function for SAC () 2022-04-26 12:37:54 +02:00
Fabian Witter
56bc90ca72
[RLlib] Remove Unnecessary List Conversion of Complex Observations in SAC Models (torch and tf). () 2022-04-25 11:21:34 +02:00
Artur Niederfahrenhorst
306853b5b8
[RLlib] Issue 22693: RNN-SAC fixes. () 2022-04-25 09:19:24 +02:00
Ben Kasper
531fdd50d4
[RLlib] Add 2 missing callbacks to MultiCallbacks class (on_trainer_init and on_sub_environment_created) () 2022-04-25 09:18:03 +02:00
Avnish Narayan
6e68b6bef9
[RLlib] DD-PPO training iteration fn. ()
We had unreported merge conflicts with DDPPO. This PR closes and combines , ,  and 

Co-authored-by: sven1977 <svenmika1977@gmail.com>
2022-04-22 15:22:14 -07:00
Kai Fricke
9f7170e444
Revert "Revert revert [RLlib] DD-PPO training iteration function implementation. ()" ()
This reverts commit a337fd994e.
2022-04-22 09:58:58 +01:00
Avnish Narayan
a337fd994e
Revert revert [RLlib] DD-PPO training iteration function implementation. () 2022-04-21 17:37:49 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion () 2022-04-20 17:56:18 +02:00
Avnish Narayan
0ddbce6518
Revert "[RLlib] DD-PPO training iteration fn ()" ()
The DDPPO LR scheduler test is broken because the learner info dictionary returned by the training iteration function does not consistently contain learner info for every training iteration, but the test expects that it does.

We'll need to fix the test, then re-merge.

Reverts 
2022-04-19 16:43:57 -07:00
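
For illustration, a hedged sketch of the failure mode this revert describes (not the actual test code; the result-dict layout follows the common RLlib "info" -> "learner" structure of that period and is an assumption here):

    # Two simulated train() results: one with learner info, one without. The LR
    # scheduler check reads stats out of every iteration's result, so an iteration
    # whose learner-info dict comes back empty breaks it.
    iteration_results = [
        {"info": {"learner": {"default_policy": {"learner_stats": {"cur_lr": 5e-5}}}}},
        {"info": {"learner": {}}},  # no learner update reported this iteration
    ]
    for results in iteration_results:
        try:
            cur_lr = results["info"]["learner"]["default_policy"]["learner_stats"]["cur_lr"]
        except KeyError:
            print("learner info missing this iteration; the test assumed it is always present")
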
Sven Mika
eb54236d13
[RLlib] DD-PPO training iteration fn () 2022-04-19 17:55:26 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. () 2022-04-18 12:20:12 +02:00
Sven Mika
92781c603e
[RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) () 2022-04-15 18:36:13 +02:00
kourosh hakhamaneshi
c38a29573f
[RLlib] Removed deprecated code with error=True () 2022-04-15 13:51:12 +02:00
Sven Mika
a8494742a3
[RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. () 2022-04-12 07:50:09 +02:00
Jun Gong
c61910487f
[RLlib] Fix typo in docstring of PGTorchPolicy () 2022-04-11 19:31:45 +02:00
Sven Mika
a3d4fc74a6
[RLlib] MARWIL: Move to training_iteration API. () 2022-04-11 19:28:32 +02:00
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. () 2022-04-11 08:39:10 +02:00
Eric Liang
1ff874e8e8
[spelling] Add linter rule for mis-capitalizations of RLLib -> RLlib () 2022-04-10 16:12:53 -07:00
Sven Mika
c82f6c62c8
[RLlib] Make RolloutWorkers (optionally) recoverable after failure. () 2022-04-08 15:33:28 +02:00
Steven Morad
39841b65b3
[RLlib] PPOTorchPolicy: Remove extra call to model.value_function () 2022-04-05 08:40:29 +02:00
Jiajun Yao
5f37231842
Remove yapf dependency ()
Yapf has been replaced by black.
2022-04-04 21:50:04 -07:00
Sven Mika
0bb82f29b6
[RLlib] AlphaStar polishing (fix logger.info bug). () 2022-04-01 09:49:41 +02:00
Sven Mika
2eaa54bd76
[RLlib] POC: Config objects instead of dicts (PPO only). () 2022-03-31 18:26:12 +02:00
Artur Niederfahrenhorst
9a64bd4e9b
[RLlib] Simple-Q uses training iteration fn (instead of execution_plan); ReplayBuffer API for Simple-Q () 2022-03-29 14:44:40 +02:00