simonsays1980
8627f44d7f
[RLlib] Remove duplicate code block: Config deprecation check for metrics_smoothing_episodes. ( #22152 )
2022-03-09 16:51:42 +01:00
Jun Gong
e8be45065e
[RLlib] Restore policies on eval_workers as well. ( #22641 )
2022-03-01 08:38:14 +01:00
Jun Gong
2b6a0c71d7
[RLlib] Add a callback for when trainer finishes initialization: on_trainer_init. ( #22493 )
2022-02-22 08:18:32 +01:00
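A minimal sketch of the new on_trainer_init hook, assuming the DefaultCallbacks-based callback API of this era; the exact keyword arguments are an assumption:

    from ray.rllib.agents.callbacks import DefaultCallbacks
    from ray.rllib.agents.ppo import PPOTrainer

    class InitLoggingCallbacks(DefaultCallbacks):
        def on_trainer_init(self, *, trainer, **kwargs):
            # Called once, right after the Trainer has finished its setup
            # (workers and policies all exist at this point).
            print("Trainer ready:", type(trainer).__name__)

    trainer = PPOTrainer(config={"env": "CartPole-v0", "callbacks": InitLoggingCallbacks})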
Avnish Narayan
740def0a13
[RLlib] Put env-checker on critical path. ( #22191 )
2022-02-17 14:06:14 +01:00
Sven Mika
5ca6a56e16
[RLlib] Bug fix: eval-workers in offline RL setup have no env, even though eval_config includes env key. ( #22350 )
2022-02-15 09:32:43 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration
function instead of execution_plan
) and re-instate Pong learning test." ( #22250 )
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Ishant Mrinal
f0d8b6d701
[RLlib] Fix compute_actions() for Trainer due to missing if prev_actions/rewards is not None checks. ( #22078 )
2022-02-09 09:05:26 +01:00
Balaji Veeramani
31ed9e5d02
[CI] Replace YAPF disables with Black disables ( #21982 )
2022-02-08 16:29:25 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
Sven Mika
c17a44cdfa
Revert "Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni…" ( #22153 )
2022-02-08 16:43:00 +01:00
SangBin Cho
a887763b38
Revert "[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learni… ( #22105 )
This reverts commit 3f03ef8ba8.
2022-02-04 00:54:50 -08:00
Sven Mika
3f03ef8ba8
[RLlib] AlphaStar: Parallelized, multi-agent/multi-GPU learning via league-based self-play. ( #21356 )
2022-02-03 09:32:09 +01:00
Rodrigo de Lazcano
a258f9c692
[RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). ( #22042 )
2022-02-02 17:28:42 +01:00
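keep_per_episode_custom_metrics is a plain top-level config switch; a hedged sketch (the default-off smoothing behavior described in the comment is an assumption):

    config = {
        "env": "CartPole-v0",
        # False (default): custom metrics get reduced to mean/min/max across
        # episodes. True: keep the raw list of per-episode values instead.
        "keep_per_episode_custom_metrics": True,
    }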
Jun Gong
87fe033f7b
[RLlib] Request CPU resources in Trainer.default_resource_request() if using dataset input. ( #21948 )
2022-02-02 10:20:37 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
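A hedged sketch of the callable form of policies_to_train from #20735; the policy IDs and the exact callable signature are illustrative assumptions:

    config = {
        "multiagent": {
            "policies": {"learner", "frozen_opponent"},
            "policy_mapping_fn": lambda agent_id, episode, worker, **kwargs: (
                "learner" if agent_id == "agent_0" else "frozen_opponent"
            ),
            # Previously a fixed list of policy IDs; now optionally a callable
            # deciding per policy (and optionally per train batch) whether
            # that policy should be updated.
            "policies_to_train": lambda pid, batch=None: pid == "learner",
        },
    }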
Jun Gong
099c170ab4
[RLlib] Dataset Reader/Writer for RLlib ( #21808 )
2022-01-26 16:00:46 +01:00
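A hedged sketch of pointing a Trainer at the new dataset-based offline reader from #21808; the input_config key names ("format", "paths") are assumptions, not confirmed API:

    config = {
        "env": "CartPole-v0",
        # Read offline experiences through Ray Datasets instead of the
        # line-by-line JSON reader.
        "input": "dataset",
        "input_config": {"format": "json", "paths": "/tmp/rllib-offline-data"},
    }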
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
c4636c7c05
[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. ( #21665 )
2022-01-20 11:46:25 +01:00
Jun Gong
7517aefe05
[RLlib] Bring back BC and Marwil learning tests. ( #21574 )
2022-01-14 14:35:32 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 ( #21421 )
2022-01-13 10:52:55 +01:00
Sven Mika
188324c5c7
[RLlib] Issue 21552: unsquash_action and clip_action (when None) cause wrong actions computed by Trainer.compute_single_action. ( #21553 )
2022-01-12 18:56:51 +01:00
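The fix concerns the explicit flags on Trainer.compute_single_action; a hedged usage sketch (the flag semantics in the comments are inferred from the commit title):

    import gym
    from ray.rllib.agents.ppo import PPOTrainer

    trainer = PPOTrainer(config={"env": "Pendulum-v1"})
    obs = gym.make("Pendulum-v1").reset()
    # unsquash_action=True: map the normalized action back into the env's
    # action space; clip_action: clip it into the space's bounds. Leaving
    # either at None should fall back to the Trainer's config values.
    action = trainer.compute_single_action(obs, unsquash_action=True, clip_action=False)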
Matti Picus
ec6a33b736
[tune] fixes to allow tune/tests/test_commands.py to run on windows ( #21342 )
tune does not run smoothly on Windows. This cleans up some blockers:
- use cross-platform shutil.get_terminal_size instead of Popen(stty)
- somehow Trainer.workers is None at the end of test_commands.py, so the cleanup command was erroring. The error was not fatal, but was printing in the logs.
- if run locally, the log files are all written to the same location, so the rsync-based syncing solution is not needed. This is the real fix for issue #20747
2022-01-11 15:57:20 -08:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. ( #21448 )
2022-01-11 19:50:03 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. ( #21420 )
2022-01-10 11:22:55 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. ( #21376 )
2022-01-05 18:22:33 +01:00
Sven Mika
9e6b871739
[RLlib] Better utils for flattening complex inputs and enable prev-actions for LSTM/attention for complex action spaces. ( #21330 )
2022-01-05 11:29:44 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). ( #20984 )
2021-12-21 08:39:05 +01:00
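A hedged sketch of the pattern this POC introduces: override the Trainer's training_iteration() method instead of supplying an execution_plan. The body below is illustrative only; the real PGTrainer logic differs:

    from ray.rllib.agents.trainer import Trainer

    class MyPGLikeTrainer(Trainer):
        def training_iteration(self):
            # One synchronous iteration: sample a batch on the local worker,
            # learn on it in-place, and return the usual results dict.
            batch = self.workers.local_worker().sample()
            return self.workers.local_worker().learn_on_batch(batch)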
Jun Gong
767f78eaf8
[RLlib] Always attach latest eval metrics. ( #21011 )
2021-12-15 11:42:53 +01:00
Sven Mika
db058d0fb3
[RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. ( #20983 )
2021-12-11 20:33:35 +01:00
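The rename is a pure config-key change; a hedged before/after sketch:

    config = {
        # Deprecated name: "metrics_smoothing_episodes": 100,
        # New name: number of most-recent episodes to average over when
        # reporting smoothed stats such as episode_reward_mean.
        "metrics_num_episodes_for_smoothing": 100,
    }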
Sven Mika
596c8e2772
[RLlib] Experimental no-flatten option for actions/prev-actions. ( #20918 )
2021-12-11 14:57:58 +01:00
Sven Mika
f814c2af89
[RLlib; Docs] Docs API reference pages: rllib/execution, rllib/evaluation, rllib/models, rllib/offline. ( #20538 )
2021-12-10 09:41:29 +01:00
Sven Mika
b4790900f5
[RLlib] Sub-class Trainer (instead of build_trainer()): All remaining classes; soft-deprecate build_trainer. ( #20725 )
2021-12-04 22:05:26 +01:00
Sven Mika
60b2219d72
[RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. ( #20757 )
2021-12-04 13:26:33 +01:00
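A hedged sketch of timestep-based evaluation per #20757; the evaluation_duration / evaluation_duration_unit key names are assumptions inferred from this era's config:

    config = {
        "evaluation_interval": 1,
        # Measure evaluation length in timesteps rather than episodes, so a
        # single very long episode cannot stall training.
        "evaluation_duration": 10000,
        "evaluation_duration_unit": "timesteps",
        # With evaluation running in parallel, a duration of "auto" lets eval
        # take exactly as long as the concurrent training step.
        "evaluation_parallel_to_training": True,
    }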
Sven Mika
9e38f6f613
[RLlib] Trainer sub-class DDPG/TD3/APEX-DDPG (instead of build_trainer). ( #20636 )
2021-12-01 10:52:12 +01:00
Sven Mika
3d2e27485b
[RLlib] Trainer sub-class DQN/SimpleQ/APEX-DQN/R2D2 (instead of using build_trainer). ( #20633 )
2021-11-30 18:05:44 +01:00
Sven Mika
c07d8c4c22
[RLlib] Trainer sub-class A2C/A3C (instead of build_trainer). ( #20635 )
2021-11-24 22:07:13 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). ( #20571 )
2021-11-23 23:01:05 +01:00
Sven Mika
9d2fe5756c
[RLlib] Trainer sub-class for APPO (instead of using build_trainer()). ( #20424 )
2021-11-22 22:14:21 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) ( #20552 )
2021-11-19 11:57:37 +01:00
Sven Mika
56619b955e
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. ( #20250 )
2021-11-17 21:40:16 +01:00
Avnish Narayan
dc17f0a241
Add error messages for missing tf and torch imports ( #20205 )
Co-authored-by: Sven Mika <sven@anyscale.io>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 16:30:53 -08:00
Kai Fricke
3e6ba5d6d2
Revert "Revert [RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
." ( #20285 )
* Revert "Revert "[RLlib] POC: `PGTrainer` class that works by sub-classing, not `trainer_template.py`. (#20055 )" (#20284 )"
This reverts commit 246787cdd9.
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-16 12:26:47 +01:00
Kai Fricke
246787cdd9
Revert "[RLlib] POC: PGTrainer
class that works by sub-classing, not trainer_template.py
. ( #20055 )" ( #20284 )
This reverts commit 6f85af435f.
2021-11-12 13:09:43 +00:00
Sven Mika
6f85af435f
[RLlib] POC: PGTrainer class that works by sub-classing, not trainer_template.py. ( #20055 )
2021-11-11 12:16:20 +01:00
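A hedged sketch of the sub-classing pattern that replaces trainer_template.py's build_trainer(); the method names follow this era's Trainer API, while the PG specifics are illustrative:

    from ray.rllib.agents.pg import DEFAULT_CONFIG
    from ray.rllib.agents.pg.pg_tf_policy import PGTFPolicy
    from ray.rllib.agents.pg.pg_torch_policy import PGTorchPolicy
    from ray.rllib.agents.trainer import Trainer

    class PGTrainer(Trainer):
        @classmethod
        def get_default_config(cls):
            return DEFAULT_CONFIG

        def get_default_policy_class(self, config):
            # Pick the policy implementation matching the configured framework.
            return PGTorchPolicy if config["framework"] == "torch" else PGTFPolicy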
Sven Mika
ebd56b57db
[RLlib; documentation] "RLlib in 60sec" overhaul. ( #20215 )
2021-11-10 22:20:06 +01:00
Kai Fricke
9c2b8c8501
[tune] Deprecate DurableTrainable ( #19880 )
2021-11-08 20:56:07 +00:00
Sven Mika
eea6b40a3e
[RLlib] Minor cleanups in Trainer; better tf/tf2 info messages about possible tracing speedups. ( #20109 )
2021-11-08 15:37:27 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. ( #19981 )
2021-11-05 16:10:00 +01:00