Artur Niederfahrenhorst
d76ef9add5
[RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q-learning algos. ( #24923 )
2022-05-24 14:39:43 +02:00
Artur Niederfahrenhorst
fb2915d26a
[RLlib] Replay Buffer API and Ape-X. ( #24506 )
2022-05-17 13:43:49 +02:00
Sven Mika
25001f6d8d
[RLlib] APPO Training iteration fn. ( #24545 )
2022-05-17 10:31:07 +02:00
Max Pumperla
6a6c58b5b4
[RLlib] Config objects for DDPG and SimpleQ. ( #24339 )
2022-05-12 16:12:42 +02:00
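This commit is part of RLlib's shift from raw config dicts to typed, chainable config objects. A minimal sketch of the builder pattern, assuming the import path of this era (config classes moved between ray.rllib.agents and ray.rllib.algorithms around this time):

    from ray.rllib.agents.dqn.simple_q import SimpleQConfig  # path assumed

    config = (
        SimpleQConfig()
        .environment(env="CartPole-v1")
        .rollouts(num_rollout_workers=2)
        .training(lr=0.0005, train_batch_size=32)
    )
    trainer = config.build()  # builds a ready-to-run SimpleQ trainer
    print(trainer.train())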
Avnish Narayan
f2bb6f6806
[RLlib] Impala training iteration fn ( #23454 )
2022-05-05 16:11:08 +02:00
Sven Mika
924adcf402
[RLlib] Issue 24074: multi-GPU learner thread key error in MA-scenarios. ( #24382 )
2022-05-02 18:30:46 +02:00
Sven Mika
f066180ed5
[RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). ( #24372 )
2022-05-02 12:51:14 +02:00
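Per the commit title above, the single timesteps_per_iteration key is split into separate sampling- and training-side minimums. A hedged before/after sketch (key names taken from the title, semantics paraphrased):

    # Deprecated:
    config = {"timesteps_per_iteration": 1000}

    # Replacement: minimum env-sample / train timesteps that must be reached
    # before an iteration's results are reported (each can also be used alone):
    config = {
        "min_sample_timesteps_per_reporting": 1000,
        "min_train_timesteps_per_reporting": 1000,
    }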
Sven Mika
bb4e5cb70a
[RLlib] CQL: training iteration function. ( #24166 )
2022-04-26 14:28:39 +02:00
Avnish Narayan
477b9d22d2
[RLlib][Training iteration fn] APEX conversion ( #22937 )
2022-04-20 17:56:18 +02:00
Artur Niederfahrenhorst
e57ce7efd6
[RLlib] Replay Buffer API and Training Iteration Fn for DQN. ( #23420 )
2022-04-18 12:20:12 +02:00
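The Replay Buffer API commits above route buffer setup through a single replay_buffer_config dict on the trainer config. A hedged example of its shape (type and field names assumed from the RLlib of this period, not quoted from the PR):

    config = {
        "replay_buffer_config": {
            # Registered buffer class to instantiate:
            "type": "MultiAgentPrioritizedReplayBuffer",
            # Max number of timesteps to hold:
            "capacity": 50000,
            # Prioritization parameters (only read by prioritized buffers):
            "prioritized_replay_alpha": 0.6,
            "prioritized_replay_beta": 0.4,
        },
    }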
Steven Morad
00922817b6
[RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. ( #23673 )
2022-04-11 08:39:10 +02:00
Sven Mika
434265edd0
[RLlib] Examples folder: All training_iteration translations. ( #23712 )
2022-04-05 16:33:50 +02:00
simonsays1980
d2a3948845
[RLlib] Removed the sampler() function in ParallelRollouts() as it is not needed. ( #22320 )
2022-03-31 09:06:30 +02:00
Max Pumperla
60054995e6
[docs] fix doctests and activate CI ( #23418 )
2022-03-24 17:04:02 -07:00
Siyuan (Ryans) Zhuang
0c74ecad12
[Lint] Cleanup incorrectly formatted strings (Part 1: RLlib). ( #23128 )
2022-03-15 17:34:21 +01:00
Jun Gong
22bc451102
[RLlib] Fix a memory leak in SimpleReplayBuffer that completely kills sampling throughput. ( #22678 )
2022-02-28 09:28:04 +01:00
Sven Mika
6522935291
[RLlib] Slate-Q tf implementation and tests/benchmarks. ( #22389 )
2022-02-22 09:36:44 +01:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration
function instead of execution_plan
) and re-instate Pong learning test." ( #22250 )
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
Rodrigo de Lazcano
a258f9c692
[RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). ( #22042 )
2022-02-02 17:28:42 +01:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Sven Mika
ee41800c16
[RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. ( #21649 )
2022-01-27 22:07:05 +01:00
Jun Gong
8ebc50f844
[RLlib] Issue 21334: Fix APPO when kl_loss is enabled. ( #21855 )
2022-01-27 20:08:58 +01:00
Sven Mika
371fbb17e4
[RLlib] Make policies_to_train more flexible via callable option. ( #20735 )
2022-01-27 12:17:34 +01:00
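Before this commit, policies_to_train only took a fixed list of policy IDs; it now also accepts a callable, so trainability can be decided dynamically per policy and (optionally) per incoming train batch. A minimal sketch, with the callable's two-argument signature assumed:

    config = {
        "multiagent": {
            "policies": {"learned", "heuristic"},
            "policy_mapping_fn": lambda agent_id, episode=None, worker=None, **kw: (
                "learned" if agent_id == "agent_0" else "heuristic"
            ),
            # New: a callable instead of a static list. Return True if the given
            # policy should currently be updated (batch may be None):
            "policies_to_train": lambda policy_id, batch=None: policy_id == "learned",
        },
    }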
Sven Mika
d5bfb7b7da
[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03. ( #21652 )
2022-01-25 14:16:58 +01:00
Sven Mika
c4636c7c05
[RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. ( #21665 )
2022-01-20 11:46:25 +01:00
Vince Jankovics
7dc3de4eed
[RLlib] Fix config mismatch for train_one_step: use num_sgd_iter instead of sgd_num_iter. ( #21555 )
2022-01-18 16:00:27 +01:00
Sven Mika
90c6b10498
[RLlib] Decentralized multi-agent learning; PR #01 ( #21421 )
2022-01-13 10:52:55 +01:00
Sven Mika
f94bd99ce4
[RLlib] Issue 21044: Improve error message for "multiagent" dict checks. ( #21448 )
2022-01-11 19:50:03 +01:00
Sven Mika
92f030331e
[RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. ( #21420 )
2022-01-10 11:22:55 +01:00
Sven Mika
853d10871c
[RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. ( #21376 )
2022-01-05 18:22:33 +01:00
Sven Mika
62dbf26394
[RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). ( #20984 )
2021-12-21 08:39:05 +01:00
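This POC establishes the template that the many "training iteration fn" commits above follow: instead of composing a distributed execution_plan out of iterator ops, a Trainer overrides a plain-Python training_iteration() method. A rough sketch of the pattern, using helper names from the RLlib of this era (exact imports and signatures are assumptions, not a copy of the PR):

    from ray.rllib.agents.trainer import Trainer
    from ray.rllib.execution.rollout_ops import synchronous_parallel_sample
    from ray.rllib.execution.train_ops import train_one_step

    class MyTrainer(Trainer):
        def training_iteration(self):
            # 1) Sample synchronously from all remote rollout workers.
            train_batch = synchronous_parallel_sample(worker_set=self.workers)
            # 2) Run one update on the local worker.
            train_results = train_one_step(self, train_batch)
            # 3) Broadcast the updated weights back to the rollout workers.
            self.workers.sync_weights()
            return train_results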
Sven Mika
db058d0fb3
[RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. ( #20983 )
2021-12-11 20:33:35 +01:00
Sven Mika
49cd7ea6f9
[RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). ( #20571 )
2021-11-23 23:01:05 +01:00
gjoliver
e7f9e8ceec
[RLlib] Report total_train_steps correctly for offline agents like CQL. ( #20541 )
* Fix trainer timestep reporting for offline agents like CQL.
* wip.
* extend timesteps_total to 200K for learning_tests_pendulum_cql test
Co-authored-by: sven1977 <svenmika1977@gmail.com>
2021-11-22 21:46:45 +01:00
Artur Niederfahrenhorst
d07e50e957
[RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir). ( #20552 )
2021-11-19 11:57:37 +01:00
Sven Mika
a931076f59
[RLlib] Tf2 + eager-tracing same speed as framework=tf; Add more test coverage for tf2+tracing. ( #19981 )
2021-11-05 16:10:00 +01:00
Sven Mika
0b308719f8
[RLlib; Docs overhaul] Docstring cleanup: rllib/utils ( #19829 )
2021-11-01 21:46:02 +01:00
gjoliver
9226f9bddc
[RLlib] Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained. ( #19264 )
* Report timesteps_this_iter to Tune, so it can track/checkpoint/restore total timesteps trained.
* Trigger Build
* lint
2021-10-12 16:03:41 +02:00
Sven Mika
c3e3fc7637
[RLlib] Issue 18280: A3C/IMPALA multi-agent not working. ( #19100 )
2021-10-07 23:57:53 +02:00
Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. ( #18879 )
2021-09-30 16:39:05 +02:00
Sven Mika
05a55a9335
[RLlib] Issue 18668: Unity3D env client/server example not working (fix + add to test cases). ( #18942 )
2021-09-30 08:30:20 +02:00
Sven Mika
61a1274619
[RLlib] No Preprocessors (part 2). ( #18468 )
2021-09-23 12:56:45 +02:00
Sven Mika
3803e796ff
[RLlib] Multi-GPU learner thread (IMPALA) error messages/comments/code-cleanup. ( #18540 )
2021-09-13 19:27:53 +02:00
Sven Mika
1520c3d147
[RLlib] Deepcopy env_ctx for vectorized sub-envs AND add eval-worker-option to Trainer.add_policy(). ( #18428 )
2021-09-09 07:10:06 +02:00
Sven Mika
4888d7c9af
[RLlib] Replay buffers: Add config option to store contents in checkpoints. ( #17999 )
2021-08-31 12:21:49 +02:00
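A hedged sketch of the switch this commit adds (flag name assumed from the commit; note that serializing a large buffer can make checkpoints very big):

    # For buffer-based algos such as DQN/SAC: also write the replay buffer's
    # contents into Trainer checkpoints, so a restored run resumes with its
    # stored experience instead of an empty buffer.
    config = {
        "store_buffer_in_checkpoints": True,
    }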
Chris Bamford
58a73821fb
[RLlib] IMPALA sample throughput calculation and full queue slowdown fixes ( #17822 )
2021-08-17 14:01:41 +02:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). ( #17371 )
2021-08-03 11:35:49 -04:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. ( #17169 )
2021-07-20 14:58:13 -04:00