Author | Commit | Message | Date
Jun Gong | acf2bf9b2f | [RLlib] Get rid of all these deprecation warnings. (#27085) | 2022-07-27 10:48:54 -07:00
Rohan Potdar | 38c9e1d52a | [RLlib]: Fix OPE trainables (#26279) | 2022-07-17 14:25:53 -07:00
    Co-authored-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
Jun Gong | 34d1e580cb | [rllib/docs] Minor import doc fix. (#26269) | 2022-07-02 06:52:38 -07:00
Sven Mika | 2b43713785 | [RLlib] Move IMPALA and APPO back to exec plan (for now; due to unresolved learning/performance issues). (#25851) | 2022-06-29 08:41:47 +02:00
Sven Mika | 762cfbdff1 | [RLlib] IMPALA and APPO metrics fixes; remove deprecated async_parallel_requests utility. (#26117) | 2022-06-28 15:14:37 +02:00
Sven Mika | 59a967a3a0 | [RLlib] Cleanup some deprecated metric keys and classes. (#26036) | 2022-06-23 21:30:01 +02:00
Sven Mika | 1499af945b | [RLlib] Algorithm step() fixes: evaluation should NOT be part of timed training_step loop. (#25924) | 2022-06-20 19:53:47 +02:00
Sven Mika | 96693055bd | [RLlib] More Trainer -> Algorithm renaming cleanups. (#25869) | 2022-06-20 15:54:00 +02:00
Artur Niederfahrenhorst | a322cc5765 | [RLlib] IMPALA/APPO multi-agent mix-in-buffer fixes (plus MA learning tests). (#25848) | 2022-06-17 14:10:36 +02:00
Yi Cheng | 7b8b0f8e03 | Revert "[RLlib] Remove execution plan code no longer used by RLlib. (#25624)" (#25776) | 2022-06-14 13:59:15 -07:00
    This reverts commit 804719876b.
Avnish Narayan | 804719876b | [RLlib] Remove execution plan code no longer used by RLlib. (#25624) | 2022-06-14 10:57:27 +02:00
Sven Mika | 130b7eeaba | [RLlib] Trainer to Algorithm renaming. (#25539) | 2022-06-11 15:10:39 +02:00
Sven Mika | 7c39aa5fac | [RLlib] Trainer.training_iteration -> Trainer.training_step; Iterations vs reportings: Clarification of terms. (#25076) | 2022-06-10 17:09:18 +02:00
Eric Liang | 905258dbc1 | Clean up docstyle in python modules and add LINT rule (#25272) | 2022-06-01 11:27:54 -07:00
Avnish Narayan | eaed256d68 | [RLlib] Async parallel execution manager. (#24423) | 2022-05-25 17:54:08 +02:00
Artur Niederfahrenhorst | d76ef9add5 | [RLLib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q Learning Algos. (#24923) | 2022-05-24 14:39:43 +02:00
Artur Niederfahrenhorst | fb2915d26a | [RLlib] Replay Buffer API and Ape-X. (#24506) | 2022-05-17 13:43:49 +02:00
Sven Mika | 25001f6d8d | [RLlib] APPO Training iteration fn. (#24545) | 2022-05-17 10:31:07 +02:00
Max Pumperla | 6a6c58b5b4 | [RLlib] Config objects for DDPG and SimpleQ. (#24339) | 2022-05-12 16:12:42 +02:00
Avnish Narayan | f2bb6f6806 | [RLlib] Impala training iteration fn (#23454) | 2022-05-05 16:11:08 +02:00
Sven Mika | 924adcf402 | [RLlib] Issue 24074: multi-GPU learner thread key error in MA-scenarios. (#24382) | 2022-05-02 18:30:46 +02:00
Sven Mika | f066180ed5 | [RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372) | 2022-05-02 12:51:14 +02:00
Sven Mika | bb4e5cb70a | [RLlib] CQL: training iteration function. (#24166) | 2022-04-26 14:28:39 +02:00
Avnish Narayan | 477b9d22d2 | [RLlib][Training iteration fn] APEX conversion (#22937) | 2022-04-20 17:56:18 +02:00
Artur Niederfahrenhorst | e57ce7efd6 | [RLlib] Replay Buffer API and Training Iteration Fn for DQN. (#23420) | 2022-04-18 12:20:12 +02:00
Steven Morad | 00922817b6 | [RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) | 2022-04-11 08:39:10 +02:00
Sven Mika | 434265edd0 | [RLlib] Examples folder: All training_iteration translations. (#23712) | 2022-04-05 16:33:50 +02:00
simonsays1980 | d2a3948845 | [RLlib] Removed the sampler() function in ParallelRollouts() as it is not needed. (#22320) | 2022-03-31 09:06:30 +02:00
Max Pumperla | 60054995e6 | [docs] fix doctests and activate CI (#23418) | 2022-03-24 17:04:02 -07:00
Siyuan (Ryans) Zhuang | 0c74ecad12 | [Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128) | 2022-03-15 17:34:21 +01:00
Jun Gong | 22bc451102 | [RLlib] Fix a memory leak in SimpleReplayBuffer that completely kills sampling throughput (#22678) | 2022-02-28 09:28:04 +01:00
Sven Mika | 6522935291 | [RLlib] Slate-Q tf implementation and tests/benchmarks. (#22389) | 2022-02-22 09:36:44 +01:00
Sven Mika | 04a5c72ea3 | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) | 2022-02-10 13:44:22 +01:00
Alex Wu | b122f093c1 | Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250) | 2022-02-09 09:26:36 -08:00
    Reverts ray-project/ray#22126
    Breaks rllib:tests/test_io
Sven Mika | ac3e6ab411 | [RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) | 2022-02-08 19:04:13 +01:00
Rodrigo de Lazcano | a258f9c692 | [RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042) | 2022-02-02 17:28:42 +01:00
Balaji Veeramani | 7f1bacc7dc | [CI] Format Python code with Black (#21975) | 2022-01-29 18:41:57 -08:00
    See #21316 and #21311 for the motivation behind these changes.
Sven Mika | ee41800c16 | [RLlib] Preparatory PR for multi-agent, multi-GPU learning agent (alpha-star style) #02. (#21649) | 2022-01-27 22:07:05 +01:00
Jun Gong | 8ebc50f844 | [RLlib] Issue 21334: Fix APPO when kl_loss is enabled. (#21855) | 2022-01-27 20:08:58 +01:00
Sven Mika | 371fbb17e4 | [RLlib] Make policies_to_train more flexible via callable option. (#20735) | 2022-01-27 12:17:34 +01:00
Sven Mika | d5bfb7b7da | [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2022-01-25 14:16:58 +01:00
Sven Mika | c4636c7c05 | [RLlib] Issue 21633: SimpleQ should not use a prio. replay buffer. (#21665) | 2022-01-20 11:46:25 +01:00
Vince Jankovics | 7dc3de4eed | [RLlib] Fix config mismatch for train_one_step. num_sgd_iter instead of sgd_num_iter. (#21555) | 2022-01-18 16:00:27 +01:00
Sven Mika | 90c6b10498 | [RLlib] Decentralized multi-agent learning; PR #01 (#21421) | 2022-01-13 10:52:55 +01:00
Sven Mika | f94bd99ce4 | [RLlib] Issue 21044: Improve error message for "multiagent" dict checks. (#21448) | 2022-01-11 19:50:03 +01:00
Sven Mika | 92f030331e | [RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) | 2022-01-10 11:22:55 +01:00
Sven Mika | 853d10871c | [RLlib] Issue 18499: PGTrainer with training_iteration fn does not support multi-GPU. (#21376) | 2022-01-05 18:22:33 +01:00
Sven Mika | 62dbf26394 | [RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) | 2021-12-21 08:39:05 +01:00
Sven Mika | db058d0fb3 | [RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. (#20983) | 2021-12-11 20:33:35 +01:00
Sven Mika | 49cd7ea6f9 | [RLlib] Trainer sub-class PPO/DDPPO (instead of build_trainer()). (#20571) | 2021-11-23 23:01:05 +01:00