Rohan Potdar | 5a70b732e8 | [RLlib] MARWIL and BC Config. (#24853) | 2022-05-21 12:50:20 +02:00
kourosh hakhamaneshi | 3815e52a61 | [RLlib] Agents to algos: DQN w/o Apex and R2D2, DDPG/TD3, SAC, SlateQ, QMIX, PG, Bandits (#24896) | 2022-05-19 18:30:42 +02:00
Artur Niederfahrenhorst | 86bc9ecce2 | [RLlib] DDPG Training iteration fn & Replay Buffer API (#24212) | 2022-05-05 09:41:38 +02:00
Sven Mika | 7cca7782f1 | [RLlib] OPE (off policy estimator) API. (#24384) | 2022-05-02 21:15:50 +02:00
Sven Mika | 296e2ebc46 | [RLlib] Issue 24082: WorkerSet.policies_to_train (deprecated) - if still used - returns wrong values. (#24386) | 2022-05-02 18:33:52 +02:00
Jun Gong | ec636dcb29 | [RLlib] Do not print warning message during env pre-checking, if there is nothing wrong with user envs. (#24289) | 2022-04-29 10:41:19 +02:00
Pavel C | de0c6f6132 | [RLlib] Fix policy_map always loading all policies from disk due to (not always needed) global_vars update. (#22010) | 2022-04-29 10:38:05 +02:00
simonsays1980 | ff575eeafc | [RLlib] Make actions sent by RLlib to the env immutable. (#24262) | 2022-04-29 10:27:06 +02:00
Sven Mika | 627b9f2e88 | [RLlib] QMIX training iteration function and new replay buffer API. (#24164) | 2022-04-27 14:24:20 +02:00
Noon van der Silk | 3589c21924 | [RLlib] Fix some missing f-strings and an f-string related bug in tf eager policy. (#24148) | 2022-04-25 11:25:28 +02:00
Avnish Narayan | 3bf907bcf8 | [RLlib] Don't modify environments via the env checker utilities. (#24083) | 2022-04-22 18:39:47 +02:00
Sven Mika | 92781c603e | [RLlib] A2C training_iteration method implementation (_disable_execution_plan_api=True) (#23735) | 2022-04-15 18:36:13 +02:00
Sven Mika | a8494742a3 | [RLlib] Memory leak finding toolset using tracemalloc + CI memory leak tests. (#15412) | 2022-04-12 07:50:09 +02:00
Steven Morad | 00922817b6 | [RLlib] Rewrite PPO to use training_iteration + enable DD-PPO for Win32. (#23673) | 2022-04-11 08:39:10 +02:00
Sven Mika | c82f6c62c8 | [RLlib] Make RolloutWorkers (optionally) recoverable after failure. (#23739) | 2022-04-08 15:33:28 +02:00
Sven Mika | 434265edd0 | [RLlib] Examples folder: All training_iteration translations. (#23712) | 2022-04-05 16:33:50 +02:00
Jun Gong | a7e5aa8c6a | [RLlib] Delete some unused, confusing logic. (#23513) | 2022-03-29 13:45:13 +02:00
Max Pumperla | 60054995e6 | [docs] fix doctests and activate CI (#23418) | 2022-03-24 17:04:02 -07:00
Siyuan (Ryans) Zhuang | 0c74ecad12 | [Lint] Cleanup incorrectly formatted strings (Part 1: RLLib). (#23128) | 2022-03-15 17:34:21 +01:00
Sven Mika | 0af100ffae | [RLlib] Fix tree.flatten dict ordering bug: flatten_space([obs_space]) should produce same struct as tree.flatten([obs]). (#22731) | 2022-03-01 21:24:24 +01:00
Sven Mika | 18c269c70e | [RLlib] Issue 22539: agent_key not deleted from 2 dicts in simple list collector. (#22587) | 2022-02-24 11:58:34 +01:00
Avnish Narayan | 740def0a13 | [RLlib] Put env-checker on critical path. (#22191) | 2022-02-17 14:06:14 +01:00
Sven Mika | 04a5c72ea3 | Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" (#18708) | 2022-02-10 13:44:22 +01:00
Sven Mika | 44d09c2aa5 | [RLlib] Filter.clear_buffer() deprecated (use Filter.reset_buffer() instead). (#22246) | 2022-02-10 02:58:43 +01:00
Alex Wu | b122f093c1 | Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." (#22250) | 2022-02-09 09:26:36 -08:00
    Reverts ray-project/ray#22126. Breaks rllib:tests/test_io.
Balaji Veeramani | 31ed9e5d02 | [CI] Replace YAPF disables with Black disables (#21982) | 2022-02-08 16:29:25 -08:00
Sven Mika | ac3e6ab411 | [RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. (#22126) | 2022-02-08 19:04:13 +01:00
Sven Mika | 8b678ddd68 | [RLlib] Issue 22036: Client should handle concurrent episodes with one being training_enabled=False. (#22076) | 2022-02-06 12:35:03 +01:00
Sven Mika | f6617506a2 | [RLlib] Add on_sub_environment_created to DefaultCallbacks class. (#21893) | 2022-02-04 22:22:47 +01:00
Rodrigo de Lazcano | a258f9c692 | [RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042) | 2022-02-02 17:28:42 +01:00
Jun Gong | 87fe033f7b | [RLlib] Request CPU resources in Trainer.default_resource_request() if using dataset input. (#21948) | 2022-02-02 10:20:37 +01:00
Balaji Veeramani | 7f1bacc7dc | [CI] Format Python code with Black (#21975) | 2022-01-29 18:41:57 -08:00
    See #21316 and #21311 for the motivation behind these changes.
Sven Mika | 371fbb17e4 | [RLlib] Make policies_to_train more flexible via callable option. (#20735) | 2022-01-27 12:17:34 +01:00
Jun Gong | 099c170ab4 | [RLlib] Dataset Reader/Writer for RLlib (#21808) | 2022-01-26 16:00:46 +01:00
Sven Mika | d5bfb7b7da | [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652) | 2022-01-25 14:16:58 +01:00
Avnish Narayan | 12b087acb8 | [RLlib] Base env pre-checker. (#21569) | 2022-01-18 16:34:06 +01:00
Sven Mika | 90c6b10498 | [RLlib] Decentralized multi-agent learning; PR #01 (#21421) | 2022-01-13 10:52:55 +01:00
Sven Mika | 92f030331e | [RLlib] Initial code/comment cleanups in preparation for decentralized multi-agent learner. (#21420) | 2022-01-10 11:22:55 +01:00
Sven Mika | b10d5533be | [RLlib] Issue 20920 (partial solution): contrib/MADDPG + pettingzoo coop-pong-v4 not working. (#21452) | 2022-01-10 11:19:40 +01:00
Sven Mika | 62dbf26394 | [RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984) | 2021-12-21 08:39:05 +01:00
Alexis DUBURCQ | 6c3e63bc9c | [RLlib] Fix view requirements. (#21043) | 2021-12-15 11:59:04 +01:00
Sven Mika | daa4304a91 | [RLlib] Switch off preprocessors by default for PGTrainer. (#21008) | 2021-12-13 12:04:23 +01:00
Sven Mika | 596c8e2772 | [RLlib] Experimental no-flatten option for actions/prev-actions. (#20918) | 2021-12-11 14:57:58 +01:00
Tomasz Wrona | 39c202fa66 | [RLlib] Allow extra keys in info in multi-agent (#20793) | 2021-12-09 14:44:33 +01:00
Avnish Narayan | b8c64480d8 | [RLlib] Change return type of try_reset to MultiEnvDict (#20868) | 2021-12-06 14:15:33 +01:00
Sven Mika | 60b2219d72 | [RLlib] Allow for evaluation to run by timesteps (alternative to episodes) and add auto-setting to make sure train doesn't ever have to wait for eval (e.g. long episodes) to finish. (#20757) | 2021-12-04 13:26:33 +01:00
Amog Kamsetty | 611bfc1352 | [ML] Move find_free_port to ml_utils (#20828) | 2021-12-03 13:38:42 -08:00
    Small refactoring of a common utility used by Train, Tune, and RLlib.
Avnish Narayan | 74dd0e4085 | [RLlib] Make to_base_env() a method of all RLlib-supported Env classes (#20811) | 2021-12-01 09:01:02 +01:00
Avnish Narayan | 3ddc09544d | [rllib] Env to base env refactor (#20785) | 2021-11-30 17:02:10 -08:00
Sven Mika | 56619b955e | [RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. (#20250) | 2021-11-17 21:40:16 +01:00