Sven Mika | e4ceae19ef | 2022-06-02 16:47:05 +02:00
    [RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)

Jun Gong | eaf9c941ae | 2022-05-25 14:38:03 +02:00
    [RLlib] Migrate PPO, Impala, and APPO policies to use sub-classing implementation. (#25117)

Artur Niederfahrenhorst | d76ef9add5 | 2022-05-24 14:39:43 +02:00
    [RLlib] Fix RNNSAC example failing on CI + fixes for recurrent models for other Q-learning algos. (#24923)

Artur Niederfahrenhorst | fb2915d26a | 2022-05-17 13:43:49 +02:00
    [RLlib] Replay Buffer API and Ape-X. (#24506)

Sven Mika | f066180ed5 | 2022-05-02 12:51:14 +02:00
    [RLlib] Deprecate timesteps_per_iteration config key (in favor of min_[sample|train]_timesteps_per_reporting). (#24372)

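The #24372 entry above is a pure config-key rename. As a minimal hedged sketch (the plain config-dict style is assumed, and the value 1000 is illustrative; only the key names come from the commit title):

    # Old key, deprecated by f066180ed5 (value illustrative):
    config = {"timesteps_per_iteration": 1000}

    # Replacement keys per the commit title: report results only once at least
    # this many sampled and/or trained timesteps have accumulated.
    config = {
        "min_sample_timesteps_per_reporting": 1000,
        "min_train_timesteps_per_reporting": 1000,
    }
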
Rodrigo de Lazcano | a258f9c692 | 2022-02-02 17:28:42 +01:00
    [RLlib] Neural-MMO keep_per_episode_custom_metrics patch (toward making Neural-MMO RLlib's default massive-multi-agent learning test environment). (#22042)

Balaji Veeramani | 7f1bacc7dc | 2022-01-29 18:41:57 -08:00
    [CI] Format Python code with Black (#21975)
    See #21316 and #21311 for the motivation behind these changes.

Sven Mika | d5bfb7b7da | 2022-01-25 14:16:58 +01:00
    [RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 (#21652)

Sven Mika | 62dbf26394 | 2021-12-21 08:39:05 +01:00
    [RLlib] POC: Run PGTrainer w/o the distr. exec API (Trainer's new training_iteration method). (#20984)

Sven Mika | db058d0fb3 | 2021-12-11 20:33:35 +01:00
    [RLlib] Rename metrics_smoothing_episodes into metrics_num_episodes_for_smoothing for clarity. (#20983)

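Likewise, the #20983 rename above maps onto a single config key. A hedged sketch (the value 100 is illustrative; only the key names are from the commit title):

    # Before (deprecated): how many recent episodes reported metrics are averaged over.
    config = {"metrics_smoothing_episodes": 100}
    # After:
    config = {"metrics_num_episodes_for_smoothing": 100}
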
Artur Niederfahrenhorst | d07e50e957 | 2021-11-19 11:57:37 +01:00
    [RLlib] Replay buffer API (cleanups; docstrings; renames; move into rllib/execution/buffers dir) (#20552)

gjoliver | 89fbfc00f8 | 2021-10-25 09:42:39 +02:00
    [RLlib] Some minor cleanups (buffer buffer_size -> capacity and others). (#19623)

Sven Mika | c3a15ecc0f | 2021-03-18 20:27:41 +01:00
    [RLlib] Issue #13802: Enhance metrics for multiagent->count_steps_by=agent_steps setting. (#14033)

Sven Mika | c524f86785 | 2020-12-27 09:46:03 -05:00
    [RLlib] BC/MARWIL/recurrent nets minor cleanups and bug fixes. (#13064)

Sven Mika | 99c81c6795 | 2020-12-07 13:08:17 +01:00
    [RLlib] Attention Net prep PR #3. (#12450)

Sven Mika | 414041c6dd | 2020-10-15 18:21:30 +02:00
    [RLlib] Do not create env on driver iff num_workers > 0. (#11307)

Sven Mika | 4fd8977eaf | 2020-06-25 19:01:32 +02:00
    [RLlib] Minor cleanup in preparation to tf2.x support. (#9130)
    * WIP.
    * Fixes.
    * LINT.
    * Fixes.
    * Fixes and LINT.
    * WIP.

Eric Liang | 9a83908c46 | 2020-05-21 10:16:18 -07:00
    [rllib] Deprecate policy optimizers (#8345)

Eric Liang | aa7a58e92f | 2020-05-20 11:22:30 -07:00
    [rllib] Support training intensity for dqn / apex (#8396)

Eric Liang | 9f04a65922 | 2020-05-07 23:40:29 -07:00
    [rllib] Add PPO+DQN two trainer multiagent workflow example (#8334)

Eric Liang | baadbdf8d4 | 2020-04-30 01:18:09 -07:00
    [rllib] Execute PPO using training workflow (#8206)
    * wip
    * add kl
    * kl
    * works now
    * doc update
    * reorg
    * add ddppo
    * add stats
    * fix fetch
    * comment
    * fix learner stat regression
    * test fixes
    * fix test

Eric Liang | 2298f6fb40 | 2020-04-23 12:39:19 -07:00
    [rllib] Port DQN/Ape-X to training workflow api (#8077)

Eric Liang | 31b40b00f6 | 2020-04-10 00:56:08 -07:00
    [rllib] Pull out experimental dsl into rllib.execution module, add initial unit tests (#7958)