Sven Mika
|
526fd6b5fb
|
[RLlib] Issue 22444: KL-coeff not stored in persistent policy state. (#22590)
|
2022-02-24 22:05:36 +01:00 |
|
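Issue 22444 is about the adaptive KL coefficient being lost when a policy's state is saved and restored. The sketch below shows the general idea using a hypothetical `AdaptiveKLState` class, which is not RLlib's API. The coefficient is updated from the sampled KL, and it belongs in the state dict so that `set_state()` restores it instead of falling back to the initial value. The update rule (multiply by 1.5 above twice the target, by 0.5 below half the target) is an assumption modeled on common adaptive-KL PPO implementations.

```python
class AdaptiveKLState:
    """Hypothetical holder for PPO's adaptive KL penalty coefficient."""

    def __init__(self, kl_coeff: float, kl_target: float):
        self.kl_coeff = kl_coeff
        self.kl_target = kl_target

    def update_kl(self, sampled_kl: float) -> float:
        # Assumed rule: grow the penalty if the policy moved too far,
        # shrink it if the policy barely moved.
        if sampled_kl > 2.0 * self.kl_target:
            self.kl_coeff *= 1.5
        elif sampled_kl < 0.5 * self.kl_target:
            self.kl_coeff *= 0.5
        return self.kl_coeff

    def get_state(self) -> dict:
        # The coefficient is part of the persisted state, so checkpoints keep it.
        return {"kl_coeff": self.kl_coeff, "kl_target": self.kl_target}

    def set_state(self, state: dict) -> None:
        self.kl_coeff = state["kl_coeff"]
        self.kl_target = state["kl_target"]


policy = AdaptiveKLState(kl_coeff=0.2, kl_target=0.01)
policy.update_kl(sampled_kl=0.05)  # 0.05 > 2 * 0.01, so kl_coeff becomes 0.2 * 1.5
restored = AdaptiveKLState(kl_coeff=0.2, kl_target=0.01)
restored.set_state(policy.get_state())
```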
Steven Morad
|
5d52b599aa
|
[RLlib] Fix zero gradients for ppo-clipped vf (#22171)
|
2022-02-15 08:57:18 +01:00 |
|
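The commit title does not describe the change itself. As background only, the sketch below shows one way a clipped value loss can produce zero gradients. It uses the PPO2-style formulation max((v - R)^2, (v_old + clip(v - v_old, -eps, eps) - R)^2). Once |v - v_old| > eps, the clipped branch no longer depends on v. So whenever that branch is the larger one, the gradient with respect to v is exactly zero. The function names are hypothetical, and this is not necessarily the formulation RLlib used before or after the fix.

```python
def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def clipped_vf_loss_and_grad(v: float, v_old: float, target: float, eps: float):
    """PPO2-style clipped value loss and its derivative w.r.t. v (scalar case)."""
    unclipped = (v - target) ** 2
    v_clipped = v_old + clip(v - v_old, -eps, eps)
    clipped = (v_clipped - target) ** 2
    if clipped > unclipped:
        # d(v_clipped)/dv is 1 inside the clip range and 0 outside of it.
        inside = -eps < v - v_old < eps
        grad = 2.0 * (v_clipped - target) * (1.0 if inside else 0.0)
        return clipped, grad
    return unclipped, 2.0 * (v - target)


# v moved 1.0 away from v_old (far beyond eps) but is still far from the target:
# the clipped branch dominates and the gradient vanishes.
loss, grad = clipped_vf_loss_and_grad(v=1.0, v_old=0.0, target=5.0, eps=0.2)
```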
Balaji Veeramani
|
7f1bacc7dc
|
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
|
2022-01-29 18:41:57 -08:00 |
|
Sven Mika
|
4eaf70942d
|
[RLlib] Issue 21297: Ignore PPO KL-loss term completely if kl-coeff == 0.0 to avoid NaN values due to some discrete action probs==0.0 (#21456)
|
2022-01-10 11:22:40 +01:00 |
|
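The reason for skipping the KL term entirely, rather than multiplying it by `kl_coeff == 0.0`, is that `0.0 * nan` is still `nan`. The sketch below assumes, for illustration, that the KL is computed elementwise as `p * (log p - log q)` the way a vectorized tensor op would. A probability of exactly 0.0 then yields `0.0 * -inf`, which is `nan`. All function names are hypothetical.

```python
import math


def tensor_log(x: float) -> float:
    # Mimic tensor libraries: log(0.0) is -inf (math.log would raise instead).
    return math.log(x) if x > 0.0 else -math.inf


def naive_categorical_kl(p, q) -> float:
    # Elementwise p * (log p - log q), summed, with no special-casing of p == 0.
    return sum(pi * (tensor_log(pi) - tensor_log(qi)) for pi, qi in zip(p, q))


def total_loss(surrogate: float, kl: float, kl_coeff: float, skip_if_zero: bool) -> float:
    if skip_if_zero and kl_coeff == 0.0:
        # Leave the KL term out entirely instead of multiplying it by 0.0.
        return surrogate
    return surrogate + kl_coeff * kl


# One action has probability exactly 0.0 under the first distribution.
kl = naive_categorical_kl([0.0, 1.0], [0.5, 0.5])
print(kl)                                            # nan, since 0.0 * -inf is nan
print(total_loss(1.0, kl, 0.0, skip_if_zero=False))  # nan: 0.0 * nan is still nan
print(total_loss(1.0, kl, 0.0, skip_if_zero=True))   # 1.0
```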
Sven Mika
|
f82880eda1
|
Revert "Revert [RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061) (#20399)" (#20417)
This reverts commit 90dc5460d4.
|
2021-11-16 14:49:41 +01:00 |
|
Amog Kamsetty
|
90dc5460d4
|
Revert "[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)" (#20399)
This reverts commit 5b1c8e46e1.
|
2021-11-15 16:11:35 -08:00 |
|
Sven Mika
|
5b1c8e46e1
|
[RLlib] POC: Deprecate build_policy (policy template) for torch only; PPOTorchPolicy (#20061)
|
2021-11-15 10:41:54 +01:00 |
|
Sven Mika
|
cf21c634a3
|
[RLlib] Fix deprecated warning for torch_ops.py (soft-replaced by torch_utils.py). (#19982)
|
2021-11-03 10:00:46 +01:00 |
|
Sven Mika
|
b4300dd532
|
[RLlib] Issue 18812: Torch multi-GPU stats not protected against race conditions. (#18937)
|
2021-10-04 13:29:00 +02:00 |
|
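Issue 18812 concerns stats written concurrently by several per-device (tower) threads. The sketch below shows the general pattern of guarding a shared stats dict with a lock. It is not RLlib's actual fix. `TowerStats` and the `num_tower_updates` key are hypothetical names.

```python
import threading


class TowerStats:
    """Hypothetical stats store shared by several per-device worker threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats = {}

    def add(self, key: str, value: float) -> None:
        # Read-modify-write on a shared entry: without the lock, two threads
        # could read the same old value and one update would be lost.
        with self._lock:
            self._stats[key] = self._stats.get(key, 0.0) + value

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._stats)


stats = TowerStats()


def tower_worker(device: str) -> None:
    for _ in range(1000):
        stats.add("num_tower_updates", 1.0)


threads = [threading.Thread(target=tower_worker, args=(f"gpu:{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats.snapshot()["num_tower_updates"])  # 4000.0
```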
Sven Mika
|
698b4eeed3
|
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). (#18669)
|
2021-09-21 22:00:14 +02:00 |
|
Sven Mika
|
494ddd98c1
|
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. (#17928)
|
2021-08-21 17:05:48 +02:00 |
|
Sven Mika
|
4b3add0066
|
[RLlib] Discussion 2021: PPO does not learn vf, iff use_gae=False (ignores use_critic setting). (#15610)
|
2021-05-04 14:17:00 +02:00 |
|
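The title says the value function was not trained when `use_gae=False`, even with the critic enabled. The sketch below shows what the non-GAE path needs to produce. It is loosely modeled on advantage postprocessing, but the names are hypothetical. With `use_critic` on, the discounted returns serve as value targets. The vf loss would then presumably be gated on `use_critic` rather than on `use_gae`.

```python
def discount_cumsum(xs, gamma):
    # y[t] = x[t] + gamma * y[t + 1], computed right to left.
    out, running = [0.0] * len(xs), 0.0
    for t in reversed(range(len(xs))):
        running = xs[t] + gamma * running
        out[t] = running
    return out


def advantages_without_gae(rewards, vf_preds, last_r, gamma, use_critic):
    """Monte-Carlo advantages and value targets when use_gae=False (sketch)."""
    returns = discount_cumsum(list(rewards) + [last_r], gamma)[:-1]
    if use_critic:
        # The critic still gets a regression target even though GAE is off.
        advantages = [g - v for g, v in zip(returns, vf_preds)]
        value_targets = returns
    else:
        advantages = returns
        value_targets = [0.0] * len(returns)
    return advantages, value_targets


adv, vt = advantages_without_gae(
    rewards=[1.0, 1.0, 1.0], vf_preds=[2.0, 1.5, 1.0], last_r=0.0, gamma=0.5, use_critic=True
)
print(adv, vt)  # [-0.25, 0.0, 0.0] [1.75, 1.5, 1.0]
```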
Sven Mika
|
cecfc3b43b
|
[RLlib] Multi-GPU support for Torch algorithms. (#14709)
|
2021-04-16 09:16:24 +02:00 |
|
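Multi-GPU training typically splits a train batch into equal per-device shards ("towers") and averages the per-tower gradients. The sketch below is conceptual only and does not reflect RLlib's multi-GPU loader. It checks the property that makes this valid for a mean-reduced loss: with equal shard sizes, the average of the shard gradients equals the full-batch gradient. A scalar linear model stands in for the network, and all names are hypothetical.

```python
def grad_mse(w, xs, ys):
    # d/dw of mean((w*x - y)^2) for a scalar linear model.
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def split_into_towers(xs, ys, num_devices):
    # Equal-sized shards, one per (hypothetical) device; assumes divisibility.
    n = len(xs) // num_devices
    return [(xs[i * n:(i + 1) * n], ys[i * n:(i + 1) * n]) for i in range(num_devices)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 16.2]
w = 1.5

tower_grads = [grad_mse(w, sx, sy) for sx, sy in split_into_towers(xs, ys, 4)]
averaged = sum(tower_grads) / len(tower_grads)
full_batch = grad_mse(w, xs, ys)
print(abs(averaged - full_batch) < 1e-9)  # True
```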
Sven Mika
|
1bb70e4907
|
[RLlib] Issue 14523: Torch + py3.8 leads to GPU device error. (#15014)
|
2021-03-30 21:43:11 +02:00 |
|
Sven Mika
|
04bc0a9828
|
[RLlib] Remove all non-trajectory view API code. (#14860)
|
2021-03-23 09:50:18 -07:00 |
|
Sven Mika
|
69202c6a7d
|
[RLlib] Obsolete usage tracking dict via sample batch. (#13065)
|
2021-03-17 08:18:15 +01:00 |
|
Sven Mika
|
2e3655e8a9
|
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. (#13238)
|
2021-01-19 14:22:36 +01:00 |
|
Sven Mika
|
99ae7bae05
|
[RLlib] JAXPolicy prep. PR #1. (#13077)
|
2020-12-26 20:14:18 -05:00 |
|
Sven Mika
|
d5604eaba3
|
[RLlib] Attention nets PyTorch support and cleanup (using traj. view API). (#12029)
|
2020-12-21 18:38:34 -08:00 |
|
roireshef
|
ef95db51e1
|
[RLlib] Arbitrary input to value() when not using GAE (#12941)
|
2020-12-21 12:19:33 -05:00 |
|
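When GAE is off, the value function's input no longer matters for bootstrapping. The sketch below assumes that the non-GAE path simply treats the bootstrap value as 0.0 for any input. `make_value_fn` and `fake_model` are hypothetical names, not RLlib's API.

```python
from typing import Any, Callable


def make_value_fn(use_gae: bool, model_value: Callable[..., float]) -> Callable[..., float]:
    """Return a value() callable used to bootstrap the last reward (sketch)."""
    if use_gae:
        def value(**input_dict: Any) -> float:
            # GAE needs a real estimate V(s_T) of the final observation.
            return model_value(**input_dict)
    else:
        def value(*args: Any, **kwargs: Any) -> float:
            # Without GAE this sketch returns a constant 0.0 (no bootstrapping)
            # for any input and never calls the model.
            return 0.0
    return value


fake_model = lambda obs, **_: 10.0 * obs  # hypothetical stand-in for a value head
print(make_value_fn(True, fake_model)(obs=0.5))             # 5.0
print(make_value_fn(False, fake_model)("anything", foo=1))  # 0.0
```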
Sven Mika
|
99c81c6795
|
[RLlib] Attention Net prep PR #3. (#12450)
|
2020-12-07 13:08:17 +01:00 |
|
Sven Mika
|
0df55a139c
|
[RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447)
* WIP.
* Fix.
* Fix.
* Fix.
|
2020-11-27 16:25:47 -08:00 |
|
Sven Mika
|
62c7ab5182
|
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). (#11747)
|
2020-11-12 16:27:34 +01:00 |
|
Sven Mika
|
36bda8432b
|
[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic (#11056)
|
2020-10-01 16:57:10 +02:00 |
|
Sven Mika
|
ef18893fb5
|
[RLlib] PPO, APPO, and DD-PPO code cleanup. (#10420)
|
2020-09-02 14:03:01 +02:00 |
|
Sven Mika
|
e968b52cb7
|
[RLlib] Trajectory view API - 03 Fast LSTM + prev actions/rewards (#9950)
|
2020-08-21 12:35:16 +02:00 |
|
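The trajectory view entries above describe deriving inputs such as previous actions and rewards from a time shift over the collected data. The sketch below illustrates that idea rather than storing the values separately. `shifted_view` is a hypothetical helper, and padding out-of-range steps with a fill value is an assumption of this sketch.

```python
def shifted_view(column, shift, fill):
    """Return column[t + shift] for every t, padding out-of-range steps with `fill`.

    A negative shift looks back in time, e.g. shift=-1 gives "previous" values.
    """
    n = len(column)
    return [column[t + shift] if 0 <= t + shift < n else fill for t in range(n)]


actions = [2, 0, 1, 1]
rewards = [0.5, -1.0, 0.0, 2.0]
prev_actions = shifted_view(actions, shift=-1, fill=0)           # [0, 2, 0, 1]
prev_rewards = shifted_view(rewards, shift=-1, fill=0.0)         # [0.0, 0.5, -1.0, 0.0]
next_step_idx = shifted_view(list(range(4)), shift=1, fill=None)  # [1, 2, 3, None]
```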
Sven Mika
|
2cbe29a7fa
|
[RLlib] Curiosity minor fixes, do-overs, and testing. (#10143)
|
2020-08-19 17:49:50 +02:00 |
|
Sven Mika
|
57690a3a9f
|
[RLlib] Trajectory view API - 02 actual API scaffold (#9753)
|
2020-08-06 10:54:20 +02:00 |
|
Sven Mika
|
b0b0463161
|
[RLlib] Trajectory View API (preparatory cleanup and enhancements). (#9678)
|
2020-07-29 21:15:09 +02:00 |
|
Sven Mika
|
935d8308fb
|
[RLlib] Issue #9437 (PyTorch converts to CPU tensor, even if on GPU). (#9497)
|
2020-07-16 14:55:50 +02:00 |
|
Tanay Wakhare
|
3536d8e4b3
|
Masking error. With t*valid_mask, we get the error np.inf*0 = np.inf (#9407)
|
2020-07-12 22:59:35 +02:00 |
|
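A note on the masking entry above: under IEEE 754 floating point, `inf * 0` evaluates to NaN rather than inf. Either way, a non-finite value in a padded, masked-out position contaminates a mean computed as `sum(t * mask) / num_valid`. The sketch below compares that with selecting the valid entries first. Both function names are hypothetical.

```python
import math


def mean_valid_by_multiply(values, mask):
    # sum(t * mask) / num_valid: a masked-out inf becomes inf * 0.0 == nan.
    num_valid = sum(mask)
    return sum(v * float(m) for v, m in zip(values, mask)) / num_valid


def mean_valid_by_select(values, mask):
    # Select the valid entries first, so masked-out values never enter the sum.
    valid = [v for v, m in zip(values, mask) if m]
    return sum(valid) / len(valid)


values = [1.0, 3.0, math.inf]  # last step is padding holding a non-finite value
mask = [True, True, False]
print(mean_valid_by_multiply(values, mask))  # nan
print(mean_valid_by_select(values, mask))    # 2.0
```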
Sven Mika
|
43043ee4d5
|
[RLlib] Tf2x preparation; part 2 (upgrading try_import_tf() ). (#9136)
* WIP.
* Fixes.
* LINT.
* WIP.
* WIP.
* Fixes.
* Fixes.
* Fixes.
* Fixes.
* WIP.
* Fixes.
* Test
* Fix.
* Fixes and LINT.
* Fixes and LINT.
* LINT.
|
2020-06-30 10:13:20 +02:00 |
|
Sven Mika
|
5c6d5d4ab1
|
This PR fixes the currently broken lstm_use_prev_action_reward flag for default lstm models (model.use_lstm=True). (#8970)
|
2020-06-27 20:50:01 +02:00 |
|
Sven Mika
|
7008902cff
|
[RLlib] Minor rllib.utils cleanup. (#8932)
|
2020-06-16 08:52:20 +02:00 |
|
Jan Blumenkamp
|
d6f78f58dc
|
Fix missing learning rate and entropy coeff schedule for torch PPO (#8572)
|
2020-05-23 10:54:18 -07:00 |
|
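Learning-rate and entropy-coefficient schedules are commonly given as lists of `[timestep, value]` pairs and interpolated linearly. This sketch assumes that shape for `lr_schedule` and `entropy_coeff_schedule`. It also assumes that the value is held constant outside the given range. `piecewise_schedule` is a hypothetical helper, not RLlib's schedule class.

```python
def piecewise_schedule(endpoints, t):
    """Linearly interpolate a [[timestep, value], ...] schedule at timestep t.

    Before the first or after the last endpoint, the nearest endpoint value is
    held constant (an assumption made for this sketch).
    """
    if t <= endpoints[0][0]:
        return endpoints[0][1]
    for (t0, v0), (t1, v1) in zip(endpoints, endpoints[1:]):
        if t0 <= t < t1:
            alpha = (t - t0) / (t1 - t0)
            return v0 + alpha * (v1 - v0)
    return endpoints[-1][1]


lr_schedule = [[0, 1e-3], [1000, 1e-4]]
entropy_coeff_schedule = [[0, 0.01], [500, 0.0]]
print(piecewise_schedule(lr_schedule, 500))              # roughly 5.5e-4
print(piecewise_schedule(entropy_coeff_schedule, 2000))  # 0.0
```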
Sven Mika
|
e153e3179f
|
[RLlib] Exploration API: Policy changes needed for forward pass noisifications. (#7798)
* Rollback.
* WIP.
* WIP.
* LINT.
* WIP.
* Fix.
* Fix.
* Fix.
* LINT.
* Fix (SAC does currently not support eager).
* Fix.
* WIP.
* LINT.
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/evaluation/sampler.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/utils/exploration/exploration.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* WIP.
* WIP.
* Fix.
* LINT.
* LINT.
* Fix and LINT.
* WIP.
* WIP.
* WIP.
* WIP.
* Fix.
* LINT.
* Fix.
* Fix and LINT.
* Update rllib/utils/exploration/exploration.py
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Update rllib/policy/dynamic_tf_policy.py
Co-Authored-By: Eric Liang <ekhliang@gmail.com>
* Fixes.
* LINT.
* WIP.
Co-authored-by: Eric Liang <ekhliang@gmail.com>
|
2020-04-01 00:43:21 -07:00 |
|
Sven Mika
|
66df8b8c35
|
[RLlib] Working/learning example: PPO + torch + LSTM. (#7797)
|
2020-03-31 22:00:28 -07:00 |
|
Sven Mika
|
d8eeb96413
|
Fix issue with torch PPO not handling action spaces of shape=(>1,). (#7398)
|
2020-03-02 10:53:19 -08:00 |
|
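The title does not say exactly what went wrong, so what follows is general background, not necessarily the bug fixed here. With a multi-dimensional Box action space, the per-dimension Gaussian log-densities have to be summed into one log-probability per sample. This is because the PPO ratio exp(logp - logp_old) is defined per action. The helper below is hypothetical and is not RLlib's distribution class.

```python
import math


def diag_gaussian_logp(action, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over all action dimensions."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


# A Box action space with shape=(2,): one log-probability per action, not per dimension.
logp = diag_gaussian_logp(action=[0.0, 0.0], mean=[0.0, 0.0], log_std=[0.0, 0.0])
print(math.isclose(logp, -math.log(2.0 * math.pi)))  # True
```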
Sven Mika
|
e2edca45d4
|
[RLlib] PPO torch memory leak and unnecessary torch.Tensor creation and gc'ing. (#7238)
* Take out stats to analyze memory leak in torch PPO.
* WIP
* WIP
* WIP
* WIP
* WIP
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT.
* Fix determine_tests_to_run.py.
* minor change to re-test after determine_tests_to_run.py.
* LINT.
* update comments.
* WIP
* WIP
* WIP
* FIX.
* Fix sequence_mask being dependent on torch being installed.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
* Fix strange ray-core tf-error in test_memory_scheduling test case.
|
2020-02-22 11:02:31 -08:00 |
|
Sven Mika
|
d537e9f0d8
|
[RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155)
|
2020-02-19 12:18:45 -08:00 |
|
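Folding the deterministic flag into a stochastic-sampling exploration component means the component decides between two behaviors. With `explore=True` it samples from the action distribution, and with `explore=False` it returns the distribution's mode. The sketch below illustrates this for a categorical distribution. The function names are hypothetical, not RLlib's Exploration API.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def stochastic_sampling(logits, explore, rng):
    """Sample from the action distribution if exploring, else take its mode."""
    probs = softmax(logits)
    if not explore:
        # Deterministic: the most likely action (argmax of the probabilities).
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.0, -1.0]
greedy = stochastic_sampling(logits, explore=False, rng=rng)  # always 1
sampled = [stochastic_sampling(logits, explore=True, rng=rng) for _ in range(5)]
```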
Eric Liang
|
026f6884b5
|
[rllib] Add Decentralized DDPPO trainer and documentation (#7088)
|
2020-02-10 15:28:27 -08:00 |
|
Sven Mika
|
c957ed58ed
|
[RLlib] Implement PPO torch version. (#6826)
|
2020-01-20 23:06:50 -08:00 |
|
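For reference, the central term of a PPO loss is the clipped surrogate objective: the mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = exp(logp - logp_old). The sketch below computes only this term over plain lists. A full torch PPO loss also involves value, entropy and KL terms, which are not shown. The function name is hypothetical.

```python
import math


def ppo_clipped_surrogate(logp, logp_old, advantages, clip_param):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), with r = exp(logp - logp_old).

    Returns the objective to maximize; a loss would be its negative.
    """
    total = 0.0
    for lp, lp_old, adv in zip(logp, logp_old, advantages):
        ratio = math.exp(lp - lp_old)
        clipped_ratio = max(1.0 - clip_param, min(1.0 + clip_param, ratio))
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Ratio exp(log 2) = 2 is clipped to 1.2 for the positive advantage.
obj = ppo_clipped_surrogate(
    logp=[math.log(2.0)], logp_old=[0.0], advantages=[1.0], clip_param=0.2
)
print(round(obj, 6))  # 1.2
```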