Sven Mika
ed85f59194
[RLlib] Unify all RLlib Trainer.train() -> results[info][learner][policy ID][learner_stats] and add structure tests. ( #18879 )
2021-09-30 16:39:05 +02:00
Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for kl penalty ddpo ( #18959 )
...
* [RLlib] Raise error for kl penalty ddpo
DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
become undecentralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty.)
* Update rllib/agents/ppo/ddppo.py
Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
Sven Mika
698b4eeed3
[RLlib] POC: Separate losses for APPO/IMPALA. Enable TFPolicy to handle multiple optimizers/losses (like TorchPolicy). ( #18669 )
2021-09-21 22:00:14 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. ( #18390 )
2021-09-08 19:39:01 +02:00
Sven Mika
599e589481
[RLlib] Move existing fake multi-GPU learning tests into separate buildkite job. ( #18065 )
2021-08-31 14:56:53 +02:00
Sven Mika
494ddd98c1
[RLlib] Replace "seq_lens" w/ SampleBatch.SEQ_LENS. ( #17928 )
2021-08-21 17:05:48 +02:00
Sven Mika
a428f10ebe
[RLlib] Add multi-GPU learning tests to nightly. ( #17778 )
2021-08-18 17:21:01 +02:00
Sven Mika
f3bbe4ea44
[RLlib] Test cases/BUILD cleanup; split "everything else" (longest running one rn) tests in 2. ( #17640 )
2021-08-16 22:01:01 +02:00
Sven Mika
5107d16ae5
[RLlib] Add @Deprecated decorator to simplify/unify deprecation of classes, methods, functions. ( #17530 )
2021-08-03 18:30:02 -04:00
Sven Mika
924f11cd45
[RLlib] Torch algos use now-framework-agnostic MultiGPUTrainOneStep execution op (~33% speedup for PPO-torch + GPU). ( #17371 )
2021-08-03 11:35:49 -04:00
Sven Mika
5a313ba3d6
[RLlib] Refactor: All tf static graph code should reside inside Policy class. ( #17169 )
2021-07-20 14:58:13 -04:00
Sven Mika
649580d735
[RLlib] Redo simplify multi agent config dict: Reverted b/c seemed to break test_typing (non RLlib test). ( #17046 )
2021-07-15 05:51:24 -04:00
Amog Kamsetty
38b5b6d24c
Revert "[RLlib] Simplify multiagent config (automatically infer class/spaces/config). ( #16565 )" ( #17036 )
...
This reverts commit e4123fff27
.
2021-07-13 09:57:15 -07:00
Sven Mika
e4123fff27
[RLlib] Simplify multiagent config (automatically infer class/spaces/config). ( #16565 )
2021-07-13 06:38:14 -04:00
Sven Mika
53206dd440
[RLlib] CQL BC loss fixes; PPO/PG/A2|3C action normalization fixes ( #16531 )
2021-06-30 12:32:11 +02:00
Sven Mika
e80095591c
[RLlib] Entropy coeff schedule bug fix and git bisect script. ( #15937 )
2021-05-20 18:15:10 +02:00
Sven Mika
2303851c3c
[RLlib] Torch multi-GPU + LSTM/RNN bug fix. ( #15492 )
2021-05-18 11:51:05 +02:00
Sven Mika
839fc59224
[RLlib] CQL TensorFlow support ( #15841 )
2021-05-18 11:10:46 +02:00
Sven Mika
a495759f06
[RLlib] Discussion 2022: PPO should auto-adjust rollout_fragment_length
if other settings do not align with train_batch_size
. ( #15611 )
2021-05-10 16:16:02 +02:00
Sven Mika
461d73ddf1
[RLlib] simple_optimizer
should not be used by default for tf+MA. ( #15365 )
2021-05-10 16:10:44 +02:00
Sven Mika
c7563a32ed
[RLlib] DD-PPO not supported on Win (add meaningful error message). ( #15631 )
2021-05-04 19:26:17 +02:00
Sven Mika
4b3add0066
[RLlib] Discussion 2021: PPO does not learn vf, iff use_gae=False (ignores use_critic setting). ( #15610 )
2021-05-04 14:17:00 +02:00
Sven Mika
e973b726c2
[RLlib] Support native tf.keras.Models (part 2) - Default keras models for Vision/RNN/Attention. ( #15273 )
2021-04-30 19:26:30 +02:00
Sven Mika
bb8a286cbc
[RLlib] Support native tf.keras.Model (milestone toward obsoleting ModelV2 class). ( #14684 )
2021-04-27 10:44:54 +02:00
Sven Mika
cecfc3b43b
[RLlib] Multi-GPU support for Torch algorithms. ( #14709 )
2021-04-16 09:16:24 +02:00
Sven Mika
bbfa8ffec9
[RLlib] Minor release 1.3 warnings cleanups. ( #15272 )
2021-04-14 14:03:15 +02:00
Sven Mika
1bb70e4907
[RLlib] Issue 14523: Torch + py3.8 leads to GPU device error. ( #15014 )
2021-03-30 21:43:11 +02:00
Sven Mika
4f66309e19
[RLlib] Redo issue 14533 tf enable eager exec ( #14984 )
2021-03-29 20:07:44 +02:00
SangBin Cho
fa5f961d5e
Revert "[RLlib] Issue 14533: tf.enable_eager_execution()
must be called at beginning. ( #14737 )" ( #14918 )
...
This reverts commit 3e389d5812
.
2021-03-25 00:42:01 -07:00
Sven Mika
3e389d5812
[RLlib] Issue 14533: tf.enable_eager_execution()
must be called at beginning. ( #14737 )
2021-03-24 12:54:27 +01:00
Sven Mika
04bc0a9828
[RLlib] Remove all non-trajectory view API code. ( #14860 )
2021-03-23 09:50:18 -07:00
Sven Mika
69202c6a7d
[RLlib] Obsolete usage tracking dict via sample batch. ( #13065 )
2021-03-17 08:18:15 +01:00
Sven Mika
732197e23a
[RLlib] Multi-GPU for tf-DQN/PG/A2C. ( #13393 )
2021-03-08 15:41:27 +01:00
Sven Mika
8000258333
[RLlib] R2D2 Implementation. ( #13933 )
2021-02-25 12:18:11 +01:00
Sven Mika
775e685531
[RLlib] Issue #13824 : compress_observations=True
crashes for all algos not using a replay buffer. ( #14034 )
2021-02-18 21:36:32 +01:00
Sven Mika
d629292d63
[RLlib] Add grad_clip config option to MARWIL and stabilize grad clipping against inf global_norms. ( #13634 )
2021-01-22 19:36:02 +01:00
Sven Mika
2e3655e8a9
[RLlib] Issue 9071 A3C w/ RNN not working due to VF assuming no RNN. ( #13238 )
2021-01-19 14:22:36 +01:00
Sven Mika
93c0a5549b
[RLlib] Deprecate vf_share_layers
in top-level PPO/MAML/MB-MPO configs. ( #13397 )
2021-01-19 09:51:35 +01:00
Sven Mika
c524f86785
[RLlib] BC/MARWIL/recurrent nets minor cleanups and bug fixes. ( #13064 )
2020-12-27 09:46:03 -05:00
Sven Mika
99ae7bae05
[RLlib] JAXPolicy prep. PR #1 . ( #13077 )
2020-12-26 20:14:18 -05:00
Michael Luo
4bcd475671
[RLlib] Improved Documentation for PPO, DDPG, and SAC ( #12943 )
2020-12-24 09:31:35 -05:00
Sven Mika
d5604eaba3
[RLlib] Attention nets PyTorch support and cleanup (using traj. view API). ( #12029 )
2020-12-21 18:38:34 -08:00
roireshef
ef95db51e1
[RLlib] Arbitrary input to value() when not using GAE ( #12941 )
2020-12-21 12:19:33 -05:00
Sven Mika
b2bcab711d
[RLlib] Attention Nets: tf ( #12753 )
2020-12-20 20:22:32 -05:00
Sven Mika
e40b14d255
[RLlib] Batch-size for truncate_episode batch_mode should be confgurable in agent-steps (rather than env-steps), if needed. ( #12420 )
2020-12-08 16:41:45 -08:00
Sven Mika
99c81c6795
[RLlib] Attention Net prep PR #3 . ( #12450 )
2020-12-07 13:08:17 +01:00
Sven Mika
19c8033df2
[RLlib] Fix most remaining RLlib algos for running with trajectory view API. ( #12366 )
...
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* WIP.
* LINT and fixes.
MB-MPO and MAML not working yet.
* wip
* update
* update
* rmeove
* remove dep
* higher
* Update requirements_rllib.txt
* Update requirements_rllib.txt
* relpos
* no mbmpo
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2020-12-01 17:41:10 -08:00
Sven Mika
0df55a139c
[RLlib] Attention Net prep PR #1 : Smaller cleanups. ( #12447 )
...
* WIP.
* Fix.
* Fix.
* Fix.
2020-11-27 16:25:47 -08:00
Sven Mika
6475297bd3
[RLlib] Torch LR schedule not working. Fix and added test case. ( #12396 )
2020-11-26 13:14:11 +01:00
Sven Mika
592c161032
[RLlib] Issue 12118: LSTM prev-a/r should be separately configurable. Fix missing prev-a one-hot encoding. ( #12397 )
...
* WIP.
* Fix and LINT.
2020-11-25 11:27:46 -08:00