Avnish Narayan
6dc1a6b72f
[RLlib] Raise error for kl penalty ddpo ( #18959 )
* [RLlib] Raise error for kl penalty ddpo
DDPPO doesn't support KL penalties like PPO-1.
In order to support KL penalties, DDPPO would need to
become centralized, which defeats the purpose of the
algorithm. Users can still tune the entropy coefficient to
control the policy entropy (similar to controlling the KL
penalty).
* Update rllib/agents/ppo/ddppo.py
Co-authored-by: avnishn <avnishnarayan@gmail.com>
Co-authored-by: Sven Mika <sven@anyscale.io>
2021-09-30 10:56:22 +02:00
Sven Mika
45f60e51a9
[RLlib] DDPPO fixes and benchmarks. ( #18390 )
2021-09-08 19:39:01 +02:00
Sven Mika
c7563a32ed
[RLlib] DD-PPO not supported on Win (add meaningful error message). ( #15631 )
2021-05-04 19:26:17 +02:00
Sven Mika
dab241dcc6
[RLlib] Fix inconsistency wrt batch size in SampleCollector (traj. view API). Makes DD-PPO work with traj. view API. ( #12063 )
2020-11-19 19:01:14 +01:00
Sven Mika
62c7ab5182
[RLlib] Trajectory view API: Enable by default for PPO, IMPALA, PG, A3C (tf and torch). ( #11747 )
2020-11-12 16:27:34 +01:00
Philsik Chang
ede9347127
[rllib] Add torch_distributed_backend flag for DDPPO ( #11362 ) ( #11425 )
2020-10-21 18:30:42 -07:00
Sven Mika
36bda8432b
[RLlib] Trajectory view API: Simple List Collector (on by default for PPO); LSTM-agnostic ( #11056 )
2020-10-01 16:57:10 +02:00
Sven Mika
805dad3bc4
[RLlib] SAC algo cleanup. ( #10825 )
2020-09-20 11:27:02 +02:00
Sven Mika
ef18893fb5
[RLlib] PPO, APPO, and DD-PPO code cleanup. ( #10420 )
2020-09-02 14:03:01 +02:00
Sven Mika
d14b501692
[RLlib] First attempt at cleaning up algo code in RLlib: PG. ( #10115 )
2020-08-20 17:05:57 +02:00
Chua Cheow Huan
ea51e94729
[rllib] Learning rate schedule for DDPPO. ( #10006 )
* Get shared metrics, increment counter & set global vars for remote workers.
* Add unit test to test lr_schedule for DDPPO.
* Broadcast the local set of global vars to remote workers instead of independently setting the global vars on each rollout worker.
2020-08-15 00:51:45 -07:00
Sven Mika
2746fc0476
[RLlib] Auto-framework, retire use_pytorch in favor of framework=... ( #8520 )
2020-05-27 16:19:13 +02:00
Eric Liang
9a83908c46
[rllib] Deprecate policy optimizers ( #8345 )
2020-05-21 10:16:18 -07:00
Eric Liang
9d012626e5
[rllib] Distributed exec workflow for impala ( #8321 )
2020-05-11 20:24:43 -07:00
Eric Liang
baadbdf8d4
[rllib] Execute PPO using training workflow ( #8206 )
* wip
* add kl
* kl
* works now
* doc update
* reorg
* add ddppo
* add stats
* fix fetch
* comment
* fix learner stat regression
* test fixes
* fix test
2020-04-30 01:18:09 -07:00
Eric Liang
dd70720578
[rllib] Rename sample_batch_size => rollout_fragment_length ( #7503 )
* bulk rename
* deprecation warn
* update doc
* update fig
* line length
* rename
* make pytest compatible
* fix test
* fix sys
* rename
* wip
* fix more
* lint
* update svg
* comments
* lint
* fix use of batch steps
2020-03-14 12:05:04 -07:00
Eric Liang
026f6884b5
[rllib] Add Decentralized DDPPO trainer and documentation ( #7088 )
2020-02-10 15:28:27 -08:00