| Author | Commit | Message | Date |
|---|---|---|---|
| Sven Mika | b2bcab711d | [RLlib] Attention Nets: tf (#12753) | 2020-12-20 20:22:32 -05:00 |
| Sven Mika | 0df55a139c | [RLlib] Attention Net prep PR #1: Smaller cleanups. (#12447) * WIP. * Fix. * Fix. * Fix. | 2020-11-27 16:25:47 -08:00 |
| Sven Mika | 805dad3bc4 | [RLlib] SAC algo cleanup. (#10825) | 2020-09-20 11:27:02 +02:00 |
| Sven Mika | e968b52cb7 | [RLlib] Trajectory view API - 03 Fast LSTM + prev actions/rewards (#9950) | 2020-08-21 12:35:16 +02:00 |
| Sven Mika | c9435cad43 | WIP. (#8456) Fix multi-GPU histogram metrics for > 0D tensors. | 2020-05-15 21:43:27 +02:00 |
| Sven Mika | 66df8b8c35 | [RLlib] Working/learning example: PPO + torch + LSTM. (#7797) | 2020-03-31 22:00:28 -07:00 |
| Eric Liang | 9a590ac6a5 | [rllib] Fix custom model metrics in multi-device case (#7640) * fix example * add example test * lin | 2020-03-23 12:40:22 -07:00 |
| Sven Mika | d537e9f0d8 | [RLlib] Exploration API: merge deterministic flag with exploration classes (SoftQ and StochasticSampling). (#7155) | 2020-02-19 12:18:45 -08:00 |
| Eric Liang | 2fb53396ad | [rllib] [experimental] Decentralized Distributed PPO for torch (DD-PPO) (#6918) | 2020-01-25 22:36:43 -08:00 |