ray/python
Philipp Moritz 791bee343f [rllib] Implement GAE for PPO (#849)
* make information available for GAE

* buggy version of GAE estimator

* fix

* add more logging and reweight losses

* fix logging

* fix loss

* adapt advantage calculation

* update gae

* standardize returns

* don't normalize td lambda ret

* fix

* don't standardize advantages

* do standardization earlier

* different standardization

* initializer

* drop into the debugger

* fix tensorflow broadcasting bug

* vf clipping

* don't standardize tdlambdaret

* different standardization

* use huber loss for value function

* refactor -- first half

* it runs

* fix

* update

* documentation

* linting and tests

* fix linting

* naming

* fix

* linting

* fix

* remove prefix madness

* fixes

* fix

* add value function example

* fix linting

* remove newline
2017-08-23 20:35:47 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| ray | [rllib] Implement GAE for PPO (#849) | 2017-08-23 20:35:47 -07:00 |
| build-wheel-macos.sh | Changes to build to fix creation of wheels. (#840) | 2017-08-21 17:49:35 -07:00 |
| build-wheel-manylinux1.sh | Changes to build to fix creation of wheels. (#840) | 2017-08-21 17:49:35 -07:00 |
| README-building-wheels.md | Add script for building MacOS wheels. (#601) | 2017-06-01 00:30:46 +00:00 |
| setup.py | Changes to build to fix creation of wheels. (#840) | 2017-08-21 17:49:35 -07:00 |