* Add base for Soft Actor-Critic
* Pick changes from old SAC branch
* Update sac.py
* First implementation of sac model
* Remove unnecessary SAC imports
* Prune unnecessary noise and exploration code
* Implement SAC model and use that in SAC policy
* runs but doesn't learn
* clear state
* fix batch size
* Add missing alpha grads and vars
* -200 by 2k timesteps
* doc
* lazy squash
* one file
* ignore tfp
* revert done
* Change the log syncing behavior
* fix up abstractions for syncer
* Finished checkpoint syncing
* Code
* Set of changes to get things running
* Fixes for log syncing
* Fix parts
* Lint and other fixes
* fix some tests
* Remove extra parsing functionality
* some test fixes
* Fix up cloud syncing
* Another thing to do
* Fix up tests and local sync
Changes LogSync into a mixin, and adds tests for different
functionalities.
* Fix up tests, start on local migration
* fix distributed migrations
* comments
* formatting
* Better checkpoint directory handling
* fix tests
* fix tests
* fix click
* comments
* formatting comments
* formatting and comments
* sync function deprecations
* syncfunction
* Add documentation for Syncing and Uploading
* nit
* BaseSyncer as base for Mixin in edge case
* more docs
* clean up assertions
* validate
* nit
* Update test_cluster.py
* betterdoc
* Update tune-usage.rst
* cleanup
* nit
* Instructions for running TensorBoard without sudo
When we run TensorBoard to visualize Ray results on multi-user clusters where we don't have sudo access, such as the RISE clusters, a few commands must be run first so that TensorBoard can write to the tmp directory. This is a pretty common use case, so I figured we may as well put it in the documentation for Tune.
* Update tune-usage.rst
* Export remote functions when first used. Also fix a bug in which remote functions and actor classes were not exported from workers during subsequent Ray sessions.
* Documentation update
* Fix tests.
* Fix grammar
* [rllib] Separate optimisers for DDPG actor & critic
* [rllib] Better names for DDPG variables & options
Config changes:
- noise_scale -> exploration_ou_noise_scale
- exploration_theta -> exploration_ou_theta
- exploration_sigma -> exploration_ou_sigma
- act_noise -> exploration_gaussian_sigma
- noise_clip -> target_noise_clip
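
For anyone carrying an older config forward, the renames above can be
applied mechanically. The helper below is a hypothetical sketch, not part
of RLlib; only the key names come from this change, and the function name
and error handling are assumptions.

```python
# Hypothetical migration helper (not an RLlib API): maps the old DDPG
# config keys listed above to their new names.
RENAMED_KEYS = {
    "noise_scale": "exploration_ou_noise_scale",
    "exploration_theta": "exploration_ou_theta",
    "exploration_sigma": "exploration_ou_sigma",
    "act_noise": "exploration_gaussian_sigma",
    "noise_clip": "target_noise_clip",
}


def migrate_ddpg_config(config):
    """Return a copy of `config` with old key names replaced by new ones."""
    migrated = {}
    for key, value in config.items():
        new_key = RENAMED_KEYS.get(key, key)
        if new_key in migrated:
            # Both the old and the new spelling were supplied; refuse to guess.
            raise ValueError("duplicate setting for %r" % new_key)
        migrated[new_key] = value
    return migrated
```

Keys that were not renamed pass through unchanged, so the helper can be
run on a full config dict.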
* [rllib] Make DDPG less class-y
Replaced three classes, each of which had only an __init__ method and a
handful of unrelated attributes, with plain functions.
* [rllib] Refactor DDPG noise
* [rllib] Unify DDPG exploration annealing
Added option "exploration_should_anneal" to enable linear annealing of
exploration noise. By default this is off, for consistency with DDPG &
TD3 papers. Also renamed "exploration_final_eps" to
"exploration_final_scale" (that name seems to have been carried over
from DQN, and doesn't really make sense here). Finally, tried to rename
"eps" to "noise_scale" wherever possible.