* Add base for Soft Actor-Critic
* Pick changes from old SAC branch
* Update sac.py
* First implementation of sac model
* Remove unnecessary SAC imports
* Prune unnecessary noise and exploration code
* Implement SAC model and use that in SAC policy
* runs but doesn't learn
* clear state
* fix batch size
* Add missing alpha grads and vars
* Reaches -200 reward by 2k timesteps
* doc
* lazy squash
* one file
* ignore tfp
* revert done
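The "lazy squash" and "Add missing alpha grads and vars" commits above refer to SAC's tanh-squashed Gaussian policy and its trainable entropy temperature. Below is a minimal PyTorch sketch of that squashing math and the alpha loss, purely for illustration; the actual implementation lives in `sac.py` and may differ, and names such as `sample_squashed_action` and `EPS` are made up here:

```python
# Illustrative sketch of SAC's tanh-squashed Gaussian actions, not the RLlib code.
import torch
from torch.distributions import Normal

EPS = 1e-6  # numerical guard for the log-prob correction

def sample_squashed_action(mean, log_std):
    """Sample a tanh-squashed action and its corrected log-probability."""
    dist = Normal(mean, log_std.exp())
    u = dist.rsample()                  # reparameterized sample (pre-squash)
    action = torch.tanh(u)              # squash into [-1, 1]
    # Change of variables: log pi(a) = log p(u) - sum log(1 - tanh(u)^2)
    log_prob = dist.log_prob(u) - torch.log(1 - action.pow(2) + EPS)
    return action, log_prob.sum(dim=-1)

# The temperature alpha is a trainable variable; its loss pushes the policy
# entropy toward a target value (commonly the negative action dimension).
log_alpha = torch.zeros(1, requires_grad=True)

def alpha_loss(log_prob, target_entropy):
    return -(log_alpha * (log_prob + target_entropy).detach()).mean()
```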
Stress testing for Jenkins.
TODO:
- [x] Enable a common keypair for autoscaling
- [x] Add automatic timeouts?
- [x] Switch out key pair one last time before merge
This PR introduces single-node fault tolerance for Tune.
## Previous behavior:
- Actors are restarted without checking whether resources are available. This can cause problems when the cluster loses resources (e.g., a node dies).
## New behavior:
- RUNNING trials will be resumed on another node on a best-effort basis (i.e., they will run if resources are available).
- If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued.
- During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running.
Remaining questions:
- Should `last_result` be consistent during restore?
Yes, but not for trials that have not yet been checkpointed.
- Waiting for some PRs to merge first (#3239)
Closes #2851.
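For context, a rough sketch of how a user would opt a Tune experiment into this fault tolerance, assuming the `checkpoint_freq` and `max_failures` trial-spec keys of this era (check the Tune docs for exact names; the `redis_address` value is a placeholder):

```python
import ray
from ray import tune

ray.init(redis_address="<head-node>:6379")  # connect to the running cluster

tune.run_experiments({
    "my_experiment": {
        "run": "PPO",                  # any registered trainable
        "config": {"env": "CartPole-v0"},
        "checkpoint_freq": 10,         # checkpoint periodically so a RUNNING
                                       # trial can be resumed on another node
        "max_failures": 3,             # retry a failed trial up to 3 times
    },
})
```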
* Use F.softmax instead of a pointless network layer
Stateless functions should not be network layers.
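As a hedged illustration (not the actual RLlib diff), the change amounts to replacing a parameterless `nn.Softmax` module with the functional call:

```python
import torch.nn as nn
import torch.nn.functional as F

class PolicyHead(nn.Module):  # hypothetical module, not the actual RLlib class
    def __init__(self, in_size, out_size):
        super(PolicyHead, self).__init__()
        self.fc = nn.Linear(in_size, out_size)
        # Before: self.softmax = nn.Softmax(dim=-1) registered as a "layer"
        # even though softmax has no parameters to learn.

    def forward(self, x):
        logits = self.fc(x)
        # After: call the stateless functional form directly.
        return F.softmax(logits, dim=-1)
```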
* Use correct pytorch functions
* Rename argument name to out_size
Matches in_size and makes more sense.
* Fix shapes of tensors
Advantages and rewards both should be scalars, and therefore a list of them
should be 1D.
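A quick illustration of why the 1-D shape matters (hypothetical tensors, not the actual code):

```python
import torch

values = torch.randn(5)             # shape (5,): one value estimate per step
advantages = torch.randn(5, 1)      # wrong shape (5, 1): stray trailing dim
# Broadcasting (5, 1) against (5,) silently produces a (5, 5) matrix.
print((advantages * values).shape)  # torch.Size([5, 5])
advantages = advantages.view(-1)    # flatten to (5,)
print((advantages * values).shape)  # torch.Size([5])
```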
* Fmt
* replace deprecated function
* rm unnecessary Variable wrapper
* rm all use of torch Variables
Torch does this for us now.
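Since PyTorch 0.4, plain tensors carry autograd state themselves, so the `Variable` wrapper is redundant; a minimal illustration:

```python
import torch

# Old style (pre-0.4): wrap tensors in Variable to get autograd.
#   x = torch.autograd.Variable(torch.ones(3), requires_grad=True)
# New style: plain tensors track gradients directly.
x = torch.ones(3, requires_grad=True)
y = (x * 2).sum()
y.backward()
print(x.grad)  # tensor([2., 2., 2.])
```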
* Ensure that values are a flat list
* Fix shape error in conv nets
* fmt
* Fix shape errors
Reshaping the action before stepping in the env fixes a few errors.
* Add TODO
* Use correct filter size
Works when `self.config['model']['channel_major'] = True`.
* Add missing channel major
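For reference, channel-major (NCHW) is the layout PyTorch's `Conv2d` expects, which is why the filter sizes only line up under that setting; a small sketch with hypothetical Atari-like shapes:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=8, stride=4)

obs_nhwc = torch.randn(32, 84, 84, 3)    # observations stored as NHWC
obs_nchw = obs_nhwc.permute(0, 3, 1, 2)  # to channel-major NCHW
out = conv(obs_nchw)                     # filter sizes must match this layout
print(out.shape)                         # torch.Size([32, 16, 20, 20])
```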
* Revert reshape of action
This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.
* Squeeze action
* Squeeze actions along first dimension
This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).
* try adding pytorch tests
* typo
* fixup docker messages
* Fix A3C for some envs
Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).
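A small illustration of why squeezing only the first dimension helps most envs while Pendulum stays an edge case (hypothetical tensors):

```python
import torch

cartpole_action = torch.tensor([1])             # shape (1,): batch dim wrapping a scalar action
robot_action = torch.tensor([[0.1, 0.2, 0.3]])  # shape (1, 3): batch dim wrapping a vector action

print(cartpole_action.squeeze(0).shape)  # torch.Size([])  -> scalar, as CartPole expects
print(robot_action.squeeze(0).shape)     # torch.Size([3]) -> vector left intact

# Pendulum is the edge case: its "action" is itself a 1-element array, so it
# looks exactly like a batched scalar, and squeezing the first dimension
# collapses it to a scalar that the env then rejects.
pendulum_action = torch.tensor([0.5])
print(pendulum_action.squeeze(0).shape)  # torch.Size([]) -> breaks Pendulum
```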
* fmt
* nit flake
* small lint
* Add shell script for building parquet
* Use parquet ci script; remove anaconda
* Remove gcc flag, use default
* add boost_root
* Fix $TP_DIR reference issue
* fix the PR
* check out specific parquet-cpp commit
* Bring cloudpickle version 0.5.2 inside the repo.
* Use internal copy of cloudpickle everywhere.
* Fix linting.
* Import ordering.
* Change __init__.py.
* Set pickler in serialization context.
* Don't check ray location.
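With the copy vendored in, callers switch from the external package to Ray's bundled module; roughly (exact import path may differ):

```python
# Before: depend on whatever cloudpickle happens to be installed.
#   import cloudpickle
# After: always use the copy bundled with Ray (version 0.5.2), so pickling
# behavior is consistent regardless of the user's environment.
from ray import cloudpickle

payload = cloudpickle.dumps(lambda x: x + 1)  # serialize a closure
fn = cloudpickle.loads(payload)
assert fn(1) == 2
```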
* trying to fix jenkins tests
* comment out more tests
* remove pytorch stuff
* use non-monotonic clock (monotonic not supported on python 2.7)
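The clock change is the usual Python 2/3 compatibility issue: `time.monotonic()` does not exist on Python 2.7, so the wall clock is used instead. A common fallback pattern looks like this (sketch, not necessarily the exact code in the PR):

```python
import time

try:
    # Python 3: preferred, immune to system clock adjustments.
    get_time = time.monotonic
except AttributeError:
    # Python 2.7 has no time.monotonic(); fall back to the wall clock.
    get_time = time.time

start = get_time()
# ... do work ...
elapsed = get_time() - start
```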
* whitespace