* Fix QMix, SAC, and MADDPG too.
* Unpin gym and deprecate Pendulum-v0
Many tests in RLlib depended on Pendulum-v0;
however, in gym 0.21, Pendulum-v0 was deprecated
in favor of Pendulum-v1. This may change reward
thresholds, so we may have to rerun all of the
Pendulum-v1 benchmarks or use another environment
instead. The same applies to FrozenLake-v0 and
FrozenLake-v1.
Lastly, all of the RLlib tests have been moved to
Python 3.7.
* Add gym installation based on Python version.
Pin Python <= 3.6 to gym 0.19 due to install
issues with Atari ROMs in gym 0.20.
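A minimal sketch of what such a version-conditional pin could look like, e.g. via PEP 508 environment markers in setup.py or requirements.txt (the exact bounds below are assumptions based on the notes above, not Ray's actual requirements):

```python
# Hedged sketch: pin gym per Python version with PEP 508 environment markers.
install_requires = [
    "gym==0.19.0; python_version <= '3.6'",  # avoids Atari ROM install issues seen with gym 0.20
    "gym>=0.21.0; python_version >= '3.7'",  # newer gym with Pendulum-v1 / FrozenLake-v1
]
```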
* Reformatting
* Fixing tests
* Move atari-py install conditional to req.txt
* Migrate to the new ALE install method
* Make parametric_actions_cartpole return float32 actions/obs
* Add type conversions if obs/actions don't match space
* Add utils to make elements match gym space dtypes
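A rough sketch of such a dtype-matching utility (the function name and handling below are illustrative, not the actual RLlib helper):

```python
# Hedged sketch: cast an element so its dtype matches the gym space it belongs to.
import gym
import numpy as np


def match_space_dtype(element, space):
    """Return `element` cast to `space.dtype` if the dtypes differ."""
    element = np.asarray(element)
    if hasattr(space, "dtype") and element.dtype != space.dtype:
        return element.astype(space.dtype)
    return element


obs_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
obs = match_space_dtype(np.zeros(4), obs_space)  # float64 input -> float32 output
```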
Co-authored-by: Jun Gong <jungong@anyscale.com>
Co-authored-by: sven1977 <svenmika1977@gmail.com>
* Add sample example
* Copy relevant lines of `ask` from inherited Optimizer
* Ignore strategy
* Additional changes
* Add DragonflySearch, a Tune connector for Dragonfly
* Add example and fix small errors
* lint
* Remove skopt references
* Update example based off of Dragonfly changes
* Edit example for final Dragonfly edits
* Formatting and documentation edits
* Add documentation and add to test pipeline
* Address PR comments
* Fix Jenkins test
* Adjust Dragonfly to PR #7366
* Lint
* fix_tests
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Add base for Soft Actor-Critic
* Pick changes from old SAC branch
* Update sac.py
* First implementation of sac model
* Remove unnecessary SAC imports
* Prune unnecessary noise and exploration code
* Implement SAC model and use that in SAC policy
* runs but doesn't learn
* clear state
* fix batch size
* Add missing alpha grads and vars
* -200 by 2k timesteps
* doc
* lazy squash
* one file
* ignore tfp
* revert done
This PR introduces single-node fault tolerance for Tune.
## Previous behavior:
- Actors will be restarted without checking if resources are available. This can lead to problems if we lose resources.
## New behavior:
- RUNNING trials will be resumed on another node on a best effort basis (meaning they will run if resources available).
- If the cluster is saturated, RUNNING trials on that failed node will become PENDING and queued.
- During recovery, TrialSchedulers and SearchAlgorithms should receive notification of this (via `trial_runner.stop_trial`) so that they don’t wait/block for a trial that isn’t running.
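A rough sketch of the recovery decision described above (all helper names below are hypothetical, not Tune's actual internals):

```python
# Hedged sketch of the behavior described above; method names are hypothetical.
def recover_trials_from_failed_node(runner, scheduler, failed_trials):
    for trial in failed_trials:
        if runner.has_resources(trial.resources):
            # Best effort: resume the RUNNING trial on another node.
            runner.restart_trial(trial)
        else:
            # Cluster saturated: re-queue the trial and notify the scheduler /
            # search algorithm so they do not block waiting on it.
            trial.status = "PENDING"
            scheduler.on_trial_stop(runner, trial)
```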
Remaining questions:
- Should `last_result` be consistent during restore?
Yes; but not for earlier trials (trials that are yet to be checkpointed).
- Waiting for some PRs to merge first (#3239)
Closes #2851.
* Use F.softmax instead of a pointless network layer
Stateless functions should not be network layers.
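For illustration, a minimal before/after sketch of this change in PyTorch:

```python
# Hedged sketch: prefer the stateless functional call over a parameter-free layer.
import torch
import torch.nn.functional as F

logits = torch.randn(32, 6)        # e.g. a batch of action logits
probs = F.softmax(logits, dim=-1)  # replaces nn.Softmax(dim=-1)(logits)
```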
* Use correct pytorch functions
* Rename argument name to out_size
Matches in_size and makes more sense.
* Fix shapes of tensors
Advantages and rewards both should be scalars, and therefore a list of them
should be 1D.
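For example (a minimal sketch), a rollout of length T should yield shape (T,), not (T, 1):

```python
# Hedged sketch: per-step rewards/advantages are scalars, so a rollout stores
# them as a 1-D tensor.
import torch

rewards = [1.0, 0.0, 1.0]
assert torch.tensor(rewards).shape == (3,)  # 1-D, not (3, 1)
```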
* Fmt
* replace deprecated function
* rm unnecessary Variable wrapper
* rm all use of torch Variables
Torch does this for us now.
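A small illustrative sketch of the removal (PyTorch merged Variable into Tensor in 0.4, so plain tensors carry autograd state):

```python
# Hedged sketch: no Variable(...) wrapper needed on modern PyTorch.
import torch

x = torch.randn(3, requires_grad=True)  # previously: Variable(torch.randn(3), requires_grad=True)
loss = (x * 2).sum()
loss.backward()                         # gradients flow through plain tensors
```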
* Ensure that values are a flat list
* Fix shape error in conv nets
* fmt
* Fix shape errors
Reshaping the action before stepping in the env fixes a few errors.
* Add TODO
* Use correct filter size
Works when `self.config['model']['channel_major'] = True`.
* Add missing channel major
* Revert reshape of action
This should be handled by the agent or at least in a cleaner way that doesn't
break existing envs.
* Squeeze action
* Squeeze actions along first dimension
This should deal with some cases such as cartpole where actions are scalars
while leaving alone cases where actions are arrays (some robotics tasks).
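A minimal sketch of the distinction (using NumPy; the shapes below are illustrative):

```python
# Hedged sketch: squeezing only the leading batch dimension collapses scalar
# actions (e.g. CartPole) but leaves vector actions (e.g. robotics) intact.
import numpy as np

np.squeeze(np.array([1]), axis=0).shape           # () -> scalar action
np.squeeze(np.array([[0.1, 0.2]]), axis=0).shape  # (2,) -> array action kept
```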
* try adding pytorch tests
* typo
* fixup docker messages
* Fix A3C for some envs
Pendulum doesn't work since it's an edge case (expects singleton arrays, which
`.squeeze()` collapses to scalars).
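A short illustration of the edge case (sketch):

```python
# Hedged sketch: a full squeeze() collapses Pendulum's singleton action array
# (shape (1,)) to a 0-d scalar, which is what breaks here.
import numpy as np

np.array([0.5]).squeeze().shape  # () instead of the expected (1,)
```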
* fmt
* nit flake
* small lint
* trying to fix jenkins tests
* comment out more tests
* remove pytorch stuff
* use non-monotonic clock (monotonic not supported on python 2.7)
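A small sketch of the fallback this implies (time.monotonic was added in Python 3.3, so it is absent on 2.7):

```python
# Hedged sketch: fall back to the non-monotonic wall clock on Python 2.7.
import time

clock = getattr(time, "monotonic", time.time)
start = clock()
# ... work ...
elapsed = clock() - start
```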
* whitespace
* Test example applications in Jenkins.
* Fix default upload_dir argument for Algorithm class.
* Fix evolution strategies.
* Comment out policy gradient example which doesn't seem to work.
* Set --env-name for evolution strategies.
* attempt to build on travis using docker
* run tests in foreground
* add examples to travis tests
* test from current checkout
* attempt to fix docker version issues
* try build with xenial
* attempt docker upgrade
* avoid hang on configuration files
* matrix osx and linux w/ docker
* restore non-test docker builds
* fix typo
* tuning and cleanup
* add missing file
* comment cleanup
* Ray with Docker
* cleanup based on comments
* rename docker user to ray-user
* add examples docker image
* working toward reliable Docker devel image
* adjust ray-user uid for Linux builds on AWS
* update documentation
* reduced dependencies for examples
* updated Docker documentation
* experimental notice on developing with Docker