# ray/rllib/tuned_examples/regression_tests/pendulum-sac.yaml

pendulum-sac:
    env: Pendulum-v0
    run: SAC
    stop:
        episode_reward_mean: -300  # note that evaluation perf is higher
        timesteps_total: 10000
    config:
        soft_horizon: True
        clip_actions: False
        normalize_actions: True
        metrics_smoothing_episodes: 5
        no_done_at_end: True

# Blame history for this file (commit, date, keys last touched):
# [rllib] Feature/soft actor critic v2 (#5328), 2019-08-01 23:37:36 -07:00:
#     pendulum-sac, env, run, stop, episode_reward_mean, config
# SAC Performance Fixes (#6295), co-authored by Eric Liang <ekhliang@gmail.com>, 2019-12-20 10:51:25 -08:00:
#     timesteps_total, soft_horizon, clip_actions, normalize_actions, metrics_smoothing_episodes
# [rllib] SAC no_done_at_end should default to False (#7594), 2020-03-14 11:16:54 -07:00:
#     no_done_at_end
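
The `stop` block in the config lists two criteria. As a rough illustration of how they interact, the sketch below assumes the usual Tune behaviour that a trial ends as soon as any listed metric reaches or exceeds its threshold. The helper `should_stop` and the sample result dicts are hypothetical and exist only for this example; they are not part of RLlib.

```python
# Stop criteria copied from the pendulum-sac config.
STOP = {"episode_reward_mean": -300, "timesteps_total": 10000}


def should_stop(result, stop=STOP):
    """Hypothetical helper: True once any listed metric reaches its threshold.

    Assumes "reached" means result[key] >= threshold, and that metrics
    missing from the result are simply ignored.
    """
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


# Made-up training results, for illustration only.
early = {"episode_reward_mean": -1200.0, "timesteps_total": 2000}
solved = {"episode_reward_mean": -250.0, "timesteps_total": 6000}
budget = {"episode_reward_mean": -800.0, "timesteps_total": 10000}

print(should_stop(early), should_stop(solved), should_stop(budget))
```

Under that assumption the run counts as passing when the smoothed reward climbs to -300 before the 10000-timestep budget runs out; hitting the timestep limit first also ends the trial, but without having reached the reward target.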