# ray/rllib/tuned_examples/invertedpendulum-td3.yaml

invertedpendulum-td3:
    # TD3 with stopping conditions and network sizes tuned specifically for
    # InvertedPendulum. It should reach a reward of 1,000 (the maximum
    # achievable) within 10,000 to 20,000 timesteps.
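    #
    # Usage sketch (assumes the `rllib` command-line tool installed with Ray
    # and this file in the current working directory):
    #   rllib train -f invertedpendulum-td3.yaml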
    env: InvertedPendulum-v2
    run: TD3
    stop:
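        # Never met (the maximum reward is 1,000), so the run ends when
        # time_total_s or timesteps_total is reached, whichever comes first.
        episode_reward_mean: 9999.9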
        time_total_s: 900 # 15 minutes
        timesteps_total: 1000000
    config:
        # === Model ===
        actor_hiddens: [32, 32]
        critic_hiddens: [32, 32]

        # === Exploration ===
        learning_starts: 1000
        pure_exploration_steps: 1000

        # === Evaluation ===
        evaluation_interval: 1
        evaluation_num_episodes: 5
# [rllib] TD3/DDPG improvements and MuJoCo benchmarks (#4694), 2019-04-26