# RLlib tuned example: SlateQ on the RecSim "long-term satisfaction" env.
# Reconstructed from a corrupted copy (interleaved timestamp/`|` residue
# removed; YAML nesting restored). Key order and all values/comments are
# unchanged from the original content lines.
long-term-satisfaction-recsim-env-slateq:
  env: ray.rllib.examples.env.recommender_system_envs_with_recsim.LongTermSatisfactionRecSimEnv
  run: SlateQ
  stop:
    # Random baseline rewards:
    # num_candidates=20; slate_size=2; resample=true: ~951
    # num_candidates=50; slate_size=3; resample=true: ~946
    evaluation/episode_reward_mean: 1000.0
    timesteps_total: 200000
  config:
    # Works for both tf and torch.
    framework: tf

    metrics_num_episodes_for_smoothing: 200

    # RLlib/RecSim wrapper specific settings:
    env_config:
      config:
        # Each step, sample `num_candidates` documents using the env-internal
        # document sampler model (a logic that creates n documents to select
        # the slate from).
        resample_documents: true
        num_candidates: 50
        # How many documents to recommend (out of `num_candidates`) each
        # timestep?
        slate_size: 2
        # Should the action space be purely Discrete? Useful for algos that
        # don't support MultiDiscrete (e.g. DQN or Bandits).
        # SlateQ handles MultiDiscrete action spaces.
        convert_to_discrete_action_space: false
        seed: 42

    exploration_config:
      warmup_timesteps: 10000
      epsilon_timesteps: 60000

    target_network_update_freq: 3200