ray/rllib/tuned_examples/ppo/frozenlake-appo-vtrace.yaml
Yi Cheng fd0f967d2e
Revert "[RLlib] Move (A/DD)?PPO and IMPALA algos to algorithms dir and rename policy and trainer classes. (#25346)" (#25420)
This reverts commit e4ceae19ef.

Reverts #25346

linux://python/ray/tests:test_client_library_integration had never failed before #25346.

It also fails in the CI of the reverted PR (https://buildkite.com/ray-project/ray-builders-pr/builds/34079#01812442-c541-4145-af22-2a012655c128), so the failure is very likely caused by that PR.

The test's failure output also appears to be related (https://buildkite.com/ray-project/ray-builders-branch/builds/7923#018125c2-4812-4ead-a42f-7fddb344105b).
2022-06-02 20:38:44 -07:00

33 lines
827 B
YAML

frozenlake-appo-vtrace:
    env: FrozenLake-v1
    run: APPO
    stop:
        episode_reward_mean: 0.99
        timesteps_total: 1000000
    config:
        # Works for both torch and tf.
        framework: tf
        # Sparse reward environment (short horizon).
        env_config:
            desc:
                - SFFFFFFF
                - FFFFFFFF
                - FFFFFFFF
                - FFFFFFFF
                - FFFFFFFF
                - FFFFFFFF
                - FFFFFFFF
                - FFFFFFFG
            is_slippery: false
        horizon: 20
        rollout_fragment_length: 10
        batch_mode: complete_episodes
        vtrace: true
        vtrace_drop_last_ts: false
        num_envs_per_worker: 5
        num_workers: 4
        num_gpus: 0
        num_sgd_iter: 1
        vf_loss_coeff: 0.01
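
The "sparse reward, short horizon" comment in the config can be checked with a little arithmetic. The sketch below is not part of RLlib or of this file. It is a standalone illustration that copies the `desc` map and the `horizon` value from the config and uses a breadth-first search to count the fewest moves from `S` to `G`. The helper name `shortest_path_length` is hypothetical, and the sketch assumes the standard FrozenLake conventions: four movement actions, `H` tiles as holes (this map has none), and deterministic moves because `is_slippery` is false.

```python
from collections import deque

# Map copied from env_config.desc in the config above:
# S = start, F = frozen (safe) tile, G = goal. This map contains no holes (H).
DESC = ["SFFFFFFF"] + ["FFFFFFFF"] * 6 + ["FFFFFFFG"]

# Copied from the `horizon` setting in the config above.
HORIZON = 20


def shortest_path_length(desc):
    """Return the minimum number of moves from S to G, or None if unreachable.

    Hypothetical helper for illustration only; assumes deterministic
    up/down/left/right moves (is_slippery: false).
    """
    rows, cols = len(desc), len(desc[0])
    start = next(
        (r, c) for r in range(rows) for c in range(cols) if desc[r][c] == "S"
    )
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if desc[r][c] == "G":
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            # Holes end the episode, so they are never expanded.
            if inside and desc[nr][nc] != "H" and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


steps = shortest_path_length(DESC)
slack = HORIZON - steps
print(f"shortest path: {steps} moves, slack within horizon: {slack} steps")
```

On this 8x8 map the shortest route takes 14 moves, so a horizon of 20 leaves only 6 spare steps per episode. Since FrozenLake pays a reward of 1 only on reaching the goal, the `episode_reward_mean: 0.99` stop criterion amounts to requiring that nearly every episode reaches `G` within that budget.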