ray/release/rllib_tests/rllib_tests.yaml

# Heavy learning tests (Atari and HalfCheetah) for major algos.
- name: learning_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 8gpus_64cpus.yaml
  run:
    timeout: 14400
    script: python learning_tests/run.py
  smoke_test:
    run:
      timeout: 1200
# 2-GPU learning tests (CartPole and RepeatAfterMeEnv) for major algos.
- name: multi_gpu_learning_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 8gpus_96cpus.yaml
  run:
    timeout: 7200
    script: python multi_gpu_learning_tests/run.py
# 2-GPU learning tests (StatelessCartPole) + use_lstm=True for major algos
# (that support RNN models).
- name: multi_gpu_with_lstm_learning_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 8gpus_96cpus.yaml
  run:
    timeout: 7200
    script: python multi_gpu_with_lstm_learning_tests/run.py
# 2-GPU learning tests (StatelessCartPole) + use_attention=True for major
# algos (that support RNN models).
- name: multi_gpu_with_attention_learning_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 8gpus_96cpus.yaml
  run:
    timeout: 7200
    script: python multi_gpu_with_attention_learning_tests/run.py
# We'll have these as per-PR tests soon.
# - name: example_scripts_on_gpu_tests
#   cluster:
#     app_config: app_config.yaml
#     compute_template: 1gpu_4cpus.yaml
#   run:
#     timeout: 7200
#     script: bash unit_gpu_tests/run.sh
# IMPALA large machine stress tests (4x Atari).
- name: stress_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 4gpus_544_cpus.yaml
  run:
    timeout: 5400
    prepare: python wait_cluster.py 6 600
    script: python stress_tests/run_stress_tests.py
  smoke_test:
    run:
      timeout: 2000
# Tests that exercise auto-scaling and Anyscale connect.
- name: connect_tests
  cluster:
    app_config: app_config.yaml
    compute_template: auto_scale.yaml
  run:
    use_connect: True
    timeout: 3000
    script: python connect_tests/run_connect_tests.py
# Nightly performance regression for popular algorithms.
# These algorithms run nightly for a predetermined amount of time without
# passing criteria.
# Performance metrics, such as reward achieved and throughput, are then
# collected and tracked over time.
- name: performance_tests
  cluster:
    app_config: app_config.yaml
    compute_template: 12gpus_192cpus.yaml
  run:
    timeout: 7200
    script: python performance_tests/run.py