ray/release/tune_tests/scalability_tests/tune_tests.yaml
SangBin Cho b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the previous problems from team column revert.

This has 2 additional changes;

alert handler receives the team argument, which was the root cause of breakage; https://github.com/ray-project/ray/pull/21289

Previously, tests without a team column were raising an exception, but I made the condition weaker (warning logs). I will eventually change it to raise an exception, but for smoother transition, we will log warning instead for a short time
2022-01-19 13:29:53 -08:00

90 lines
1.9 KiB
YAML

- name: bookkeeping_overhead
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_1x16.yaml
run:
timeout: 1200
script: python workloads/test_bookkeeping_overhead.py
- name: durable_trainable
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_16x2.yaml
run:
timeout: 900
prepare: python wait_cluster.py 16 600
script: python workloads/test_durable_trainable.py --bucket data-test-ilr
- name: long_running_large_checkpoints
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_1x32_hd.yaml
run:
timeout: 86400
script: python workloads/test_long_running_large_checkpoints.py
long_running: True
smoke_test:
run:
timeout: 3600
- name: network_overhead
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_100x2.yaml
run:
timeout: 900
prepare_timeout: 1200
prepare: python wait_cluster.py 100 1200
script: python workloads/test_network_overhead.py
smoke_test:
cluster:
compute_template: tpl_20x2.yaml
run:
timeout: 400
prepare_timeout: 600
prepare: python wait_cluster.py 20 600
- name: result_throughput_cluster
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_16x64.yaml
run:
timeout: 600
prepare: python wait_cluster.py 16 600
script: python workloads/test_result_throughput_cluster.py
- name: result_throughput_single_node
team: ml
cluster:
app_config: app_config.yaml
compute_template: tpl_1x96.yaml
run:
timeout: 600
script: python workloads/test_result_throughput_single_node.py
- name: xgboost_sweep
team: ml
cluster:
app_config: app_config_data.yaml
compute_template: tpl_16x64.yaml
run:
timeout: 3600
prepare: python wait_cluster.py 16 600
script: python workloads/test_xgboost_sweep.py