ray/release/xgboost_tests/xgboost_tests.yaml
SangBin Cho b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the problems that led to the earlier team column change being reverted.

This includes 2 additional changes:

The alert handler now receives the team argument; the missing argument was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289

Previously, tests without a team column raised an exception. I have weakened that check so it logs a warning instead. I will eventually change it back to raising an exception, but for a smoother transition it will only log a warning for a short time.
2022-01-19 13:29:53 -08:00
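The relaxed team check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual Ray test-infra code: the function name `check_team_column` and the `strict` flag are hypothetical, and the tests are assumed to be already parsed into a list of dicts shaped like the entries in this file.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def check_team_column(tests: List[Dict[str, Any]], strict: bool = False) -> List[str]:
    """Return the names of tests that have no `team` field.

    Hypothetical helper: with strict=False (the transitional behavior the
    commit message describes) a warning is logged for each offending test;
    with strict=True (the planned eventual behavior) a ValueError is raised
    on the first one.
    """
    missing = [t.get("name", "<unnamed>") for t in tests if not t.get("team")]
    for name in missing:
        message = f"Test {name!r} has no team column"
        if strict:
            raise ValueError(message)
        logger.warning(message)
    return missing
```

Called on entries such as `{"name": "train_small", "team": "ml"}`, the helper returns an empty list; an entry without `team` is reported by name and, in the non-strict mode, only logged.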


- name: train_small
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_small.yaml
  run:
    use_connect: True
    autosuspend_mins: 10
    timeout: 600
    prepare: python wait_cluster.py 4 600
    script: python workloads/train_small.py
- name: train_moderate
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_moderate.yaml
  run:
    timeout: 600
    prepare: python wait_cluster.py 32 600
    script: python workloads/train_moderate.py
- name: train_gpu
  team: ml
  cluster:
    app_config: app_config_gpu.yaml
    compute_template: tpl_gpu_small.yaml
  run:
    timeout: 600
    prepare: python wait_cluster.py 5 600
    script: python workloads/train_gpu.py
- name: distributed_api_test
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_small.yaml
    results:
  run:
    timeout: 600
    prepare: python wait_cluster.py 4 600
    script: python workloads/distributed_api_test.py
    results: ""
- name: ft_small_elastic
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_small.yaml
  run:
    timeout: 900
    prepare: python wait_cluster.py 4 600
    script: python workloads/ft_small_elastic.py
    results: ""
- name: ft_small_non_elastic
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_small.yaml
  run:
    timeout: 900
    prepare: python wait_cluster.py 4 600
    script: python workloads/ft_small_non_elastic.py
    results: ""
- name: tune_small
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_small.yaml
  run:
    timeout: 600
    prepare: python wait_cluster.py 4 600
    script: python workloads/tune_small.py
- name: tune_32x4
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_moderate.yaml
  run:
    timeout: 900
    prepare: python wait_cluster.py 32 600
    script: python workloads/tune_32x4.py
- name: tune_4x32
  team: ml
  cluster:
    app_config: app_config.yaml
    compute_template: tpl_cpu_moderate.yaml
  run:
    timeout: 900
    prepare: python wait_cluster.py 32 600
    script: python workloads/tune_4x32.py