Mirror of https://github.com/vale981/ray, synced 2025-03-06 10:31:39 -05:00

This fixes the problems that caused the previous team column change to be reverted. It includes two additional changes:
1. The alert handler now receives the team argument. Its absence was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289
2. Previously, tests without a team column raised an exception. I have weakened this condition so that it only logs a warning. I will eventually change it back to raising an exception, but for a smoother transition it will log a warning for a short time.
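The transitional check in change 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the release tooling's actual code: the function name `check_team_fields` and the `strict` flag are hypothetical, and test entries are assumed to be already parsed from YAML into plain dicts. With `strict=False` a missing team only logs a warning, and with `strict=True` it raises, matching the eventual behavior the message describes.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def check_team_fields(tests: List[Dict[str, Any]], strict: bool = False) -> List[str]:
    """Return the names of tests that have no ``team`` field.

    Hypothetical helper: with strict=False a missing team is only logged
    as a warning (the transitional behavior); with strict=True it raises.
    """
    missing = []
    for test in tests:
        name = test.get("name", "<unnamed>")
        if not test.get("team"):
            missing.append(name)
            msg = f"Test {name!r} has no 'team' field."
            if strict:
                raise ValueError(msg)
            logger.warning(msg)
    return missing


if __name__ == "__main__":
    tests = [
        {"name": "horovod_test", "team": "ml"},
        {"name": "legacy_test"},  # hypothetical entry without a team
    ]
    print(check_team_fields(tests))  # ['legacy_test'] (plus a warning on stderr)
```

Flipping the default of `strict` later would be the one-line change that turns the warning back into an exception.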
15 lines
307 B
YAML
- name: horovod_test
  team: ml
  cluster:
    app_config: app_config_master.yaml
    compute_template: compute_tpl.yaml

  run:
    timeout: 36000
    prepare: python wait_cluster.py 3 600
    script: python workloads/horovod_tune_test.py
    long_running: True

  smoke_test:
    run:
      timeout: 1800
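The `smoke_test` block only restates `run.timeout`. This suggests it holds overrides that are merged over the full test definition when a smoke test is requested, shortening the timeout from 36000 to 1800 (10 hours to 30 minutes, assuming the values are in seconds). The sketch below illustrates that reading. The merge semantics, and the names `deep_update` and `resolve_test_config`, are assumptions for illustration, not the release runner's confirmed behavior.

```python
import copy
from typing import Any, Dict


def deep_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ``base`` with ``overrides`` merged in recursively (hypothetical)."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings key by key instead of replacing them wholesale.
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_test_config(test: Dict[str, Any], smoke_test: bool) -> Dict[str, Any]:
    """Apply the ``smoke_test`` overrides when a smoke test is requested (assumed semantics)."""
    config = {k: v for k, v in test.items() if k != "smoke_test"}
    if smoke_test and "smoke_test" in test:
        config = deep_update(config, test["smoke_test"])
    return config


# The horovod_test entry above, written out as the dict a YAML loader would produce.
horovod_test = {
    "name": "horovod_test",
    "team": "ml",
    "cluster": {
        "app_config": "app_config_master.yaml",
        "compute_template": "compute_tpl.yaml",
    },
    "run": {
        "timeout": 36000,
        "prepare": "python wait_cluster.py 3 600",
        "script": "python workloads/horovod_tune_test.py",
        "long_running": True,
    },
    "smoke_test": {"run": {"timeout": 1800}},
}

smoke = resolve_test_config(horovod_test, smoke_test=True)
print(smoke["run"]["timeout"])  # 1800
print(smoke["run"]["script"])  # python workloads/horovod_tune_test.py
```

A recursive merge, rather than a shallow `dict.update`, matters here. A shallow update would replace the whole `run` mapping with `{"timeout": 1800}` and drop `prepare`, `script`, and `long_running`.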