mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00

Please review **e2e.py and test_suite belonging to your team**! This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit# This PR adds a team name to each test suite. If the name is not specified, it will be reported as unspecified. If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future). Note that we will aggregate all of test config into a single file, nightly_test.yaml.
90 lines
1.9 KiB
YAML
90 lines
1.9 KiB
YAML
- name: bookkeeping_overhead
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_1x16.yaml
|
|
|
|
run:
|
|
timeout: 1200
|
|
script: python workloads/test_bookkeeping_overhead.py
|
|
|
|
|
|
- name: durable_trainable
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_16x2.yaml
|
|
|
|
run:
|
|
timeout: 900
|
|
prepare: python wait_cluster.py 16 600
|
|
script: python workloads/test_durable_trainable.py --bucket data-test-ilr
|
|
|
|
- name: long_running_large_checkpoints
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_1x32_hd.yaml
|
|
|
|
run:
|
|
timeout: 86400
|
|
script: python workloads/test_long_running_large_checkpoints.py
|
|
long_running: True
|
|
|
|
smoke_test:
|
|
run:
|
|
timeout: 3600
|
|
|
|
|
|
- name: network_overhead
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_100x2.yaml
|
|
|
|
run:
|
|
timeout: 900
|
|
prepare_timeout: 1200
|
|
prepare: python wait_cluster.py 100 1200
|
|
script: python workloads/test_network_overhead.py
|
|
|
|
smoke_test:
|
|
cluster:
|
|
compute_template: tpl_20x2.yaml
|
|
|
|
run:
|
|
timeout: 400
|
|
prepare_timeout: 600
|
|
prepare: python wait_cluster.py 20 600
|
|
|
|
- name: result_throughput_cluster
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_16x64.yaml
|
|
|
|
run:
|
|
timeout: 600
|
|
prepare: python wait_cluster.py 16 600
|
|
script: python workloads/test_result_throughput_cluster.py
|
|
|
|
- name: result_throughput_single_node
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config.yaml
|
|
compute_template: tpl_1x96.yaml
|
|
|
|
run:
|
|
timeout: 600
|
|
script: python workloads/test_result_throughput_single_node.py
|
|
|
|
- name: xgboost_sweep
|
|
team: ml
|
|
cluster:
|
|
app_config: app_config_data.yaml
|
|
compute_template: tpl_16x64.yaml
|
|
|
|
run:
|
|
timeout: 3600
|
|
prepare: python wait_cluster.py 16 600
|
|
script: python workloads/test_xgboost_sweep.py
|