ray/release/xgboost_tests
Kai Fricke 331b71ea8d
[ci/release] Refactor release test e2e into package (#22351)
Adds a unit-tested and restructured ray_release package for running release tests.

Relevant changes in behavior:

By default, Buildkite waits for the wheels of the current commit to become available. Alternatively, users can specify a) a different commit hash, b) a wheels URL (which we will also wait for to become available), or c) a branch (or user/branch combination), in which case the latest available wheels are used (e.g. if master is passed, the behavior matches the old default).
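The selection rules above can be sketched roughly as follows. This is only an illustration and not the actual ``ray_release`` code: ``WheelSource``, ``resolve_wheel_source`` and all parameter names are hypothetical, and the choice to also wait for an explicitly given commit hash is an assumption that the description above does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WheelSource:
    """Hypothetical record of which wheels to use (not a ray_release type)."""

    kind: str   # "commit", "url", or "branch"
    value: str  # commit hash, wheels URL, or branch name
    wait: bool  # whether to block until the wheels are available


def resolve_wheel_source(
    current_commit: str,
    commit: Optional[str] = None,
    wheels_url: Optional[str] = None,
    branch: Optional[str] = None,
) -> WheelSource:
    """Pick a wheel source from at most one user-supplied override."""
    overrides = [x for x in (commit, wheels_url, branch) if x is not None]
    if len(overrides) > 1:
        raise ValueError("Specify at most one of commit, wheels_url, branch")

    if wheels_url is not None:
        # A wheels URL is also waited for until it becomes available.
        return WheelSource("url", wheels_url, wait=True)
    if branch is not None:
        # For a branch, the latest available wheels are used, so no waiting.
        return WheelSource("branch", branch, wait=False)
    if commit is not None:
        # Assumption: an explicit commit is waited for like the default one.
        return WheelSource("commit", commit, wait=True)
    # Default: wait for the wheels of the current commit.
    return WheelSource("commit", current_commit, wait=True)
```

For example, ``resolve_wheel_source("abc123", branch="master")`` selects the branch without waiting, which corresponds to the old default behavior mentioned above.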

The main subpackages are:

    Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
    Command runner: Runs commands, e.g. as client command or sdk command
    File manager: Uploads/downloads files to/from session
    Reporter: Reports results (e.g. to database)

Much of the code base is unit tested, but some pieces probably still lack coverage.

Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
2022-02-16 17:35:02 +00:00
workloads [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
app_config.yaml [release/xgboost] xgboost release test fixes via app config (#20325) 2021-11-15 10:03:21 -08:00
app_config_gpu.yaml [release/xgboost] xgboost release test fixes via app config (#20325) 2021-11-15 10:03:21 -08:00
create_test_data.py [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
README.rst [xgboost] Update XGBoost release test configs (#13941) 2021-02-17 23:00:49 +01:00
tpl_cpu_moderate.yaml [release] Move xgboost tune small + microbenchmark release test to new release automation (#15619) 2021-05-08 20:38:39 +01:00
tpl_cpu_small.yaml [release] Move xgboost tune small + microbenchmark release test to new release automation (#15619) 2021-05-08 20:38:39 +01:00
tpl_gpu_small.yaml [release] Move xgboost tune small + microbenchmark release test to new release automation (#15619) 2021-05-08 20:38:39 +01:00
wait_cluster.py [CI] Format Python code with Black (#21975) 2022-01-29 18:41:57 -08:00
xgboost_tests.yaml [Test Infra] Unrevert team col (#21700) 2022-01-19 13:29:53 -08:00

XGBoost on Ray tests
====================

This directory contains various XGBoost on Ray release tests.

You should run these tests with the `releaser <https://github.com/ray-project/releaser>`_ tool.

Overview
--------
There are four kinds of tests:

1. ``distributed_api_test`` - checks general API functionality and should finish very quickly (< 1 minute).
2. ``train_*`` - checks single-trial training on different setups.
3. ``tune_*`` - checks multi-trial training via Ray Tune.
4. ``ft_*`` - checks fault tolerance.

Generally the releaser tool will run all tests in parallel. If you run them
sequentially instead, run them in the order above: if ``train_*`` fails,
``tune_*`` will fail, too.
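
If you run the tests by hand, the ordering rule can be applied with a small helper like the one below. This is a minimal sketch and not part of the releaser tool: the function names and the example test names are hypothetical, and only the four name prefixes come from the list above.

```python
from typing import Iterable, List

# Category prefixes in the order the tests should run sequentially.
CATEGORY_ORDER = ("distributed_api_test", "train_", "tune_", "ft_")


def category_rank(test_name: str) -> int:
    """Return the position of the test's category; unknown names go last."""
    for rank, prefix in enumerate(CATEGORY_ORDER):
        if test_name.startswith(prefix):
            return rank
    return len(CATEGORY_ORDER)


def sequential_order(test_names: Iterable[str]) -> List[str]:
    """Sort test names by category, keeping the input order within a category."""
    # sorted() is stable, so tests with the same rank keep their relative order.
    return sorted(test_names, key=category_rank)


if __name__ == "__main__":
    # Hypothetical test names, used only for illustration.
    names = ["tune_small", "ft_small", "distributed_api_test", "train_small"]
    for name in sequential_order(names):
        print(name)
```

Because ``tune_*`` depends on ``train_*`` working, you may want to stop such a sequential run at the first failure rather than continue through the list.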

Acceptance criteria
-------------------
A test is considered passing when its output log ends without an error.
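
One way to check this criterion automatically is to scan the last lines of a log for signs of a Python error. The sketch below is only a heuristic under stated assumptions: the function names, the regular expression and the default of 20 tail lines are all hypothetical and are not taken from the release tooling.

```python
import re

# Heuristic: a Python traceback header or a name ending in Error/Exception.
ERROR_PATTERN = re.compile(
    r"Traceback \(most recent call last\)|\b\w*(?:Error|Exception)\b"
)


def log_tail_has_error(log_text: str, tail_lines: int = 20) -> bool:
    """Return True if any of the last `tail_lines` lines looks like an error."""
    tail = log_text.splitlines()[-tail_lines:]
    return any(ERROR_PATTERN.search(line) for line in tail)


def is_passing(log_text: str, tail_lines: int = 20) -> bool:
    """Apply the acceptance criterion: no error at the end of the log."""
    return not log_tail_has_error(log_text, tail_lines)
```

A pattern match like this can produce false positives (for example, a log line that merely mentions an exception class), so treat the result as a hint and read the log when in doubt.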