ray/release/ray_release/alerts/long_running_tests.py
Kai Fricke 331b71ea8d
[ci/release] Refactor release test e2e into package (#22351)
Adds a unit-tested and restructured ray_release package for running release tests.

Relevant changes in behavior:

By default, Buildkite will wait for the wheels of the current commit to be available. Alternatively, users can specify a) a different commit hash, b) a wheels URL (which we will also wait for to be available), or c) a branch (or user/branch combination), in which case the latest available wheels will be used (e.g. if master is passed, behavior matches the old default behavior).

The main subpackages are:

    Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
    Command runner: Runs commands, e.g. as client command or sdk command
    File manager: Uploads/downloads files to/from session
    Reporter: Reports results (e.g. to database)

Much of the code base is unit tested, but some pieces probably still lack test coverage.

Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
2022-02-16 17:35:02 +00:00

from typing import Optional

from ray_release.config import Test
from ray_release.result import Result


def handle_result(
    test: Test,
    result: Result,
) -> Optional[str]:
    last_update_diff = result.results.get("last_update_diff", float("inf"))

    test_name = test["legacy"]["test_name"]

    if test_name in [
        "actor_deaths",
        "many_actor_tasks",
        "many_drivers",
        "many_tasks",
        "many_tasks_serialized_ids",
        "node_failures",
        "object_spilling_shuffle",
    ]:
        # Core tests
        target_update_diff = 300

    elif test_name in ["apex", "impala", "many_ppo", "pbt"]:
        # Tune/RLLib style tests
        target_update_diff = 480

    elif test_name in ["serve", "serve_failure"]:
        # Serve tests have workload logs every five minutes.
        # Leave up to 180 seconds overhead.
        target_update_diff = 480

    else:
        return None

    if last_update_diff > target_update_diff:
        return (
            f"Last update to results json was too long ago "
            f"({last_update_diff:.2f} > {target_update_diff})"
        )
    return None
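
The threshold check in handle_result can be tried without installing ray_release. The following is a minimal sketch under stated assumptions, not the package's own code: `FakeResult`, `TARGET_UPDATE_DIFF`, and `check_last_update` are hypothetical names introduced here, the test is modeled as a plain dict, and the if/elif chain is swapped for a lookup table. The thresholds and the message format are copied from the function above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in for ray_release.result.Result.

    Only the `results` dict is modeled, since that is all the check reads.
    """

    results: Dict[str, Any] = field(default_factory=dict)


# Maximum allowed seconds since the last results update, per legacy test name.
# Values are copied from handle_result; the table itself is an assumption.
TARGET_UPDATE_DIFF: Dict[str, int] = {}
for _name in (
    "actor_deaths",
    "many_actor_tasks",
    "many_drivers",
    "many_tasks",
    "many_tasks_serialized_ids",
    "node_failures",
    "object_spilling_shuffle",
):
    TARGET_UPDATE_DIFF[_name] = 300  # Core tests
for _name in ("apex", "impala", "many_ppo", "pbt"):
    TARGET_UPDATE_DIFF[_name] = 480  # Tune/RLLib style tests
for _name in ("serve", "serve_failure"):
    TARGET_UPDATE_DIFF[_name] = 480  # Serve: 300 s log interval + 180 s overhead


def check_last_update(test: Dict[str, Any], result: FakeResult) -> Optional[str]:
    """Return an alert message if the results are stale, else None."""
    # A missing "last_update_diff" is treated as infinitely old, as in the original.
    last_update_diff = result.results.get("last_update_diff", float("inf"))
    target = TARGET_UPDATE_DIFF.get(test["legacy"]["test_name"])
    if target is None:
        # Tests without a configured threshold never raise this alert.
        return None
    if last_update_diff > target:
        return (
            "Last update to results json was too long ago "
            f"({last_update_diff:.2f} > {target})"
        )
    return None


stale = check_last_update(
    {"legacy": {"test_name": "many_tasks"}}, FakeResult({"last_update_diff": 301.5})
)
fresh = check_last_update(
    {"legacy": {"test_name": "serve"}}, FakeResult({"last_update_diff": 480})
)
unknown = check_last_update(
    {"legacy": {"test_name": "some_other_test"}}, FakeResult({"last_update_diff": 9999})
)
missing = check_last_update({"legacy": {"test_name": "pbt"}}, FakeResult())
```

Here `stale` and `missing` carry an alert message, while `fresh` and `unknown` are None. The comparison is strict, so a diff exactly equal to the threshold does not alert, and a result with no "last_update_diff" key always alerts for the listed tests.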