1
0
Fork 0
mirror of https://github.com/vale981/ray synced 2025-03-15 23:56:40 -04:00
Commit graph

7 commits

Author SHA1 Message Date
Kai Fricke
7091a32fe1
[ci/release] Support running tests on staging ()
This adds "environments" to the release package that can be used to configure some environment variables. These variables will be loaded either by an `--env` argument or a `env` definition in the test definition and can be used to e.g. run release tests on staging.
2022-06-28 10:14:01 -07:00
Kai Fricke
2cf20e5406
[ci/release] Use 1.12.1 as base image in app configs ()
Many release tests are currently failing for cuda version incompatibilities. Pinning the base image to 1.12.1 seems to resolve the problem for the time being.
2022-05-26 18:58:20 +02:00
Kai Fricke
40a8183e05
[ci/release] Fix job-based file download ()
have to wrap download call in a lambda to be compatible with run_with_retry
2022-04-04 08:06:31 -07:00
Kai Fricke
fe27dbcd9a
[air/release] Improve file packing/unpacking ()
We use tarfile to pack/unpack directories in several locations. Instead of using temporary files, we can just use io.BytesIO to avoid unnecessary disk writes.

Note that this functionality is present in 3 different modules - in Ray (AIR), in the release test package, and in a specific release test. The implementations should live in the three modules independently, so we don't add a common utility for this (e.g. the ray_release package should be independent of the Ray package).
2022-04-01 07:38:14 -07:00
SangBin Cho
0cd687cc19
[Nightly test] Fix job download retry ()
Currently when we download a file to the cluster using a job, we don't do the retry.
2022-03-22 08:31:24 -07:00
SangBin Cho
ebac18d163
[Nightly test] Support Job based file manager + runner ()
This PR supports the job-based file manager and runner. It will be the backbone of k8s migration.

The PR handles edge cases that originally existed in the old e2e.py job-based runners.
2022-03-10 15:03:50 -08:00
Kai Fricke
331b71ea8d
[ci/release] Refactor release test e2e into package ()
Adds a unit-tested and restructured ray_release package for running release tests.

Relevant changes in behavior:

Per default, Buildkite will wait for the wheels of the current commit to be available. Alternatively, users can a) specify a different commit hash, b) a wheels URL (which we will also wait for to be available) or c) specify a branch (or user/branch combination), in which case the latest available wheels will be used (e.g. if master is passed, behavior matches old default behavior).

The main subpackages are:

    Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
    Command runner: Runs commands, e.g. as client command or sdk command
    File manager: Uploads/downloads files to/from session
    Reporter: Reports results (e.g. to database)

Much of the code base is unit tested, but there are probably some pieces missing.

Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
2022-02-16 17:35:02 +00:00