ray/release/ray_release
Kai Fricke e8abffb017
[tune/release] Improve Tune cloud release tests for durable storage (#23277)
This PR addresses recent failures in the tune cloud tests.

In particular, this PR changes the following:

    If force=True is supplied, the trial runner now waits for any in-flight sync to finish before syncing once more. This ensures that the final experiment checkpoints on remote storage are the most recent version, and likely fixes some flakiness in the tests.
    We switched to new cloud buckets that don't interfere with other tests (and are less likely to be garbage collected).
    We now use dated subdirectories in the cloud buckets so that tests running in parallel don't interfere with each other. Objects are cleaned up afterwards, and the buckets are configured to remove objects after 30 days.
    Lastly, we fix an issue in the cloud tests where the RELEASE_TEST_OUTPUT file was unavailable when run in Ray client mode (e.g., on Kubernetes).
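
The first and third changes can be illustrated with a minimal sketch. This is not the code from this PR: `SketchSyncer`, `dated_prefix`, and the bucket and test names are hypothetical, and the real trial runner and syncer in Tune are structured differently. The sketch only shows the idea of letting an in-flight sync finish before a forced sync, and of deriving a dated subdirectory per test run.

```python
import threading
import time
from datetime import datetime, timezone
from typing import Callable, Optional


class SketchSyncer:
    """Hypothetical syncer that runs at most one background sync at a time."""

    def __init__(self, sync_fn: Callable[[], None]):
        self._sync_fn = sync_fn  # stand-in for the actual upload command
        self._thread: Optional[threading.Thread] = None

    def wait(self) -> None:
        # Block until a previously started sync (if any) has finished.
        if self._thread is not None:
            self._thread.join()
            self._thread = None

    def sync_up(self, force: bool = False) -> bool:
        if self._thread is not None and self._thread.is_alive():
            if not force:
                # A sync is still in flight; skip this request.
                return False
            # force=True: let the old sync finish, then sync once more so
            # the remote copy reflects the latest local state.
            self.wait()
        self._thread = threading.Thread(target=self._sync_fn)
        self._thread.start()
        return True


def dated_prefix(bucket: str, test_name: str,
                 now: Optional[datetime] = None) -> str:
    """Build a per-run subdirectory so parallel runs do not collide."""
    now = now or datetime.now(timezone.utc)
    return f"{bucket}/{test_name}/{now:%Y-%m-%d-%H%M%S}"
```

In this sketch, a non-forced `sync_up()` during an in-flight sync is skipped, while `sync_up(force=True)` blocks until the earlier sync completes and then starts a fresh one. A final `wait()` is still needed before assuming the last upload has landed.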

Local release test runs succeeded.

https://buildkite.com/ray-project/release-tests-branch/builds/189
https://buildkite.com/ray-project/release-tests-branch/builds/191
2022-03-30 09:28:33 -07:00
alerts [tune] Adjust release test timeouts (#23362) 2022-03-20 17:05:20 +00:00
buildkite [ci/release] Unstable tests should only soft fail the build (#23403) 2022-03-23 09:38:56 +00:00
cluster_manager [tune/release] Improve Tune cloud release tests for durable storage (#23277) 2022-03-30 09:28:33 -07:00
command_runner [ci/release] Reload modules after installing matching Ray (#23227) 2022-03-16 15:44:43 +00:00
file_manager [Nightly test] Fix job download retry (#23401) 2022-03-22 08:31:24 -07:00
reporter [ci/release] Legacy field should be optional (#23326) 2022-03-18 11:34:05 +00:00
scripts [ci/release] Save test config and results as artifacts (#23278) 2022-03-18 09:26:42 +00:00
tests [ci/release] Retry cluster env build on failure (#23378) 2022-03-22 09:45:22 +00:00
__init__.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
anyscale_util.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
aws.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
config.py [ci/release] Re-enable commit sanity check (#23327) 2022-03-18 12:57:41 +00:00
exception.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
glue.py [Nightly test] Support Job based file manager + runner (#22860) 2022-03-10 15:03:50 -08:00
job_manager.py [Nightly tests] Improve k8s testing (#23108) 2022-03-14 03:49:15 -07:00
logger.py [tune/release] Improve Tune cloud release tests for durable storage (#23277) 2022-03-30 09:28:33 -07:00
result.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
schema.json [ci/release] Legacy field should be optional (#23326) 2022-03-18 11:34:05 +00:00
util.py [ci/release] Support PR wheels (#23084) 2022-03-14 17:24:13 +00:00
wheels.py [ci/release] Reload modules after installing matching Ray (#23227) 2022-03-16 15:44:43 +00:00