ray/release
Kai Fricke f376dd8902
[tune] Also interrupt training when SIGUSR1 received (#24015)
Ray Tune currently stops training gracefully on SIGINT. However, the Ray core worker prevents SIGINT (and SIGTERM) from being processed by child tasks, which means that Ray Tune runs started in remote tasks (e.g. via Ray client) cannot be gracefully interrupted.

In k8s-based cloud tests that used the Ray client to kick off a Ray Tune run, this led to test flakiness, as the final experiment state could not be gracefully persisted to cloud storage.

This PR adds support for SIGUSR1 in addition to SIGINT to interrupt training gracefully.
2022-04-21 13:07:29 +01:00
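
The pattern described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not Ray Tune's actual implementation: the names `stop_requested`, `install_handlers`, and `run_training` are hypothetical, and the training loop is a stand-in. It only shows the general idea of treating SIGUSR1 the same way as SIGINT, so that a loop can exit cleanly and persist its final state.

```python
import os
import signal
import threading

# Hypothetical flag set by the signal handlers; not part of Ray Tune's API.
stop_requested = threading.Event()


def _request_stop(signum, frame):
    # Only record the request; the loop decides when it is safe to stop.
    stop_requested.set()


def install_handlers():
    # Must be called from the main thread of the process.
    signal.signal(signal.SIGINT, _request_stop)
    # SIGUSR1 does not exist on every platform (e.g. Windows).
    if hasattr(signal, "SIGUSR1"):
        signal.signal(signal.SIGUSR1, _request_stop)


def run_training(max_steps, save_state):
    """Stand-in training loop that checks the stop flag before each step."""
    steps_done = 0
    for _ in range(max_steps):
        if stop_requested.is_set():
            break
        steps_done += 1  # placeholder for one unit of real training work
    # The final state is saved whether the loop finished or was interrupted.
    save_state(steps_done)
    return steps_done
```

Under these assumptions, a process that cannot receive SIGINT can still be stopped from outside with `kill -USR1 <pid>`, or from Python with `os.kill(pid, signal.SIGUSR1)`.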
benchmarks [Test]Add a time check for task benchmark (#23170) 2022-04-11 06:27:04 -07:00
golden_notebook_tests [Serve] Fix torch_tune_serve_test client test (#24031) 2022-04-20 16:52:27 -07:00
horovod_tests [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
jobs_tests Add basic jobs release test with Tune script (#23474) 2022-04-05 13:31:11 -05:00
kubernetes_manual_tests [minor] Fix legacy OSS operator test (#23540) 2022-04-18 17:47:42 -07:00
lightgbm_tests [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
long_running_distributed_tests [RLlib] Pin Gym Everywhere and turn off gpu for recsim tests (#23452) 2022-03-24 09:17:30 +01:00
long_running_tests [release tests] Pin gym everywhere (#23349) 2022-03-19 02:52:54 -07:00
microbenchmark [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
ml_user_tests [Release] Upgrade instance types for xgboost gpu release tests (#24002) 2022-04-20 15:18:22 -07:00
nightly_tests [Core][nightly-test] fix shuffle 5000 partition OOM #23997 2022-04-18 23:49:51 -07:00
ray_release [ci/release] Allow for preferring smoke tests when filtering (#23887) 2022-04-14 06:12:27 +01:00
release_logs [Release 1.12.0] Add release logs for 1.12.0rc1 (#23508) 2022-04-07 11:23:04 -07:00
rllib_tests [ci] Clean up ci/ directory (refactor ci/travis) (#23866) 2022-04-13 18:11:30 +01:00
runtime_env_tests [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
serve_tests [serve] Add component logger + basic access logging (#23558) 2022-04-12 18:16:58 -05:00
sgd_tests/sgd_gpu [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
train_tests/horovod [Train] Fix multi node horovod bug (#22564) 2022-03-22 16:22:53 -07:00
tune_tests [tune] Also interrupt training when SIGUSR1 received (#24015) 2022-04-21 13:07:29 +01:00
util [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
xgboost_tests [Release] Upgrade instance types for xgboost gpu release tests (#24002) 2022-04-20 15:18:22 -07:00
__init__.py [release] move release testing end to end script to main ray repo (#17070) 2021-07-14 12:39:07 -07:00
BUILD [Serve] Fix torch_tune_serve_test client test (#24031) 2022-04-20 16:52:27 -07:00
README.md [Release] Remove release process doc (#19312) 2021-10-18 11:24:03 -07:00
release_tests.yaml [core][tests] Add nightly test for datasets random_shuffle and sort (#23807) 2022-04-12 12:53:57 -07:00
requirements.txt [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
requirements_buildkite.txt [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
run_release_test.sh [ci/release] Disable infra retries for now (#23132) 2022-03-14 11:51:11 +00:00
setup.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00

Release Tests

While the exact release process relies on Anyscale-internal tooling, the tests we run during releases are located at https://github.com/ray-project/ray/blob/master/release/.buildkite/build_pipeline.py