Update release process doc and checklist (#18336)

Co-authored-by: Qing Wang <kingchin1218@126.com>
Kai Fricke 2021-09-06 14:09:31 +01:00 committed by GitHub
parent e3e6ed7aaa
commit d9552e6795
2 changed files with 197 additions and 193 deletions


@@ -16,97 +16,55 @@ This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst d
- [ ] Call for release notes made in Slack
## Release Testing
- [ ] Microbenchmark
- [ ] Test passing
- [ ] Results added to `release/release_logs`
- [ ] Long Running Tests (mark complete when run 24 hrs no issues)
- [ ] actor_deaths
- [ ] apex
- [ ] impala
- [ ] many_actor_tasks
- [ ] many_drivers
- [ ] many_ppo
- [ ] many_tasks_serialized_ids
- [ ] many_tasks
- [ ] node_failures
- [ ] pbt
- [ ] serve_failure
- [ ] serve
- [ ] Long Running Distributed Tests
- [ ] pytorch_pbt_failure
- [ ] horovod_test
- [ ] Stress Tests
- [ ] test_dead_actors
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] test_many_tasks
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] test_placement_group
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] RLlib Tests
- [ ] regression_tests
- [ ] compact-regression-tests-tf
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] compact-regression-tests-torch
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] stress_tests
- [ ] unit_gpu_tests
- [ ] Scalability Envelope Tests
- [ ] ASAN Test
- [ ] K8s Test
- [ ] K8s operator and helm tests
- [ ] Data processing tests
- [ ] streaming_shuffle
- [ ] dask on ray test
- [ ] Tune tests
- [ ] test_bookkeeping_overhead
- [ ] test_result_throughput_cluster
- [ ] test_result_throughput_single_node
- [ ] test_network_overhead
- [ ] test_long_running_large_checkpoints
- [ ] test_xgboost_sweep
- [ ] test_durable_trainable
- [ ] XGBoost Tests
- [ ] distributed_api_test
- [ ] train_small
- [ ] train_moderate
- [ ] train_gpu
- [ ] tune_small
- [ ] tune_4x32
- [ ] tune_32x4
- [ ] ft_small_non_elastic
- [ ] ft_small_elastic
- [ ] ``core-nightly`` release test suite
- [ ] Test passing
- [ ] Results added to `release/release_logs`
- [ ] many_actors
- [ ] many_nodes
- [ ] many_pgs
- [ ] object_store
- [ ] single_node
- [ ] stress_test_dead_actors
- [ ] stress_test_many_tasks
- [ ] stress_test_placement_group
- [ ] ``nightly`` release test suite
- [ ] Test passing
- [ ] Results added to `release/release_logs`
- [ ] microbenchmark
- [ ] ``weekly`` release test suite
- [ ] Test passing
## Final Steps
- [ ] Check version strings once more
- [ ] Anyscale Docker images built and deployed
- [ ] ML Docker Image Updated
- [ ] Wheels uploaded to Test PyPI
- [ ] Wheel sanity checks with Test PyPI
- [ ] Windows
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] Python 3.9
- [ ] OSX
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] Python 3.9
- [ ] Linux
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] Python 3.9
- [ ] Release is created on Github with release notes
- [ ] Release includes contributors
- [ ] Release notes sent for review to team leads
- [ ] Release is published
- [ ] Wheels uploaded to production PyPI
- [ ] Installing latest with `pip install -U ray` reveals correct version number and commit hash
- [ ] "Latest" docs point to new release version
- [ ] Docker image latest is updated on dockerhub
- [ ] PR to bump master version is merged
- [ ] Java release is published on Maven
- [ ] Release is announced internally
- [ ] Release is announced externally
- [ ] Any code/doc changes made during the release process contributed back to master branch


@@ -7,6 +7,10 @@ to be used alongside this process document. Also, please keep the
team up-to-date on any major regressions or changes to the timeline
via emails to the engineering@anyscale.com Google Group.
Release tests are run automatically and periodically
(weekly, nightly, and 4x daily) for the latest master branch.
A branch cut should only happen when all tests are passing.
Before Branch Cut
-----------------
1. **Create a document to track release-blocking commits.** These may be pull
@@ -20,13 +24,33 @@ Before Branch Cut
Make sure to share this document with major contributors who may have release blockers.
2. **Announce the release** over email to the engineering@anyscale.com mailing
group. The announcement should
contain at least the following information: the release version,
the date when branch-cut will occur, the date the release is expected
to go out (generally a week or so after branch cut depending on how
testing goes), and a link to the document for tracking release blockers.
3. **Consider asking for an internal point of contact for Ant contributions**.
Ant Financial engineers are external contributors to the Ray project. If you're not
familiar with their contributions to Ray core, consider asking someone from
the core team to be responsible for communication and for deciding which PRs
and features can be release blocking.
Branch Cut
----------
1. **Observe the status of the periodic release tests** (weekly, nightly, and 4x daily).
These are tracked on `buildkite <https://buildkite.com/ray-project/periodic-ci>`__.
The branch cut should only occur when all of these are passing.
Since the weekly tests are run on Sunday, you should usually cut the branch
on the following Monday, so that the chance of new regressions being introduced
by new commits is low. Alternatively, you can kick off the weekly tests
on master manually during the week and cut the branch on the next day.
2. **Ping test owners for failing tests**. If some of the tests are not passing,
they should be fixed on master as soon as possible. Ping the respective
test owners/teams, or ask them whether a failing test is acceptable for the release.
After Branch Cut
----------------
1. **Create a release branch:** Create the branch from the desired commit on master
@@ -40,6 +64,11 @@ After Branch Cut
``python/ray/__init__.py``, ``build-docker.sh``, ``src/ray/raylet/main.cc``, and any other files that use ``ray::stats::VersionKey``. See this
`sample commit for bumping the release branch version`_.
Please also update the Java version strings. These are usually found in
the ``pom.xml`` or ``pom_template.xml`` files. You can search for ``2.0.0-SNAPSHOT``
to find code occurrences.
See this `commit for updating the Java version`_.
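A quick way to locate these occurrences is a sketch like the following (the dev
version strings shown are examples; check what is actually used on master):

.. code-block:: bash

    # Python/C++/Docker version strings to bump (example dev version string):
    git grep -n "2.0.0.dev0" -- python/ray/__init__.py build-docker.sh src/ray/raylet/main.cc
    # Java SNAPSHOT versions in the pom files:
    git grep -n "2.0.0-SNAPSHOT" -- "*pom.xml" "*pom_template.xml"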
3. **Create a document to collect release-notes:** You can clone `this document <https://docs.google.com/document/d/1vzcNHulHCrq1PrXWkGBwwtOK53vY2-Ol8SXbnvKPw1s/edit?usp=sharing>`_.
You will also need to create a spreadsheet with information about the PRs
@@ -63,128 +92,85 @@ Release Testing
Before each release, we run the following tests to make sure that there are
no major functionality OR performance regressions. You should start running
these tests right after branch cut in order to identify any regressions early.
Release tests are run on the Anyscale product using our automatic release
test tool on `Buildkite <https://buildkite.com/ray-project/periodic-ci>`__.
Release tests are added and maintained by the respective teams.

1. **Kick off the tests on the release branch**. Even if all tests passed
   on master, we still want to make sure they also pass on the release branch.

   a. Navigate to the `buildkite periodic CI pipeline <https://buildkite.com/ray-project/periodic-ci>`__
      and click on "New build". Type in a message, which should usually include
      the release version (e.g. "Release 1.X.0: Weekly tests"). The rest of the
      fields can stay as they are (``Commit = HEAD`` and ``Branch = master``).

   b. Wait a few seconds (usually less than two minutes) until Buildkite
      asks you for input ("Input required: Specify tests to run").

   c. Click the button and enter the required information:

      - Specify the **branch** (second field): ``releases/1.x.0``
      - Specify the **version** (third field): ``1.x.0``
      - Select one of the release test suites (core-nightly, nightly, or weekly)

   d. Click "Continue". The tests will now be run.

   e. Repeat this process for the other two test suites. You should have kicked
      off three builds for the release branch (core-nightly, nightly, and weekly).

2. **Track the progress**. Keep an eye on the tests to track their status.
   If a test fails, take a look at the output. Sometimes failures can be due
   to the product or to AWS instance availability. If a failure occurs in a test,
   check whether the same test failed on master and ping the test owner/team.
   Please note that some of the ``weekly`` tests run for 24 hours.

3. **Collect release test logs**. For some tests we collect performance metrics
   and commit them to the Ray repo at ``release/release_logs/<version>``. These
   are currently:

   - Microbenchmark results. This is part of the ``nightly`` release test suite.
   - Benchmark results (``many_actors``, ``many_nodes``, ``many_pgs``).
     These are part of the ``core-nightly`` release test suite.
     Please note that some of the names duplicate those of the long running tests.
     The tests are not the same - make sure you collect the right results.
   - Scalability results (``object_store``, ``single_node``).
     These are part of the ``core-nightly`` release test suite.
   - Stress tests (``dead_actors``, ``many_tasks``, ``placement_group``).
     These are part of the ``core-nightly`` release test suite.

   When you take a look at the test output, you'll find that the logs have been
   saved to S3. If you're logged in to AWS (as the ``anyscale-dev-ossci`` user), you
   can download the results e.g. like this:

   .. code-block:: bash

       aws s3 cp s3://ray-release-automation-results/dev/microbenchmark_1630573490/microbenchmark/output.log microbenchmark.txt

   Clean up the output logfile (e.g. remove TQDM progress bars) before committing the
   release test results.

   The PR should be filed for the Ray ``master`` branch, not the release branch.

4. **For performance tests, check with the teams if the results are acceptable**.
   When a test passes on Buildkite, it just means that it ran to completion. Some
   tests, especially benchmarks, can pass but still show performance regressions.
   For these tests (usually the same ones we collect logs for), check with the
   respective teams whether the test performance is acceptable.

5. **Repeating release tests**. If one or more tests failed and you need to run
   them again, follow the instructions from the first bullet point. Instead of
   running the full suite, you can use the last two fields to filter on the name
   of the test file or the name of the test itself. This is a simple ``if filter in name``
   filter, and only matching tests will be included in the run.

   For instance, if you just want to kick off the ``benchmark_tests/many_actors``
   test, you could specify ``benchmark`` in the test file filter and ``actors``
   in the test name filter.

   As another example, if you just want to kick off all nightly RLlib tests,
   select the respective test suite and specify ``rllib`` in the test file filter.
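As a sketch for the log-collection step above (step 3), the cleaned-up results
could be committed back to ``master`` roughly like this — the version directory,
branch, and remote names are illustrative, not prescribed:

.. code-block:: bash

    # Place the cleaned logs in the expected directory and open a PR against master.
    mkdir -p release/release_logs/1.x.0
    cp microbenchmark.txt release/release_logs/1.x.0/
    git checkout -b release-logs-1.x.0 upstream/master
    git add release/release_logs/1.x.0
    git commit -m "Add release test logs for 1.x.0"
    git push origin release-logs-1.x.0  # then open the PR against ray-project/ray master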
Identify and Resolve Release Blockers
-------------------------------------
@@ -215,6 +201,15 @@ to proceed with the final stages of the release!
or is delayed, the rest of the release process is blocked until the
issues have been resolved.
To have the product Docker images built, ping the product team on the
Slack channel (#product) and ask them to build the image. Provide
the latest commit hash and make sure all wheels (Linux, Mac, Windows
for Python 3.6, 3.7, 3.8, 3.9) are available on S3:
.. code-block::
aws s3 ls s3://ray-wheels/releases/1.x.0/<hash>/
2. **Create a GitHub release:** Create a `GitHub release`_. This should include
**release notes**. Copy the style and formatting used by previous releases.
Create a draft of the release notes containing information about substantial
@@ -251,6 +246,10 @@ to proceed with the final stages of the release!
This can be tested by sourcing the script: ``source ./bin/download_wheels.sh``.
Tip: Because downloading the wheels can take a long time, you should
consider starting an AWS instance just for this. There, the download will take
seconds rather than minutes or hours, and the subsequent upload will be much faster as well.
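If you prefer to fetch the wheels manually instead of using the helper script,
a sketch along these lines should work (the bucket layout matches the S3 listing
shown above; ``<hash>`` is the release commit hash and ``./downloads/`` is an
arbitrary local directory):

.. code-block:: bash

    # Download all release wheels for the given commit:
    aws s3 sync s3://ray-wheels/releases/1.x.0/<hash>/ ./downloads/
    ls ./downloads/  # expect Linux, macOS, and Windows wheels for Python 3.6-3.9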
4. **Upload to PyPI Test:** Upload the wheels to the PyPI test site using
``twine``.
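A minimal sketch of the upload with ``twine``, assuming the wheels are in a local
``./downloads/`` directory (the directory name is illustrative):

.. code-block:: bash

    pip install --upgrade twine
    # Upload all wheels to the Test PyPI index:
    twine upload --repository-url https://test.pypi.org/legacy/ ./downloads/*.whl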
@@ -279,7 +278,7 @@ to proceed with the final stages of the release!
This process is automated: run ``./bin/pip_download_test.sh``.
This will download Ray from the test PyPI repository and run a minimal
sanity check on all supported Python versions (3.6, 3.7, 3.8, 3.9).
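For a quick manual spot check against Test PyPI (for example on a platform the
script does not cover), something like the following can be used; the version
string is a placeholder:

.. code-block:: bash

    pip install --index-url https://test.pypi.org/simple/ \
        --extra-index-url https://pypi.org/simple/ "ray==1.x.0"
    # The version and commit hash should match the release branch:
    python -c "import ray; print(ray.__version__, ray.__commit__)"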
The Windows sanity check test is currently not automated.
You can start a Windows
@@ -328,11 +327,57 @@ to proceed with the final stages of the release!
Check the dockerhub to verify the update worked. https://hub.docker.com/repository/docker/rayproject/ray/tags?page=1&name=latest&ordering=last_updated
10. **Release the Java packages to Maven**.
As a prerequisite, you'll need GPG installed and configured.
`You can download GPG here <https://gpgtools.org/>`_. After setting up
your key, make sure to publish it to a server so users can validate it.
You'll also need Java 8 and Maven set up, on macOS e.g. via:
.. code-block:: bash
brew install openjdk@8
brew install maven
Make sure that the Java version strings in the release branch
have been updated to the current version.
You'll need to obtain the Maven credentials. These can be found in the
shared Anyscale 1password (search for "Maven").
Also look up the latest commit hash for the release branch. Then, run
the following script to generate the multiplatform jars and publish
them on Maven:
.. code-block:: bash
# Make sure you are under the Ray root source directory.
export RELEASE_VERSION=1.x.0 # Set the release version
export OSSRH_KEY=xxx # Maven username
export OSSRH_TOKEN=xxx # Maven password
export TRAVIS_BRANCH=releases/${RELEASE_VERSION}
export TRAVIS_COMMIT=xxxxxxxxxxx # The commit hash
git checkout $TRAVIS_COMMIT
sh java/build-jar-multiplatform.sh multiplatform
export GPG_SKIP=false
cd java && mvn versions:set -DnewVersion=${RELEASE_VERSION} && cd -
cd streaming/java && mvn versions:set -DnewVersion=${RELEASE_VERSION} && cd -
sh java/build-jar-multiplatform.sh deploy_jars
After that, `log into Sonatype <https://oss.sonatype.org/>`_ using the same
Maven credentials. Click on "Staging repositories", select
the respective staging repository, click on "Close" and after that has
been processed, click on "Release". This will publish the release
onto the main Maven repository.
You can check the releases on `mvnrepository.com <https://mvnrepository.com/artifact/io.ray/ray-api>`_.
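Once the artifacts have been synced, one hedged way to verify that they resolve
from the command line (the version is a placeholder):

.. code-block:: bash

    # Fetch the released artifact from Maven Central into the local repository:
    mvn dependency:get -Dartifact=io.ray:ray-api:1.x.0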
11. **Send out an email announcing the release** to the employees@anyscale.com
Google group, and post a slack message in the Announcements channel of the
Ray slack (message a team lead if you do not have permissions.)
12. **Improve the release process:** Find some way to improve the release
process so that whoever manages the release next will have an easier time.
If you had to make any changes to tests or cluster configurations, make
sure they are contributed back! If you've noticed anything in the docs that
@@ -356,6 +401,7 @@ The AWS s3 file hierarchy for Ray wheels can be found `here <https://s3.console.
in case you're having trouble with the above link.
.. _`sample commit for bumping the release branch version`: https://github.com/ray-project/ray/commit/c589de6bc888eb26c87647f5560d6b0b21fbe537
.. _`commit for updating the Java version`: https://github.com/ray-project/ray/pull/15394/files
.. _`GitHub release`: https://github.com/ray-project/ray/releases
.. _`Ray Readthedocs version page`: https://readthedocs.org/projects/ray/versions/
.. _`Ray Readthedocs advanced settings page`: https://readthedocs.org/dashboard/ray/advanced/