[Release] Update Release Process Documentation (#13123)

This commit is contained in:
Max Fitton 2021-01-04 11:09:43 -08:00 committed by GitHub
parent d632b0f0f7
commit d018212db5
3 changed files with 274 additions and 57 deletions

# Release Checklist
This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst document.
## Initial Steps
- [ ] Called for release blockers
- [ ] Messaged Ant about release blockers
- [ ] Announced branch cut date and estimated release date
## Branch Cut
- [ ] Release branch created
- [ ] PR created to update “latest” version on master (do not merge yet)
- [ ] Release branch versions updated
- [ ] Version keys have new version
- [ ] Update of “Latest” commits cherry-picked into release branch
- [ ] Release commits pulled into spreadsheet
- [ ] Release notes doc created
- [ ] Call for release notes made in Slack
## Release Testing
- [ ] Microbenchmark
- [ ] Test passing
- [ ] Results added to `release/release_logs`
- [ ] Long Running Tests (mark complete when run 24 hrs no issues)
- [ ] actor_deaths
- [ ] apex
- [ ] impala
- [ ] many_actor_tasks
- [ ] many_drivers
- [ ] many_ppo
- [ ] many_tasks_serialized_ids
- [ ] many_tasks
- [ ] node_failures
- [ ] pbt
- [ ] serve_failure
- [ ] serve
- [ ] Long Running Distributed Tests
- [ ] pytorch_pbt_failure
- [ ] horovod_test
- [ ] Stress Tests
- [ ] test_dead_actors
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] test_many_tasks
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] test_placement_group
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] RLlib Tests
- [ ] regression_tests
- [ ] compact-regression-tests-tf
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] compact-regression-tests-torch
- [ ] succeeds
- [ ] Results added to `release/release_logs`
- [ ] stress_tests
- [ ] unit_gpu_tests
## Final Steps
- [ ] Wheels uploaded to Test PyPI
- [ ] Wheel sanity checks with Test PyPI
- [ ] Windows
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] OSX
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] Linux
- [ ] Python 3.6
- [ ] Python 3.7
- [ ] Python 3.8
- [ ] Release is created on Github with release notes
- [ ] Release includes contributors
- [ ] Release notes sent for review to team leads
- [ ] Release is published
- [ ] Wheels uploaded to production PyPI
- [ ] Installing latest with `pip install -U ray` reveals correct version number and commit hash
- [ ] “Latest” docs point to new release version
- [ ] Docker image latest is updated on dockerhub
- [ ] PR to bump master version is merged
- [ ] Release is announced internally
- [ ] Release is announced externally
- [ ] Any code/doc changes made during the release process contributed back to master branch

Release Process
===============
The following documents the Ray release process. Please use the
`Release Checklist`_ to keep track of your progress, as it is meant to be
used alongside this process document. Also, please keep the team up to date
on any major regressions or changes to the timeline via emails to the
engineering@anyscale.com Google Group.

Before Branch Cut
-----------------
1. **Create a document to track release-blocking commits.** These may be pull
requests that are not ready at the time of branch cut, or they may be
fixes for issues that you encounter during release testing later.
The only PRs that should be considered release-blocking are those which
fix a MAJOR REGRESSION (P0) or deliver an absolutely critical piece of
functionality that has been promised for the release (though this should
be avoided where possible).
You may make a copy of the following `template <https://docs.google.com/spreadsheets/d/1qeOYErAn3BzGgtEilBePjN6tavdbabCEEqglDsjrq1g/edit#gid=0>`_.
Make sure to share this document with major contributors who may have release blockers.
2. **Announce the release** over email to the engineering@anyscale.com mailing
group. The announcement should
contain at least the following information: the release version,
the date when branch-cut will occur, the date the release is expected
to go out (generally a week or so after branch cut depending on how
testing goes), and a link to the document for tracking release blockers.
After Branch Cut
----------------
1. **Create a release branch:** Create the branch from the desired commit on
   master. To create the branch, locally check out the commit, i.e.,
   ``git checkout <hash>``. Then check out a new branch of the format
   ``releases/<release-version>`` (e.g., ``releases/1.3.1``), and push that
   branch to the ray repo: ``git push upstream releases/<release-version>``.
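As a sketch, the branch-cut commands above can be exercised end to end in a throwaway repository (in the real flow you would run them in your ray checkout and push to the ray-project/ray remote; the hash, version, and remote name below are illustrative):

```shell
# Exercise the branch-cut flow in a scratch repo (push omitted in the demo).
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q
git -c user.name=release -c user.email=release@example.com \
    commit -q --allow-empty -m "some master commit"
HASH=$(git rev-parse HEAD)          # stands in for the chosen master commit
git checkout -q "$HASH"             # git checkout <hash>
git checkout -q -b releases/1.2.0   # git checkout -b releases/<release-version>
git rev-parse --abbrev-ref HEAD     # shows the new branch name
# git push upstream releases/<release-version>   # (push omitted in the demo)
```

In the real repository the only difference is that ``<hash>`` is an existing master commit and the final push publishes the branch.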
2. **Update the release branch version:** Push a commit directly to the
   newly-created release branch that increments the Python package version in
   ``python/ray/__init__.py``, ``src/ray/raylet/main.cc``, and any other
   files that use ``ray::stats::VersionKey``. See this
   `sample commit for bumping the release branch version`_.
3. **Create a PR to update the master branch version:**

   For a new minor release (e.g., ``0.6.3 -> 0.7.0``): Create a pull request
   to increment the dev version on the master branch. See this
   `sample PR for bumping a minor release version`_. **NOTE:** Not all of the
   version numbers should be replaced. For example, ``0.7.0`` appears in this
   file but should not be updated. You will want to replace all instances of
   nightly wheel links (wheels with ``dev`` in their name), instances of
   ``ray::stats::VersionKey``, and instances of ``__version__``.

   TIP: Search the code base for instances of
   ``https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-1.1.0.dev0``
   (replacing ``1.1.0`` with the current version) to find the nightly wheel
   links that need updating.

   You should also cherry-pick the parts of this pull request that reference
   the latest wheels onto the release branch prior to making the release.
   This ensures that the ``install-nightly`` of the new release correctly
   points at the latest version on master.

   Note: Doing this will cause the ``install-nightly`` CLI command of the
   current Ray release to fail, so you should only merge this PR when the
   release is about to occur. (For instance, the 1.0.0 ``install-nightly``
   command points to 1.1.0.dev0, so when you bump master to 1.2.0.dev0, it
   no longer finds the latest commit.)

   For a new micro release (e.g., ``0.7.0 -> 0.7.1``): No action is required.
4. **Create a document to collect release notes:** You can clone `this document <https://docs.google.com/document/d/1vzcNHulHCrq1PrXWkGBwwtOK53vY2-Ol8SXbnvKPw1s/edit?usp=sharing>`_.
   You will also need to create a spreadsheet with information about the PRs
   included in the release to jog people's memories. You can collect this
   information by running:

   .. code-block:: bash

       git log --date=local --pretty=format:"%h%x09%an%x09%ad%x09%s" releases/1.0.1..releases/1.1.0 > release-commits.tsv

   Then upload this TSV file to Google Sheets and sort by description.

   Ask team leads to contribute notes for their teams' projects, and include
   both the spreadsheet and the document in your message. (Some people to
   message are Richard Liaw, Eric Liang, Edward Oakes, Simon Mo, Sven Mika,
   and Ameer Haj Ali. Please tag these people in the document or @mention
   them in your release announcement.)
Release Testing
---------------
Before each release, we run the following tests to make sure that there are
no major functionality OR performance regressions. You should start running
these tests right after branch cut in order to identify any regressions early.
The `Releaser`_ tool is used to run release tests in the Anyscale product, and
is generally the easiest way to run release tests.
1. **Microbenchmark**

   This is a simple test of Ray functionality and performance across several
   dimensions. You can run it locally with the release commit installed using
   the command ``ray microbenchmark`` for a quick sanity check. However, for
   the official results, you will need to run the microbenchmark in the same
   setting as previous runs: an ``m4.16xl`` instance running Ubuntu 18.04
   with Python 3. You can do this using the `Releaser`_ tool mentioned above,
   or manually by running ``ray up ray/release/microbenchmark/cluster.yaml``
   followed by
   ``ray exec ray/release/microbenchmark/cluster.yaml 'ray microbenchmark'``.

   The results should be checked in under
   ``release_logs/<version>/microbenchmark.txt``. You can also get the
   performance change rate from the previous version using
   ``util/microbenchmark_analysis.py``.
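For intuition, the "performance change rate" is just the percent change of each benchmark's throughput relative to the previous release. (``util/microbenchmark_analysis.py`` automates this; the helper below is a hypothetical sketch of the arithmetic, not part of the Ray tooling.)

```python
def percent_change(prev: float, curr: float) -> float:
    """Percent change of a benchmark number relative to the previous release.

    Positive means the new release is faster (for throughput-style metrics);
    a large negative value is a potential regression worth investigating.
    """
    return (curr - prev) / prev * 100.0
```

For example, a benchmark that went from 100 tasks/s to 110 tasks/s is a +10% change.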
2. **Long-running tests**

   .. code-block:: bash

       long_running_tests/README.rst

   Follow the instructions to kick off the tests and check the status of the
   workloads. These tests should run for at least 24 hours without erroring
   or hanging (ensure that they are printing new iterations and that CPU load
   is stable in the AWS console or in the Anyscale product's Grafana
   integration).

3. **Long-running multi-node tests**

.. code-block:: bash
(and not increasing or decreasing over time, which indicates a leak). You can see the head node
and worker node CPU utilizations in the AWS console.
4. **Multi-node regression tests**

   Follow the same instructions as for the long-running stress tests. The
   large-scale distributed regression tests identify potential performance
   regressions in distributed environments. The following tests should be
   run, and can be run with the `Releaser`_ tool like the other tests:
   - ``rllib_tests/regression_tests`` runs the compact regression tests for RLlib.
   - ``rllib_tests/stress_tests`` runs a multi-node 8-hour IMPALA trial.
   - ``stress_tests`` contains two tests: ``many_tasks`` and ``dead_actors``.
     Each test runs on 105 spot instances.
   - ``stress_tests/workloads/placement_group`` contains a Python script to run tests.
     It currently uses ``cluster_util`` to emulate cluster testing. It will be
     converted to real multi-node tests in the future. For now, just make sure
     the test succeeds locally.
   Make sure that these pass. For the RLlib regression tests, there shouldn't
   be any errors, and the rewards should be similar to those of previous
   releases. For the rest, it will be obvious if they passed, as they will
   output metrics about their execution times and results that can be
   compared to previous releases.

   **IMPORTANT**: You must get signoff from the RLlib team for the RLlib test
   results.

   The summaries printed by each test should be checked in under
   ``release_logs/<version>`` on the **master** branch (make a pull request).
5. **ASAN tests**

   Run the ``ci/asan_tests`` with the commit. This will enable the ASAN build
   and run the whole Python test suite to detect memory leaks.
Identify and Resolve Release Blockers
-------------------------------------

If a release-blocking issue arises in the course of testing, you should reach
out to the team to which the issue corresponds. They should either work on a
fix immediately or tell you which changes ought to be reverted.

There are two ways the issue can be resolved:

1. Fix the issue on the master branch and cherry-pick the relevant commit
   (using ``git cherry-pick``) onto the release branch (recommended).
2. Revert the commit that introduced the bug on the release branch (using
   ``git revert``), but not on master (not recommended).

These changes should then be pushed directly to the release branch.
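The recommended cherry-pick flow can be exercised end to end in a throwaway repository (the branch name, commit message, and file below are illustrative; in the real flow you would cherry-pick onto the actual release branch and push it):

```shell
# Demo of the recommended flow: a fix lands on master and is cherry-picked
# onto the release branch (push omitted in the demo).
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q
git config user.name release && git config user.email release@example.com
git commit -q --allow-empty -m "base"
git branch releases/1.2.0           # stands in for the release branch
echo "fixed" > bug.txt && git add bug.txt
git commit -q -m "fix major regression"
FIX=$(git rev-parse HEAD)           # the fix commit on master
git checkout -q releases/1.2.0
git cherry-pick "$FIX" >/dev/null   # carry just the fix to the release branch
git log -1 --pretty=%s              # shows the cherry-picked fix on top
# git push upstream releases/<release-version>   # (push omitted in the demo)
```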
Once Release Blockers are Resolved
----------------------------------

After all release blockers are resolved and testing is complete, you are
ready to proceed with the final stages of the release!

1. **Create a GitHub release:** Create a `GitHub release`_. This should include
**release notes**. Copy the style and formatting used by previous releases.
Create a draft of the release notes containing information about substantial
changes/updates/bugfixes and their PR numbers. Once you have a draft, send it
--prev-release-commit="<COMMIT_SHA>" \
--curr-release-commit="<COMMIT_SHA>"
2. **Download all the wheels:** Now the release is ready to begin final
   testing. The wheels are automatically uploaded to S3, even on the release
   branch. To download them, use ``util/download_wheels.sh``:
   This can be tested by running ``source ./bin/download_wheels.sh``.
3. **Upload to PyPI Test:** Upload the wheels to the PyPI test site using
   ``twine``.

   .. code-block:: bash
   This will download ray from the Test PyPI repository and run a minimal
   sanity check on all of the supported Python versions (3.6, 3.7, and 3.8).

   The Windows sanity check is currently not automated. You can start a
   Windows VM in the AWS console running the Deep Learning AMI, then install
   the correct version of Ray using the Anaconda prompt.
4. **Upload to PyPI:** Now that you've tested the wheels on the PyPI test
   repository, they can be uploaded to the main PyPI repository. **Be
   careful: it will not be possible to modify wheels once you upload them**,
   so any mistake will require a new release.

   .. code-block:: bash
pip install -U ray
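After installing, it is worth checking programmatically that the version you got is the final release and not a leftover dev build. A tiny hypothetical helper (not part of Ray's tooling) for that check:

```python
def looks_like_release(installed: str, expected: str) -> bool:
    """True if the installed version string matches the intended release
    exactly and is not a nightly/dev build."""
    return installed == expected and "dev" not in installed

# Usage after `pip install -U ray` (hypothetical version number):
#   import ray
#   assert looks_like_release(ray.__version__, "1.2.0")
```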
5. **Create a point release on the readthedocs page:** Go to the
   `Ray Readthedocs version page`_. Scroll to "Activate a version" and mark
   the *release branch* as "active" and "public". This creates a point
   release for the documentation. Message @richardliaw to add you if you
   don't have access.
6. **Update 'Default Branch' on the readthedocs page:** Go to the
   `Ray Readthedocs advanced settings page`_. In 'Global Settings', set the
   'Default Branch' to the *release branch*. This redirects the documentation
   to the latest pip release. Message @richardliaw to add you if you don't
   have access.

   If, after completing this step, you still do not see the correct version
   of the docs, trigger a new build of the "latest" branch in readthedocs to
   see if that fixes it.
7. **Update the latest Docker image:** Message Ian Rodney to bump the
   "latest" tag in Dockerhub for the ``rayproject/ray`` and
   ``rayproject/ray-ml`` Docker images to point to the images built from the
   release. (If you have privileges in these Docker projects, you can do this
   step yourself.)
8. **Send out an email announcing the release** to the engineering@anyscale.com
   Google group, and post a Slack message in the Announcements channel of the
   Ray Slack (message a team lead if you do not have permissions).
9. **Improve the release process:** Find some way to improve the release
process so that whoever manages the release next will have an easier time.
If you had to make any changes to tests or cluster configurations, make
sure they are contributed back! If you've noticed anything in the docs that
was out-of-date, please patch them.
**You're done! Congratulations and good job!**
Resources and Troubleshooting
-----------------------------
**Link to latest wheel:**

Assuming you followed the naming convention and have completed the step of
updating the version on the release branch, you will be able to find wheels
for your release at the following URL (with, e.g., ``VERSION=1.3.0``):
``https://s3-us-west-2.amazonaws.com/ray-wheels/releases/<VERSION>/bfc8d1be43b86a9d3008aa07ca9f36664e02d1ba1/<VERSION>-cp37-cp37m-macosx_10_13_intel.whl``
(Note: the exact URL varies a bit by Python version and platform; this one is
for OSX on Python 3.7.)
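If it helps, the URL pattern above can be expressed as a small helper (purely illustrative, not part of the release tooling; the commit hash, Python tag, and platform tag all vary per release and per wheel):

```python
def wheel_url(version: str, commit: str,
              py_tag: str = "cp37-cp37m",
              platform: str = "macosx_10_13_intel") -> str:
    """Build the S3 release-wheel URL pattern described above."""
    return (
        "https://s3-us-west-2.amazonaws.com/ray-wheels/releases/"
        f"{version}/{commit}/{version}-{py_tag}-{platform}.whl"
    )
```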
**AWS link for all Ray wheels:**
The AWS s3 file hierarchy for Ray wheels can be found `here <https://s3.console.aws.amazon.com/s3/buckets/ray-wheels/?region=us-west-2&tab=objects>`_
in case you're having trouble with the above link.
.. _`sample PR for bumping a minor release version`: https://github.com/ray-project/ray/pull/12856
.. _`sample commit for bumping the release branch version`: https://github.com/ray-project/ray/pull/12856/
.. _`GitHub release`: https://github.com/ray-project/ray/releases
.. _`Ray Readthedocs version page`: https://readthedocs.org/projects/ray/versions/
.. _`Ray Readthedocs advanced settings page`: https://readthedocs.org/dashboard/ray/advanced/
.. _`Release Checklist`: https://github.com/ray-project/ray/release/RELEASE_CHECKLIST.md
.. _`Releaser`: https://github.com/ray-project/releaser

Running the Workloads
---------------------
The easiest approach to running these workloads is to use the `Releaser`_
tool with the command ``python cli.py suite:run long_running_tests``. By
default, this will start a session in the Anyscale product for each workload
and kick it off.

To run the tests manually, you can also use the
`Anyscale UI <https://www.anyscale.dev/>`_. First, run
``anyscale snapshot create`` from the command line to create a project
snapshot. Then, from the UI, you can launch an individual session and execute
the run command for each test.
You can also start the workloads using the CLI with:
Adding a Workload
-----------------
To create a new workload, simply add a new Python file under ``workloads/``
and add the workload to the run command in ``ray-project/project.yaml``.
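As a sketch, a minimal workload file might look like the following (the function name and structure are hypothetical, not taken from the Ray repo; real workloads do actual Ray work each iteration):

```python
def run(num_iters=None):
    """Loop forever (or num_iters times, to keep this testable), printing a
    new iteration each pass so that progress is visible in the session logs."""
    i = 0
    while num_iters is None or i < num_iters:
        print(f"iteration {i}")  # monitoring checks that new iterations appear
        i += 1
        # a real workload would submit Ray tasks or actor calls here
    return i
```

The important property for these tests is that the workload never exits on its own and logs steadily, since stalled output is how hangs are detected.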
.. _`Releaser`: https://github.com/ray-project/releaser