mirror of
https://github.com/vale981/ray
synced 2025-03-06 02:21:39 -05:00
[Release] Update Release Process Documentation (#13123)
parent
d632b0f0f7
commit
d018212db5
3 changed files with 274 additions and 57 deletions
86
release/RELEASE_CHECKLIST.md
Normal file
|
@ -0,0 +1,86 @@
# Release Checklist
This checklist is meant to be used in conjunction with the RELEASE_PROCESS.rst document.

## Initial Steps
- [ ] Called for release blockers
    - [ ] Messaged Ant about release blockers
- [ ] Announced branch cut date and estimated release date

## Branch Cut
- [ ] Release branch created
- [ ] PR created to update “latest” version on master (do not merge yet)
- [ ] Release branch versions updated
    - [ ] Version keys have new version
    - [ ] Update of “Latest” commits cherry-picked into release branch
- [ ] Release commits pulled into spreadsheet
- [ ] Release notes doc created
- [ ] Call for release notes made in Slack

## Release Testing
- [ ] Microbenchmark
    - [ ] Test passing
    - [ ] Results added to `release/release_logs`
- [ ] Long Running Tests (mark complete when run 24 hrs no issues)
    - [ ] actor_deaths
    - [ ] apex
    - [ ] impala
    - [ ] many_actor_tasks
    - [ ] many_drivers
    - [ ] many_ppo
    - [ ] many_tasks_serialized_ids
    - [ ] many_tasks
    - [ ] node_failures
    - [ ] pbt
    - [ ] serve_failure
    - [ ] serve
- [ ] Long Running Distributed Tests
    - [ ] pytorch_pbt_failure
    - [ ] horovod_test
- [ ] Stress Tests
    - [ ] test_dead_actors
        - [ ] succeeds
        - [ ] Results added to `release/release_logs`
    - [ ] test_many_tasks
        - [ ] succeeds
        - [ ] Results added to `release/release_logs`
    - [ ] test_placement_group
        - [ ] succeeds
        - [ ] Results added to `release/release_logs`
- [ ] RLlib Tests
    - [ ] regression_tests
        - [ ] compact-regression-tests-tf
            - [ ] succeeds
            - [ ] Results added to `release/release_logs`
        - [ ] compact-regression-tests-torch
            - [ ] succeeds
            - [ ] Results added to `release/release_logs`
    - [ ] stress_tests
    - [ ] unit_gpu_tests

## Final Steps
- [ ] Wheels uploaded to Test PyPI
- [ ] Wheel sanity checks with Test PyPI
    - [ ] Windows
        - [ ] Python 3.6
        - [ ] Python 3.7
        - [ ] Python 3.8
    - [ ] OSX
        - [ ] Python 3.6
        - [ ] Python 3.7
        - [ ] Python 3.8
    - [ ] Linux
        - [ ] Python 3.6
        - [ ] Python 3.7
        - [ ] Python 3.8
- [ ] Release is created on Github with release notes
    - [ ] Release includes contributors
    - [ ] Release notes sent for review to team leads
    - [ ] Release is published
- [ ] Wheels uploaded to production PyPI
    - [ ] Installing latest with `pip install -U ray` reveals correct version number and commit hash
- [ ] “Latest” docs point to new release version
- [ ] Docker image latest is updated on dockerhub
- [ ] PR to bump master version is merged
- [ ] Release is announced internally
- [ ] Release is announced externally
- [ ] Any code/doc changes made during the release process contributed back to master branch
|
@ -1,45 +1,126 @@
|
|||
Release Process
|
||||
===============
|
||||
|
||||
This document describes the process for creating new releases.
|
||||
The following documents the Ray release process. Please use the
|
||||
`Release Checklist`_ to keep track of your progress, as it is meant
|
||||
to be used alongside this process document. Also, please keep the
|
||||
team up-to-date on any major regressions or changes to the timeline
|
||||
via emails to the engineering@anyscale.com Google Group.
|
||||
|
||||
Before Branch Cut
|
||||
-----------------
|
||||
1. **Create a document to track release-blocking commits.** These may be pull
|
||||
requests that are not ready at the time of branch cut, or they may be
|
||||
fixes for issues that you encounter during release testing later.
|
||||
The only PRs that should be considered release-blocking are those which
|
||||
fix a MAJOR REGRESSION (P0) or deliver an absolutely critical piece of
|
||||
functionality that has been promised for the release (though this should
|
||||
be avoided where possible).
|
||||
You may make a copy of the following `template <https://docs.google.com/spreadsheets/d/1qeOYErAn3BzGgtEilBePjN6tavdbabCEEqglDsjrq1g/edit#gid=0>`_.
|
||||
|
||||
Make sure to share this document with major contributors who may have release blockers.
|
||||
|
||||
2. **Announce the release** over email to the engineering@anyscale.com mailing
|
||||
group. The announcement should
|
||||
contain at least the following information: the release version,
|
||||
the date when branch-cut will occur, the date the release is expected
|
||||
to go out (generally a week or so after branch cut depending on how
|
||||
testing goes), and a link to the document for tracking release blockers.
|
||||
|
||||
After Branch Cut
|
||||
----------------
|
||||
1. **Create a release branch:** Create the branch from the desired commit on master.
|
||||
To create the branch, check out the commit ID locally, i.e.,
|
||||
``git checkout <hash>``. Then check out a new branch of the format
|
||||
``releases/<release-version>``. Then push that branch to the ray repo:
|
||||
``releases/<release-version>`` (e.g. ``releases/1.3.1``). Then push that branch to the ray repo:
|
||||
``git push upstream releases/<release-version>``.
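The branch-cut commands above can be collected into a small helper. The following is a minimal sketch and not part of the Ray repository: the function names ``run`` and ``cut_release_branch`` and the ``DRY_RUN`` variable are hypothetical, and the remote is assumed to be named ``upstream`` as in the command above. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Ray repo): cut a release branch
# from a chosen commit on master.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: cut_release_branch <commit-hash> <release-version>
cut_release_branch() {
    commit="$1"
    version="$2"
    # Only accept versions of the form X.Y.Z (e.g. 1.3.1).
    if ! printf '%s\n' "$version" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "invalid release version: $version" >&2
        return 1
    fi
    # Same three steps as described above: check out the commit,
    # create the branch, push it to the ray repo.
    run git checkout "$commit" &&
        run git checkout -b "releases/$version" &&
        run git push upstream "releases/$version"
}
```

Calling ``cut_release_branch <hash> 1.3.1`` prints the three git commands; setting ``DRY_RUN=0`` executes them instead.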
|
||||
|
||||
2. **Update the release branch version:** Push a commit directly to the
|
||||
newly-created release branch that increments the Python package version in
|
||||
python/ray/__init__.py and src/ray/raylet/main.cc. See this
|
||||
``python/ray/__init__.py``, ``src/ray/raylet/main.cc``, and any other files that use ``ray::stats::VersionKey``. See this
|
||||
`sample commit for bumping the release branch version`_.
|
||||
|
||||
3. **Update the master branch version:**
|
||||
|
||||
For a new minor release (e.g., 0.7.0): Create a pull request to
|
||||
3. **Create PR to update the master branch version:**
|
||||
For a new minor release (e.g., ``0.6.3 -> 0.7.0``): Create a pull request to
|
||||
increment the dev version of the master branch. See this
|
||||
`sample PR for bumping a minor release version`_. **NOTE:** Not all of
|
||||
the version numbers should be replaced. For example, ``0.7.0`` appears in
|
||||
this file but should not be updated.
|
||||
You will want to replace all instances of nightly wheel links (wheels with
|
||||
dev in their name), instances of ``ray::stats::VersionKey``,
|
||||
and instances of ``__version__``.
|
||||
|
||||
For a new micro release (e.g., 0.7.1): No action is required.
|
||||
TIP: search the code base for instances of ``https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-1.1.0.dev0`` (but replace ray-1.1.0 with the current version) to find links to the nightly wheel to update.
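The search described in the tip can be done with ``grep``. This is a minimal sketch; the function name ``find_nightly_wheel_links`` is hypothetical, and it assumes the nightly links all contain the fragment ``ray-wheels/latest/ray-<dev version>`` as in the URL above.

```shell
#!/bin/sh
# Hypothetical helper: list every file and line that still links to the
# nightly wheel of a given dev version.
# Usage: find_nightly_wheel_links <repo-root> <dev-version>
find_nightly_wheel_links() {
    root="$1"
    dev_version="$2"
    # -r: recurse into directories, -n: show line numbers,
    # -F: treat the pattern as a fixed string rather than a regex.
    grep -rnF "ray-wheels/latest/ray-${dev_version}" "$root"
}
```

For example, ``find_nightly_wheel_links . 1.1.0.dev0`` prints one ``file:line:text`` entry per link that needs updating. Note that ``grep`` exits with a non-zero status when nothing matches.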
|
||||
|
||||
4. **Testing:** Before releasing, the following sets of tests should be run.
|
||||
The results of each of these tests for previous releases are checked in
|
||||
under ``release_logs``, and should be compared against to identify
|
||||
any regressions.
|
||||
You should also cherry pick parts of this pull request referencing the
|
||||
latest wheels onto the release branch prior
|
||||
to making the release. This will ensure that the ``install-nightly`` of
|
||||
the new release correctly points at the latest version on master.
|
||||
|
||||
1. Long-running tests
|
||||
Note: Doing this will cause the ``install-nightly`` CLI command of the
|
||||
current Ray release to fail, so you should only merge this PR when
|
||||
the release is about to occur. (For instance, the 1.0.0 ``install-nightly``
|
||||
command points to 1.1.0.dev0, so when you bump master to 1.2.0.dev0, it no
|
||||
longer finds the latest commit.)
|
||||
|
||||
For a new micro release (e.g., ``0.7.0 -> 0.7.1``): No action is required.
|
||||
|
||||
4. **Create a document to collect release-notes:** You can clone `this document <https://docs.google.com/document/d/1vzcNHulHCrq1PrXWkGBwwtOK53vY2-Ol8SXbnvKPw1s/edit?usp=sharing>`_.
|
||||
|
||||
You will also need to create a spreadsheet with information about the PRs
|
||||
included in the release to jog people's memories. You can collect this
|
||||
information by running
|
||||
.. code-block:: bash
|
||||
git log --date=local --pretty=format:"%h%x09%an%x09%ad%x09%s" releases/1.0.1..releases/1.1.0 > release-commits.tsv
|
||||
|
||||
Then, upload this tsv file to Google sheets
|
||||
and sort by description.
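If you prefer to sort the file before uploading it, the same result can be sketched locally. The function names below are hypothetical; the ``git log`` format string is the one from the command above, which puts the commit subject in the fourth tab-separated column.

```shell
#!/bin/sh
# Write the commits between two release branches as tab-separated values:
# short hash, author, date, subject.
# Usage: export_release_commits <previous-branch> <new-branch>
export_release_commits() {
    git log --date=local --pretty=format:"%h%x09%an%x09%ad%x09%s" "$1..$2"
}

# Sort a commits TSV by its fourth column (the commit subject).
# LC_ALL=C makes the ordering byte-wise and therefore reproducible.
# Usage: sort_commits_by_description <tsv-file>
sort_commits_by_description() {
    tab="$(printf '\t')"
    LC_ALL=C sort -t "$tab" -k4,4 "$1"
}
```

A possible sequence is ``export_release_commits releases/1.0.1 releases/1.1.0 > release-commits.tsv`` followed by ``sort_commits_by_description release-commits.tsv``. Since many commit subjects start with a component tag, sorting by subject tends to group commits by component.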
|
||||
|
||||
Ask team leads to contribute notes for their teams' projects. Include both
|
||||
the spreadsheet and document in your message.
|
||||
(Some people to message are Richard Liaw, Eric Liang, Edward
|
||||
Oakes, Simon Mo, Sven Mika, and Ameer Haj Ali. Please tag these people in the
|
||||
document or @mention them in your release announcement.)
|
||||
|
||||
|
||||
Release Testing
|
||||
---------------
|
||||
Before each release, we run the following tests to make sure that there are
|
||||
no major functionality OR performance regressions. You should start running
|
||||
these tests right after branch cut in order to identify any regressions early.
|
||||
The `Releaser`_ tool is used to run release tests in the Anyscale product, and
|
||||
is generally the easiest way to run release tests.
|
||||
|
||||
|
||||
1. **Microbenchmark**
|
||||
|
||||
This is a simple test of Ray functionality and performance
|
||||
across several dimensions. You can run it locally with the release commit
|
||||
installed using the command ``ray microbenchmark`` for a quick sanity check.
|
||||
|
||||
However, for the official results, you will need to run the
|
||||
microbenchmark in the same setting as previous runs: on an ``m4.16xl`` instance running ``Ubuntu 18.04`` with ``Python 3``.
|
||||
You can do this using the `Releaser`_ tool mentioned above, or
|
||||
manually by running ``ray up ray/release/microbenchmark/cluster.yaml``
|
||||
followed by ``ray exec ray/release/microbenchmark/cluster.yaml 'ray microbenchmark'``
|
||||
|
||||
The results should be checked in under ``release_logs/<version>/microbenchmark.txt``.
|
||||
|
||||
You can also get the performance change rate from the previous version using
|
||||
``util/microbenchmark_analysis.py``.
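As a rough illustration of what a "performance change rate" means here, the sketch below computes the percentage change of one benchmark value between two releases. It is an assumption about the metric, not a description of how ``util/microbenchmark_analysis.py`` works, and the function name ``percent_change`` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: percentage change of a benchmark value relative to
# the previous release. A positive result means the value increased.
# Usage: percent_change <previous-value> <current-value>
percent_change() {
    awk -v prev="$1" -v curr="$2" 'BEGIN {
        # Guard against division by zero.
        if (prev + 0 == 0) {
            print "previous value must be non-zero" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", (curr - prev) / prev * 100
    }'
}
```

For a throughput-style number (higher is better), a clearly negative result for the new release is a candidate regression worth investigating.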
|
||||
|
||||
2. **Long-running tests**
|
||||
|
||||
These tests should run for at least 24 hours without erroring or hanging (ensure that each test is printing new iterations and that CPU load is
|
||||
stable in the AWS console or in the Anyscale Product's Grafana integration).
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
long_running_tests/README.rst
|
||||
long_running_tests/README.rst
|
||||
|
||||
Follow the instructions to kick off the tests and check the status of the workloads.
|
||||
These tests should run for at least 24 hours without erroring or hanging (ensure that it is printing new iterations and CPU load is
|
||||
stable in the AWS console).
|
||||
|
||||
2. Long-running multi-node tests
|
||||
3. **Long-running multi-node tests**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
|
@ -54,54 +135,57 @@ This document describes the process for creating new releases.
|
|||
(and not increasing or decreasing over time, which indicates a leak). You can see the head node
|
||||
and worker node CPU utilizations in the AWS console.
|
||||
|
||||
3. Multi-node regression tests
|
||||
4. **Multi-node regression tests**
|
||||
|
||||
Follow the same instructions as for the long-running stress tests. The large-scale distributed
|
||||
regression tests identify potential performance regressions in a distributed environment.
|
||||
The following test should be ran:
|
||||
The following tests should be run, and can be run with the `Releaser`_ tool
|
||||
like other tests:
|
||||
|
||||
- ``rllib_tests/regression_tests`` runs the compact regression tests for RLlib.
|
||||
- ``rllib_tests/stress_tests`` runs a multi-node 8-hour IMPALA trial.
|
||||
- ``stress_tests`` contains two tests: ``many_tasks`` and ``dead_actors``.
|
||||
Each of the test runs on 105 spot instances.
|
||||
Each of the tests runs on 105 spot instances.
|
||||
- ``stress_tests/workloads/placement_group`` contains a Python script to run tests.
|
||||
It currently uses ``cluster_util`` to emulate the cluster testing. It will be converted to
|
||||
real multi-node tests in the future. For now, just make sure the test succeed locally.
|
||||
It currently uses ``cluster_util`` to emulate the cluster testing. It will be converted to
|
||||
real multi-node tests in the future. For now, just make sure the test succeeds locally.
|
||||
|
||||
Make sure that these pass. For the RLlib regression tests, there shouldn't be any errors
|
||||
and the rewards should be similar to previous releases. For the rest, it will be obvious if
|
||||
they passed. This will use the autoscaler to start a bunch of machines and run some tests.
|
||||
they passed, as they will output metrics about their execution times and results that can be compared to previous releases.
|
||||
|
||||
**IMPORTANT**: You must get signoff from the RLlib team for the RLlib test results.
|
||||
|
||||
The summaries printed by each test should be checked in under
|
||||
``release_logs/<version>`` on the **master** branch (make a pull request).
|
||||
|
||||
4. Microbenchmarks
|
||||
|
||||
Run the ``microbenchmark`` with the commit. Under the hood, the session will
|
||||
run `ray microbenchmark` on an `m4.16xl` instance running `Ubuntu 18.04` with `Python 3`
|
||||
to get the latest microbenchmark numbers.
|
||||
|
||||
The results should be checked in under ``release_logs/<version>``.
|
||||
|
||||
You can also get the performance change rate from the previous version using
|
||||
``util/microbenchmark_analysis.py``.
|
||||
|
||||
5. ASAN tests
|
||||
5. **ASAN tests**
|
||||
|
||||
Run ``ci/asan_tests`` with the commit. This enables the ASAN build and runs the
|
||||
full Python test suite to detect memory leaks.
|
||||
|
||||
5. **Resolve release-blockers:** If a release blocking issue arises, there are
|
||||
two ways the issue can be resolved: 1) Fix the issue on the master branch and
|
||||
cherry-pick the relevant commit (using ``git cherry-pick``) onto the release
|
||||
branch (recommended). 2) Revert the commit that introduced the bug on the
|
||||
Identify and Resolve Release Blockers
|
||||
-------------------------------------
|
||||
If a release blocking issue arises in the course of testing, you should
|
||||
reach out to the team to which the issue corresponds. They should either
|
||||
work on a fix immediately or tell you which changes ought to be reverted.
|
||||
|
||||
There are two ways the issue can be resolved:
|
||||
|
||||
1. Fix the issue on the master branch and
|
||||
cherry-pick the relevant commit (using ``git cherry-pick``) onto the release
|
||||
branch (recommended).
|
||||
2. Revert the commit that introduced the bug on the
|
||||
release branch (using ``git revert``), but not on the master (not recommended).
|
||||
|
||||
These changes should then be pushed directly to the release branch.
|
||||
These changes should then be pushed directly to the release branch.
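The two options can be sketched as one helper. This is a minimal sketch under stated assumptions: the names ``run`` and ``resolve_blocker`` and the ``DRY_RUN`` variable are hypothetical, the remote is assumed to be called ``upstream``, and for the cherry-pick path the fix is assumed to be merged on master already. By default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Ray repo): apply a release-blocker
# fix to a release branch using one of the two approaches described above.

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: resolve_blocker <cherry-pick|revert> <commit-hash> <release-version>
resolve_blocker() {
    mode="$1"
    commit="$2"
    branch="releases/$3"
    case "$mode" in
        cherry-pick|revert) ;;
        *) echo "mode must be cherry-pick or revert" >&2; return 1 ;;
    esac
    # Switch to the release branch, apply the change, and push it directly.
    run git checkout "$branch" &&
        run git "$mode" "$commit" &&
        run git push upstream "$branch"
}
```

With ``cherry-pick``, the commit hash is the fix that landed on master; with ``revert``, it is the commit that introduced the bug.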
|
||||
|
||||
6. **Create a GitHub release:** Create a `GitHub release`_. This should include
|
||||
Once Release Blockers are Resolved
|
||||
----------------------------------
|
||||
After all release blockers are resolved and testing complete, you are ready
|
||||
to proceed with the final stages of the release!
|
||||
|
||||
1. **Create a GitHub release:** Create a `GitHub release`_. This should include
|
||||
**release notes**. Copy the style and formatting used by previous releases.
|
||||
Create a draft of the release notes containing information about substantial
|
||||
changes/updates/bugfixes and their PR numbers. Once you have a draft, send it
|
||||
|
@ -125,7 +209,7 @@ This document describes the process for creating new releases.
|
|||
--prev-release-commit="<COMMIT_SHA>" \
|
||||
--curr-release-commit="<COMMIT_SHA>"
|
||||
|
||||
7. **Download all the wheels:** Now the release is ready to begin final
|
||||
2. **Download all the wheels:** Now the release is ready to begin final
|
||||
testing. The wheels are automatically uploaded to S3, even on the release
|
||||
branch. To download them, use ``util/download_wheels.sh``:
|
||||
|
||||
|
@ -137,7 +221,7 @@ This document describes the process for creating new releases.
|
|||
|
||||
This can be tested by sourcing the script: ``source ./bin/download_wheels.sh``
|
||||
|
||||
8. **Upload to PyPI Test:** Upload the wheels to the PyPI test site using
|
||||
3. **Upload to PyPI Test:** Upload the wheels to the PyPI test site using
|
||||
``twine``.
|
||||
|
||||
.. code-block:: bash
|
||||
|
@ -171,11 +255,14 @@ This document describes the process for creating new releases.
|
|||
This will download Ray from the Test PyPI repository and run the minimal
|
||||
sanity check on all supported Python versions (3.6, 3.7, 3.8).
|
||||
|
||||
Windows sanity check test is currently not automated.
|
||||
The Windows sanity check test is currently not automated.
|
||||
You can start a Windows
|
||||
VM in the AWS console running the Deep Learning AMI, then install the correct
|
||||
version of Ray using the Anaconda prompt.
|
||||
|
||||
9. **Upload to PyPI:** Now that you've tested the wheels on the PyPI test
|
||||
repository, they can be uploaded to the main PyPI repository. Be careful,
|
||||
**it will not be possible to modify wheels once you upload them**, so any
|
||||
4. **Upload to PyPI:** Now that you've tested the wheels on the PyPI test
|
||||
repository, they can be uploaded to the main PyPI repository. **Be careful,
|
||||
it will not be possible to modify wheels once you upload them**, so any
|
||||
mistake will require a new release.
|
||||
|
||||
.. code-block:: bash
|
||||
|
@ -191,19 +278,55 @@ This document describes the process for creating new releases.
|
|||
|
||||
pip install -U ray
|
||||
|
||||
10. **Create a point release on readthedocs page:** Go to the `Ray Readthedocs version page`_.
|
||||
Scroll to "Activate a version" and mark the *release branch* as "active" and "public". This creates a point release for the documentation.
|
||||
Message @richardliaw to add you if you don't have access.
|
||||
5. **Create a point release on readthedocs page:** Go to the `Ray Readthedocs version page`_.
|
||||
Scroll to "Activate a version" and mark the *release branch* as "active" and "public". This creates a point release for the documentation.
|
||||
Message @richardliaw to add you if you don't have access.
|
||||
|
||||
11. **Update 'Default Branch' on the readthedocs page:** Go to the `Ray Readthedocs advanced settings page`_.
|
||||
In 'Global Settings', set the 'Default Branch' to the *release branch*. This redirects the documentation to the latest pip release.
|
||||
Message @richardliaw to add you if you don't have access.
|
||||
6. **Update 'Default Branch' on the readthedocs page:**
|
||||
Go to the `Ray Readthedocs advanced settings page`_.
|
||||
In 'Global Settings', set the 'Default Branch' to the *release branch*. This redirects the documentation to the latest pip release.
|
||||
Message @richardliaw to add you if you don't have access.
|
||||
|
||||
12. **Improve the release process:** Find some way to improve the release
|
||||
process so that whoever manages the release next will have an easier time.
|
||||
If, after completing this step, you still do not see the correct version
|
||||
of the docs, trigger a new build of the "latest" branch in
|
||||
readthedocs to see if that fixes it.
|
||||
|
||||
.. _`sample PR for bumping a minor release version`: https://github.com/ray-project/ray/pull/6303
|
||||
.. _`sample commit for bumping the release branch version`: https://github.com/ray-project/ray/commit/a39325d818339970e51677708d5596f4b8f790ce
|
||||
7. **Update latest Docker Image:** Message Ian Rodney to bump the "latest" tag
|
||||
in Dockerhub for the
|
||||
``rayproject/ray`` and ``rayproject/ray-ml`` Docker images to point to the Docker images built from the release. (If you have privileges in these
|
||||
docker projects, you can do this step yourself.)
|
||||
|
||||
8. **Send out an email announcing the release** to the engineering@anyscale.com
|
||||
Google group, and post a slack message in the Announcements channel of the
|
||||
Ray Slack (message a team lead if you do not have permissions).
|
||||
|
||||
9. **Improve the release process:** Find some way to improve the release
|
||||
process so that whoever manages the release next will have an easier time.
|
||||
If you had to make any changes to tests or cluster configurations, make
|
||||
sure they are contributed back! If you've noticed anything in the docs that
|
||||
was out-of-date, please patch them.
|
||||
|
||||
**You're done! Congratulations and good job!**
|
||||
|
||||
Resources and Troubleshooting
|
||||
-----------------------------
|
||||
**Link to latest wheel:**
|
||||
|
||||
Assuming you followed the naming convention and have completed the step of
|
||||
updating the version on the release branch, you will be able to find wheels
|
||||
for your release at the following URL (with, e.g. VERSION=1.3.0): ``https://s3-us-west-2.amazonaws.com/ray-wheels/releases/<VERSION>/bfc8d1be43b86a9d3008aa07ca9f36664e02d1ba1/ray-<VERSION>-cp37-cp37m-macosx_10_13_intel.whl``
|
||||
(Note, the exact URL varies a bit by python version and platform,
|
||||
this is for OSX on Python 3.7)
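To show how the URL varies by Python version and platform, here is a minimal sketch that assembles it from its parts. The function name ``release_wheel_url`` is hypothetical, and the sketch assumes the wheel file follows the standard wheel naming ``ray-<VERSION>-<python tag>-<platform>.whl`` and lives under ``releases/<VERSION>/<commit>/`` in the bucket.

```shell
#!/bin/sh
# Hypothetical helper: build the S3 URL of a release wheel.
# Usage: release_wheel_url <version> <commit-hash> <python-tag> <platform>
#   <python-tag> is e.g. cp37-cp37m, <platform> is e.g. macosx_10_13_intel
release_wheel_url() {
    version="$1"
    commit="$2"
    py_tag="$3"
    platform="$4"
    printf 'https://s3-us-west-2.amazonaws.com/ray-wheels/releases/%s/%s/ray-%s-%s-%s.whl\n' \
        "$version" "$commit" "$version" "$py_tag" "$platform"
}
```

For instance, ``release_wheel_url 1.3.0 <commit> cp37-cp37m macosx_10_13_intel`` yields the OSX Python 3.7 link; swapping the last two arguments gives the links for other Python versions and platforms.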
|
||||
|
||||
**AWS link for all Ray wheels:**
|
||||
|
||||
The AWS s3 file hierarchy for Ray wheels can be found `here <https://s3.console.aws.amazon.com/s3/buckets/ray-wheels/?region=us-west-2&tab=objects>`_
|
||||
in case you're having trouble with the above link.
|
||||
|
||||
.. _`sample PR for bumping a minor release version`: https://github.com/ray-project/ray/pull/12856
|
||||
.. _`sample commit for bumping the release branch version`: https://github.com/ray-project/ray/pull/12856/
|
||||
.. _`GitHub release`: https://github.com/ray-project/ray/releases
|
||||
.. _`Ray Readthedocs version page`: https://readthedocs.org/projects/ray/versions/
|
||||
.. _`Ray Readthedocs advanced settings page`: https://readthedocs.org/dashboard/ray/advanced/
|
||||
.. _`Release Checklist`: https://github.com/ray-project/ray/release/RELEASE_CHECKLIST.md
|
||||
.. _`Releaser`: https://github.com/ray-project/releaser
|
||||
|
|
|
@ -13,7 +13,13 @@ Note that all the long running test is running inside virtual environment, tenso
|
|||
|
||||
Running the Workloads
|
||||
---------------------
|
||||
Easiest approach is to use the `Anyscale UI <https://www.anyscale.dev/>`. First run ``anyscale snapshot create`` from the command line to create a project snapshot. Then from the UI, you can launch an individual session and execute the run command for each test.
|
||||
The easiest approach to running these workloads is to use the
|
||||
`Releaser`_ tool to run them with the command
|
||||
``python cli.py suite:run long_running_tests``. By default, this
|
||||
will start a session to run each workload in the Anyscale product
|
||||
and kick them off.
|
||||
|
||||
To run the tests manually, you can also use the `Anyscale UI <https://www.anyscale.dev/>`_. First run ``anyscale snapshot create`` from the command line to create a project snapshot. Then from the UI, you can launch an individual session and execute the run command for each test.
|
||||
|
||||
You can also start the workloads using the CLI with:
|
||||
|
||||
|
@ -51,3 +57,5 @@ Adding a Workload
|
|||
|
||||
To create a new workload, simply add a new Python file under ``workloads/`` and
|
||||
add the workload in the run command in `ray-project/project.yaml`.
|
||||
|
||||
.. _`Releaser`: https://github.com/ray-project/releaser
|