
CI process

This document is a work in progress and may be out of sync with the code. Please double-check file, function, and other names before relying on them.

Dependencies

All dependencies (e.g. apt, pip) should be installed in install_dependencies(), following the same pattern as those that already exist.

Once a dependency is added or removed, please ensure that shell environment variables are persisted appropriately. CI systems differ on when ~/.bashrc et al. are reloaded, if at all, and reloading these files is not necessarily idempotent.
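The following is a minimal sketch of idempotent persistence. It assumes the variable should be written to a shell profile such as ~/.bashrc. The hypothetical helper persist_env appends an export line only if that line is not already present. The same export is also applied to the current shell, because the CI system may never re-read the profile. The helper name, the MY_TOOL_HOME variable, and the scratch file used in place of ~/.bashrc are all illustrative and are not part of ci/ci.sh.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of ci/ci.sh): append a line to a profile file
# only if an identical line is not already present, so repeated runs of the
# dependency setup do not accumulate duplicate exports.
persist_env() {
  local file="$1" line="$2"
  touch "${file}"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  if ! grep -qxF -- "${line}" "${file}"; then
    printf '%s\n' "${line}" >> "${file}"
  fi
}

# Demonstration against a scratch file instead of the real ~/.bashrc.
profile="$(mktemp)"
persist_env "${profile}" 'export MY_TOOL_HOME=/opt/my_tool'
persist_env "${profile}" 'export MY_TOOL_HOME=/opt/my_tool'  # no-op the second time

# The CI system may not reload the profile, so also export in this shell.
export MY_TOOL_HOME=/opt/my_tool
```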

Bazel, environment variables, and caching

Any environment variables passed to Bazel actions (e.g. PATH) should be idempotent, meaning they have the same value every time the environment is set up, so that actions hit the Bazel cache.

If a different PATH gets passed to a Bazel action, Bazel will not hit the cache. You might then trigger a full rebuild when you really expect an incremental (or no-op) build (say, for pip install -e . after bazel build //...).
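One way to keep PATH stable is to make every modification to it idempotent. The sketch below uses a hypothetical path_prepend helper, which is not part of ci/ci.sh. The helper adds a directory only when it is missing, so running the same setup twice produces an identical PATH string. The directory /opt/example/bin is a placeholder.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prepend a directory to PATH only if it is not already
# one of PATH's entries. Wrapping PATH in colons lets the pattern match the
# first and last entries as well as middle ones.
path_prepend() {
  local dir="$1"
  case ":${PATH}:" in
    *":${dir}:"*) ;;                  # already present: leave PATH unchanged
    *) PATH="${dir}:${PATH}" ;;
  esac
  export PATH
}

path_prepend /opt/example/bin
first="${PATH}"
path_prepend /opt/example/bin       # second call must not change PATH
second="${PATH}"
```

A naive PATH="${dir}:${PATH}" would grow PATH on every call. Each re-run of the setup would then hand Bazel a slightly different value.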

Invocation

The CI system (such as Travis) must source (not execute) ci/ci.sh and pass the action(s) to execute. The script either handles the work or dispatches it to other script(s) as it deems appropriate. This helps ensure any environment setup/teardown is handled appropriately.
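Here is a rough sketch of the source-and-dispatch pattern. It assumes each action is a shell function defined in the sourced file and named on the command line. The functions install_dependencies and build and the _main dispatcher are simplified stand-ins, so check ci/ci.sh for the real entry point.

```bash
#!/usr/bin/env bash

# Hypothetical actions; real ones would do apt/pip/bazel work.
install_dependencies() {
  export EXAMPLE_DEPS_INSTALLED=1      # placeholder for real setup
}

build() {
  echo "building with deps=${EXAMPLE_DEPS_INSTALLED:-0}"
}

# Run each requested action, in order, in the current shell.
_main() {
  local action
  for action in "$@"; do
    "${action}"
  done
}

# Intended usage from the CI config:  . ./ci/ci.sh install_dependencies build
_main "$@"
```

Because the file is sourced, the export made by install_dependencies persists in the CI shell for later steps. For the same reason, a sourced script should signal failure with return rather than exit, since exit would terminate the CI system's own shell.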

Development best practices & pitfalls (read before adding a new script)

Before adding new scripts, please read this section.

First, please consider modifying an existing script instead (e.g. add your code as a separate function). Adding new scripts has a number of pitfalls that can easily take hours (or even days) to track down and fix:

  • When calling other scripts (as executables), changes they make to environment variables (like PATH) cannot propagate back up to the caller, yet the caller often expects such variables to be updated.

  • When sourcing other scripts, global state (ROOT_DIR, main, set -e, etc.) may be overwritten silently, causing unexpected behavior.
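Both pitfalls can be reproduced with a throwaway script. In this sketch, the temporary file, DEMO_VAR, and the /tmp value are made up. ROOT_DIR stands in for any global the caller relies on.

```bash
#!/usr/bin/env bash

child="$(mktemp)"
printf 'export DEMO_VAR=from_child\n' > "${child}"

# Pitfall 1: an executed script runs in a separate process, so its exports
# are lost when it exits.
unset DEMO_VAR
bash "${child}"
after_exec="${DEMO_VAR:-<unset>}"
echo "after executing: ${after_exec}"   # after executing: <unset>

# Sourcing runs the same file in the current shell, so the export sticks.
. "${child}"
echo "after sourcing: ${DEMO_VAR}"      # after sourcing: from_child

# Pitfall 2: a sourced script can silently clobber the caller's globals.
ROOT_DIR=/path/to/repo
printf 'ROOT_DIR=/tmp\n' > "${child}"
. "${child}"
echo "ROOT_DIR is now: ${ROOT_DIR}"     # ROOT_DIR is now: /tmp

rm -f "${child}"
```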

The following practices can avoid such pitfalls while maintaining intuitive control flow:

  • Put all environment-modifying functions in the top-level shell script, so that their invocation behaves intuitively. (The sheer length of the script is a secondary concern and can be mitigated by keeping functions modular.)

  • Avoid adding new scripts if possible. If you must add one, call it instead of sourcing it. Note that this implies new scripts should not modify the environment, or the caller will not see such changes!

  • Always add code inside a function, not at global scope. Use local for variables where it makes sense. However, be careful and know the shell rules: for example, local x=$(false) succeeds even under set -e.
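The local pitfall can be checked directly. In the sketch below, masked and checked are hypothetical functions. Each one runs in a fresh bash -e process, with its definition passed in via declare -f. This makes errexit behave as it would in a standalone script, rather than being suppressed by the surrounding || list.

```bash
#!/usr/bin/env bash

masked() {
  local x=$(false)   # 'local' returns 0, hiding false's exit status
  echo "masked: still running"
}

checked() {
  local x
  x=$(false)         # a plain assignment takes the substitution's status
  echo "checked: not reached"
}

# Fresh 'bash -e' processes so errexit applies normally in each function.
bash -ec "$(declare -f masked); masked"    # prints: masked: still running
bash -ec "$(declare -f checked); checked" || echo "checked: aborted with status $?"
# prints: checked: aborted with status 1
```

Declaring the variable with local first and assigning it on a separate line is the usual workaround. ShellCheck flags the combined form as SC2155.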

Ultimately, it's best to add a new script only if it might need to be executed directly by non-CI code. In that case, it should probably not use the CI entrypoints, which assume exclusive control over the machine.