Root cause:
https://www.shell-tips.com/bash/source-dot-command/#gsc.tab=0
Using . will execute the command in the "current shell" in a bash script. It looks like removing . command from ci.sh init means that we will lose the set -eo command used within ci.sh init applied to next test running commands because set -eo is called within a child process, not the current shell (so the future command won't have the set -eo configured).
Recently there have been a number of CI test failures due to direct or transitive dependency version upgrades. Printing out environment information for each test suite allows us to quickly check the diff between failed and successful runs.
**Notes:**
1. In this PR I just manually added `./ci/env/env_info.sh` to each test suite. We may want to generalize this in the future.
2. This is just for CI now, but is applicable to release tests as well.
Signed-off-by: Matthew Deng <matt@anyscale.com>
- Stop using dot command to run ci.sh script: it doesn't fail the build if the command fails for windows and is generally dangerous since it will make unexpected changes to the current shell.
- Fix uncovered windows build issues.
Since ray supports Redis as a storage backend, we should ensure the code path with Redis as storage is still being covered e2e.
The tests don't run for a while after we switch to memory mode by default. This PR tries to fix this and make it run with every commit.
In the future, if we support more and more storage backends, this should be revised to be more efficient and selective. But now I think the cost should be ok.
This PR is part of GCS HA testing-related work.
Currently all jobs that build wheels put them into the artifacts directory and upload them. This leads to the wheels being overwritten on S3 multiple times. This is not a huge problem as ingress is free, but in order to have a single point of reference, it might be beneficial to limit the wheels uploading to a single Buildkite job. Recently, this has led to interference with stale artifact directories.
The downside here is that if the "Wheels & Jars" build fails randomly, the wheels will not be available on S3 - previously they've been also uploaded by several other jobs.
Clean up the ci/ directory. This means getting rid of the travis/ path completely and moving the files into sensible subdirectories.
Details:
- Moves everything under ci/travis into subdirectories, e.g. ci/build, ci/lint, etc.
- Minor adjustments to some scripts (variable renames)
- Removes the outdated (unused) asan tests
There are problems with running C++ tests in MacOS 10.15 Catalina, when upgrading to the newest grpc due to dynamic linking: #22384 (comment). The problem does not exist for Python tests in Catalina, or in C++ tests of other systems.
Upgrading MacOS CI from Catalina is also blocked in the short term: ray-project/buildkite-ci-stack#24 (comment)
So working around the issue by using static linking for C++ tests on Mac.