Revert to using nightly base images instead of pinning to 1.12.1. Pinning the Docker image has led to uncaught errors in the past. Instead, we should use nightly images to make sure release tests work on the most up-to-date versions of the Docker/cluster envs. If there are any test failures, the underlying issues should be fixed rather than pinning the Docker image.
Co-authored-by: Kai Fricke <kai@anyscale.com>
Many release tests are currently failing because of CUDA version incompatibilities. Pinning the base image to 1.12.1 seems to resolve the problem for the time being.
OSS release tests currently run with a hardcoded Python 3.7 base image. In the future we will want to run tests on different Python versions.
This PR adds support for a new `python` field in the test configuration. The `python` field determines both the base image used in the Buildkite runner Docker container (for Ray client compatibility) and the base image for the Anyscale cluster environments.
Note that in Buildkite, we will still only wait for the Python 3.7 base image before kicking off tests. That is acceptable: we can assume that most wheels finish at around the same time, so even if we wait for the 3.7 image and kick off a 3.8 test, that runner will wait for perhaps 5-10 more minutes.
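As a rough illustration of how the `python` field could select a base image, here is a minimal sketch. It is not the code in this PR: the helper names, the `example/ray-base` repository, and the `nightly-py38` tag format are all hypothetical, and the only behavior taken from the description is that a test without a `python` field falls back to Python 3.7.

```python
from typing import Any, Dict

# Assumption: the default mirrors the previously hardcoded Python 3.7 base.
DEFAULT_PYTHON_VERSION = "3.7"


def python_tag(version: str) -> str:
    """Turn a version such as "3.8" into a tag suffix such as "py38".

    The "pyXY" suffix format is an assumption made for this sketch.
    """
    major, minor = version.split(".")
    return f"py{major}{minor}"


def base_image_for_test(
    test: Dict[str, Any],
    image_repo: str = "example/ray-base",  # hypothetical repository name
    channel: str = "nightly",
) -> str:
    """Pick the base image for a test from its optional `python` field.

    The same image name would be used for both the runner container and
    the cluster environment, so the two stay on the same Python version.
    """
    # Versions are expected as strings; an unquoted 3.10 in a YAML file
    # would be parsed as the float 3.1, so quoting the value is safer.
    version = str(test.get("python", DEFAULT_PYTHON_VERSION))
    return f"{image_repo}:{channel}-{python_tag(version)}"


if __name__ == "__main__":
    print(base_image_for_test({"name": "legacy_test"}))
    print(base_image_for_test({"name": "py38_test", "python": "3.8"}))
```

Resolving the image in one place keeps the runner and the cluster environment from drifting apart, which is the Ray client compatibility concern mentioned above.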