OSS release tests currently run with hardcoded Python 3.7 base. In the future we will want to run tests on different python versions.
This PR adds support for a new `python` field in the test configuration. The python field will determine both the base image used in the Buildkite runner docker container (for Ray client compatibility) and the base image for the Anyscale cluster environments.
Note that in Buildkite, we will still only wait for the python 3.7 base image before kicking off tests. That is acceptable, as we can assume that most wheels finish in a similar time, so even if we wait for the 3.7 image and kick off a 3.8 test, that runner will wait maybe for 5-10 more minutes.
#22749 broke release unit tests by not providing a legacy key - that key should be optional because we will b dealing with non-legacy tests soon.
Additionally, for some reason the unit tests pass on buildkite while they fail locally and in the release test pipeline. I'm investigating this now...
Of all smoke test arguments, frequency is the only required one, so we should check for it. Additionally, not all fields should be able to be overwritten (e.g. legacy or name), so we enforce this as well.
To avoid breakage like in #22905, this PR adds schema validation to the release test package.
In a follow-up PR, we'll likely switch this to use pydantic instead.