It fixes the mysterious error when all cluster env build is failing when pip uninstall / pip install is written in 2 lines. The root cause will be fixed later
OSS release tests currently run with hardcoded Python 3.7 base. In the future we will want to run tests on different python versions.
This PR adds support for a new `python` field in the test configuration. The python field will determine both the base image used in the Buildkite runner docker container (for Ray client compatibility) and the base image for the Anyscale cluster environments.
Note that in Buildkite, we will still only wait for the python 3.7 base image before kicking off tests. That is acceptable, as we can assume that most wheels finish in a similar time, so even if we wait for the 3.7 image and kick off a 3.8 test, that runner will wait maybe for 5-10 more minutes.
RAY_GCS_ACTOR_SCHEDULING_ENABLED is wrong should be RAY_gcs_actor_scheduling_enabled. Since gcs based actor scheduling is not enabled yet so I just removed this flag.
* use nightly
* switch ml cpu to ray cpu
* fix
* add pytest
* add more pytest
* add constraint
* add tensorflow
* fix merge conflict
* add tblib
* fix
* add back uninstall
## Why are these changes needed?
There are two issues fixed in this PR:
- make sure wait for session count alive node
- upgrade the machine to match what's tested in oss ray.
## Related issue number
https://github.com/ray-project/ray/issues/19084
When testing it we should minimize unnecessary env vars (and it's better working with the default config). This PR removes unnecessary env vars that are set.