ray/release/horovod_tests/app_config_master.yaml
xwjiang2010 ee7a458762
[release test] fix horovod release test. (#22781)
`horovod_user_test_master` is failing with the recent Horovod release ([link](https://buildkite.com/ray-project/periodic-ci/builds/2960#61dabda8-eea0-4b7b-93bf-9e341926d3fd)).
The error message is:
```
AttributeError: Can't get attribute '_ExecutorDriver' on <module 'horovod.ray.runner' from '/home/ray/anaconda3/lib/python3.7/site-packages/horovod/ray/runner.py'>
```
The Horovod test is split into two parts: the "driver" (a.k.a. client), which is the code that runs on a Buildkite agent, and the "cluster" (a.k.a. server), which runs on an Anyscale cluster. The driver's dependencies are specified by `release/ml_user_tests/horovod/driver_setup_master.sh`, while the cluster's dependencies are specified by `release/horovod_tests/app_config_master.yaml`.

The two parts communicate via the Anyscale client.
The error above says that the client's Horovod version has `_ExecutorDriver` in `runner.py`, but the server's Horovod version does not. This is caused by a mismatch between the Horovod versions specified in the two files above. This PR makes both Horovod dependencies point to Horovod master.
2022-03-03 08:24:26 -08:00
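
A minimal sketch of this failure mode, assuming (as the `Can't get attribute` wording suggests) that the object crosses the client/server boundary via pickling: `pickle` records a class by module path and name rather than by value, so a payload created where `runner.py` defines `_ExecutorDriver` cannot be loaded where it does not. Nothing here uses Horovod or Ray; the module name `fake_horovod_runner` and the helper functions are hypothetical stand-ins.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a real module such as horovod.ray.runner.
MODULE_NAME = "fake_horovod_runner"


def make_runner_module(with_executor_driver: bool) -> types.ModuleType:
    """Build a throwaway module, optionally defining a class named _ExecutorDriver."""
    mod = types.ModuleType(MODULE_NAME)
    if with_executor_driver:
        # exec() in the module's namespace so the class's __module__ is MODULE_NAME,
        # which is what pickle records when it stores the class by reference.
        exec("class _ExecutorDriver:\n    pass\n", mod.__dict__)
    return mod


def simulate_version_mismatch() -> str:
    """Pickle on a 'client' that has the class, unpickle on a 'server' that lacks it."""
    # Client side: the newer module version defines _ExecutorDriver.
    sys.modules[MODULE_NAME] = make_runner_module(with_executor_driver=True)
    payload = pickle.dumps(sys.modules[MODULE_NAME]._ExecutorDriver())

    # Server side: an older version of the same module without the class.
    sys.modules[MODULE_NAME] = make_runner_module(with_executor_driver=False)
    try:
        pickle.loads(payload)
    except AttributeError as err:
        return str(err)
    finally:
        # Remove the fake module whichever branch ran.
        del sys.modules[MODULE_NAME]
    return "unpickled without error"


if __name__ == "__main__":
    print(simulate_version_mismatch())
```

Installing the same Horovod on both sides removes the mismatch, which is what this PR does.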

base_image: "anyscale/ray-ml:nightly-py37-gpu"
env_vars: {}
debian_packages:
  - curl

python:
  pip_packages:
    - pytest
    - awscli
  conda_packages: []

post_build_cmds:
  # Replace the image's Ray with the wheels under test (falls back to the "ray" package).
  - pip3 uninstall ray -y || true
  - pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
  - pip3 install 'ray[tune]'
  # Build Horovod from master with Gloo and PyTorch only (no MPI, TensorFlow, or MXNet).
  # This must match the Horovod source installed by the driver setup script.
  - HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip3 install -U git+https://github.com/horovod/horovod.git
  - {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}
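
As a hypothetical pre-flight check (not part of the Ray release tooling), the Horovod install target in `post_build_cmds` above could be compared against the one in the driver's setup script before the test is launched. The sketch assumes both sides can be reduced to a list of shell command strings and only recognizes plain `pip`/`pip3 install` lines; the function names are made up for illustration.

```python
import re
import shlex
from typing import List, Optional

# Leading shell assignments such as HOROVOD_WITH_GLOO=1 that precede the command.
_ENV_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def horovod_install_target(command: str) -> Optional[str]:
    """Return the Horovod requirement passed to `pip install`/`pip3 install`, if any."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: not something this sketch can interpret.
        return None
    while tokens and _ENV_ASSIGNMENT.match(tokens[0]):
        tokens.pop(0)
    if len(tokens) < 2 or tokens[0] not in ("pip", "pip3") or tokens[1] != "install":
        return None
    for token in tokens[2:]:
        # Skip flags such as -U; the first non-flag argument naming horovod wins.
        if not token.startswith("-") and "horovod" in token.lower():
            return token
    return None


def horovod_targets(commands: List[str]) -> List[str]:
    """Collect every Horovod install target from a list of shell commands."""
    targets = []
    for command in commands:
        target = horovod_install_target(command)
        if target is not None:
            targets.append(target)
    return targets


def horovod_sources_match(driver_cmds: List[str], cluster_cmds: List[str]) -> bool:
    """True only if both sides install Horovod, from the same requirement strings."""
    driver = set(horovod_targets(driver_cmds))
    cluster = set(horovod_targets(cluster_cmds))
    return bool(driver) and driver == cluster
```

Matching requirement strings is necessary but not sufficient: the same unpinned `git+https://github.com/horovod/horovod.git` URL installed at different times can resolve to different commits, so pinning one commit on both sides would be the stricter variant.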