mirror of
https://github.com/vale981/ray
synced 2025-03-06 02:21:39 -05:00

horovod_user_test_master is failing with the recent horovod release ([link](https://buildkite.com/ray-project/periodic-ci/builds/2960#61dabda8-eea0-4b7b-93bf-9e341926d3fd)). The error message is:

```
AttributeError: Can't get attribute '_ExecutorDriver' on <module 'horovod.ray.runner' from '/home/ray/anaconda3/lib/python3.7/site-packages/horovod/ray/runner.py'>
```

The horovod test is split into a "driver" (a.k.a. client) part, which is the code that runs in a Buildkite agent, and a "cluster" (a.k.a. server) part, which runs in an Anyscale cluster. The driver's dependencies are specified by `release/ml_user_tests/horovod/driver_setup_master.sh`, while the cluster's dependencies are specified by `release/horovod_tests/app_config_master.yaml`. The two communicate via the Anyscale client.

The error above means that the client's horovod version has `_ExecutorDriver` in `runner.py` but the server's horovod does not, because the two files specify mismatched horovod versions. This PR makes both horovod dependencies point to horovod master.
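The error text has the shape Python's `pickle` module produces when a serialized object names a class that the receiving process cannot find. Assuming that is the mechanism here (the message above does not state it), the following minimal sketch reproduces the same kind of failure in one process. The module `fake_runner` and the empty `_ExecutorDriver` class are hypothetical stand-ins, not horovod's real code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module; this is NOT horovod's code.
fake_runner = types.ModuleType("fake_runner")
sys.modules["fake_runner"] = fake_runner


class _ExecutorDriver:
    """Placeholder class that only the 'driver' version of the module has."""


# Make pickle record the class as fake_runner._ExecutorDriver.
_ExecutorDriver.__module__ = "fake_runner"
_ExecutorDriver.__qualname__ = "_ExecutorDriver"
fake_runner._ExecutorDriver = _ExecutorDriver

# "Driver" side: the class exists, so serializing an instance succeeds.
payload = pickle.dumps(_ExecutorDriver())

# "Cluster" side: simulate a library version where the class is absent.
del fake_runner._ExecutorDriver


def try_load(data):
    """Return the error text if unpickling fails, otherwise None."""
    try:
        pickle.loads(data)
        return None
    except AttributeError as exc:
        return str(exc)


error_message = try_load(payload)
print(error_message)
```

The payload stores only the module and class name, not the class body, so the receiving side must be able to resolve the same name. That is why aligning both sides on one horovod version removes the error.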
17 lines
567 B
YAML
base_image: "anyscale/ray-ml:nightly-py37-gpu"
env_vars: {}
debian_packages:
  - curl

python:
  pip_packages:
    - pytest
    - awscli
  conda_packages: []

post_build_cmds:
  - pip3 uninstall ray -y || true
  - pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
  - pip3 install 'ray[tune]'
  - HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip3 install -U git+https://github.com/horovod/horovod.git
  - {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}