From ee7a4587624d8388e5235efa4f1709f0655876f0 Mon Sep 17 00:00:00 2001
From: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Date: Thu, 3 Mar 2022 08:24:26 -0800
Subject: [PATCH] [release test] fix horovod release test. (#22781)

horovod_user_test_master is failing with a recent horovod release ([link](https://buildkite.com/ray-project/periodic-ci/builds/2960#61dabda8-eea0-4b7b-93bf-9e341926d3fd)). The error message is:
```
AttributeError: Can't get attribute '_ExecutorDriver' on
```
The horovod test is split into a "driver" (a.k.a. client) part, which runs in a Buildkite agent, and a "cluster" (a.k.a. server) part, which runs in an Anyscale cluster. The driver's dependencies are specified by `release/ml_user_tests/horovod/driver_setup_master.sh`, while the cluster's dependencies are specified by `release/horovod_tests/app_config_master.yaml`. The two communicate via Anyscale client.

The error means that the client's horovod version has `_ExecutorDriver` in runner.py while the server's horovod does not. This is caused by a horovod version mismatch between the two files above: the cluster installed the latest PyPI release (`pip3 install -U horovod`), while the driver was pinned to commit 06aa579c9966035453f92208706157dee14c14ab.

This PR points both horovod dependencies at horovod master.
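
For illustration, a minimal sketch of how a version mismatch like this can surface as an `AttributeError` at unpickling time. It assumes the object is pickled by reference (module name plus attribute name), which is how the standard `pickle` module handles classes from importable modules. The module name `fake_runner` and the helper `simulate_mismatch` are hypothetical stand-ins, not horovod or Ray code.

```python
import pickle
import sys
import types


def simulate_mismatch() -> str:
    """Pickle an instance on the "driver", unpickle it on a "cluster"
    whose copy of the module lacks the class. Returns the error text."""
    # Hypothetical stand-in for a library module such as runner.py.
    mod = types.ModuleType("fake_runner")
    cls = type("_ExecutorDriver", (), {"__module__": "fake_runner"})
    mod._ExecutorDriver = cls
    sys.modules["fake_runner"] = mod
    try:
        # "Driver" side: the class is stored by reference, i.e. only
        # the names ("fake_runner", "_ExecutorDriver") go into the payload.
        payload = pickle.dumps(cls())

        # "Cluster" side: simulate a version of the module without the class.
        del mod._ExecutorDriver

        try:
            pickle.loads(payload)
        except AttributeError as exc:
            return str(exc)
        return ""
    finally:
        # Do not leave the fake module registered.
        sys.modules.pop("fake_runner", None)


if __name__ == "__main__":
    print(simulate_mismatch())
```

Installing the same horovod revision on both sides avoids this class of failure, because every name referenced in the payload then resolves on the receiving side.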
---
 release/horovod_tests/app_config_master.yaml         | 2 +-
 release/ml_user_tests/horovod/driver_setup_master.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/release/horovod_tests/app_config_master.yaml b/release/horovod_tests/app_config_master.yaml
index 2197c7d86..e5364acbf 100644
--- a/release/horovod_tests/app_config_master.yaml
+++ b/release/horovod_tests/app_config_master.yaml
@@ -13,5 +13,5 @@ post_build_cmds:
   - pip3 uninstall ray -y || true
   - pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
   - pip3 install 'ray[tune]'
-  - HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip3 install -U horovod
+  - HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip3 install -U git+https://github.com/horovod/horovod.git
   - {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}
diff --git a/release/ml_user_tests/horovod/driver_setup_master.sh b/release/ml_user_tests/horovod/driver_setup_master.sh
index 6b8699eb6..98fae8712 100755
--- a/release/ml_user_tests/horovod/driver_setup_master.sh
+++ b/release/ml_user_tests/horovod/driver_setup_master.sh
@@ -6,4 +6,4 @@ pip install cmake
 
 pip install -U -r ./driver_requirements.txt
 
-HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip install git+https://github.com/horovod/horovod.git@06aa579c9966035453f92208706157dee14c14ab
\ No newline at end of file
+HOROVOD_WITH_GLOO=1 HOROVOD_WITHOUT_MPI=1 HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITHOUT_MXNET=1 HOROVOD_WITH_PYTORCH=1 pip install -U git+https://github.com/horovod/horovod.git
\ No newline at end of file