12 commits

Author SHA1 Message Date
Kai Fricke
b58f839534
[ci/release] Remove hard numpy removal from app configs (#21005)
2021-12-13 15:22:02 +00:00
Amog Kamsetty
99ed623371
[Release] Use NCCL backend for release tests (#20677)
* use nccl for release tests

* link issue
2021-11-29 12:42:13 -08:00
Antoni Baum
a8d7897a56
[CI] Modify remote wrapper in XGBoost-Ray client test (#20544)
Instead of wrapping the whole training run in a remote call, we only query the files on the node in a remote call. XGBoost-Ray is then started from the local node.
2021-11-24 10:27:17 +00:00
Richard Liaw
1cadd61917
Fix horovod failing tests by pinning down (#20484)
2021-11-17 13:54:25 -08:00
Amog Kamsetty
7e597814aa
[Release] Fix app config for horovod_tests (#20393)
Fixes `horovod_test` weekly test

Closes https://github.com/ray-project/ray/issues/20382
2021-11-16 09:06:42 -08:00
Kai Fricke
91920f1d02
[release/xgboost] xgboost release test fixes via app config (#20325)
* [xgboost] Fix release test app configs

* Revert full app config

* Update base docker image

* Only change cpu base image

* default

* Pin xgboost to 1.5. in cpu tests

* Remove numpy hack

* Revert one line

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-15 10:03:21 -08:00
matthewdeng
ed3cbe48f5
[train][xgboost][release] fix ml_user_tests using ray client (#20345)
2021-11-15 15:24:23 +00:00
matthewdeng
e22632dabc
[train] wrap BackendExecutor in ray.remote() (#20123)
* [train] wrap BackendExecutor in ray.remote()

* wip

* fix trainer tests

* move CheckpointManager to Trainer

* [tune] move force_on_current_node to ml_utils

* fix import

* force on head node

* init ray

* split test files

* update example

* move tests to ray client

* address comments

* move comment

* address comments
2021-11-13 15:30:44 -08:00
Amog Kamsetty
4396419a64
[Release] Fix tune_rllib connect test (#20321)
* [Release] Fix tune_rllib connect test

* use canonical app config
2021-11-13 10:11:20 -08:00
Amog Kamsetty
18dcf1ac25
[Release] Use nightly Docker images (#20001)
* use nightly

* switch ml cpu to ray cpu

* fix

* add pytest

* add more pytest

* add constraint

* add tensorflow

* fix merge conflict

* add tblib

* fix

* add back uninstall
2021-11-10 18:00:16 -08:00
Amog Kamsetty
f164f3a8b5
[Release] Increase Placement Group timeout (#20224)
2021-11-10 13:02:38 -08:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests (#20028)
* wip

* add directory

* wip

* try again

* Revert "try again"

This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.

* finish

* formatting

* fix merge

* fix path

* chmod

* check

* sudo

* wip

* update

* fix horovod

* try

* typo

* reduce num workers
2021-11-05 17:28:37 -07:00