Kai Fricke
b58f839534
[ci/release] Remove hard numpy removal from app configs ( #21005 )
2021-12-13 15:22:02 +00:00
Amog Kamsetty
99ed623371
[Release] Use NCCL backend for release tests ( #20677 )
...
* use nccl for release tests
* link issue
2021-11-29 12:42:13 -08:00
Antoni Baum
a8d7897a56
[CI] Modify remote wrapper in XGBoost-Ray client test ( #20544 )
...
Instead of wrapping the whole training run in a remote call, we only query the files on the node in a remote call. XGBoost-Ray is then started from the local node.
2021-11-24 10:27:17 +00:00
Richard Liaw
1cadd61917
Fix horovod failing tests by pinning down ( #20484 )
2021-11-17 13:54:25 -08:00
Amog Kamsetty
7e597814aa
[Release] Fix app config for horovod_tests ( #20393 )
...
Fixes `horovod_test` weekly test
Closes https://github.com/ray-project/ray/issues/20382
2021-11-16 09:06:42 -08:00
Kai Fricke
91920f1d02
[release/xgboost] xgboost release test fixes via app config ( #20325 )
...
* [xgboost] Fix release test app configs
* Revert full app config
* Update base docker image
* Only change cpu base image
* default
* Pin xgboost to 1.5. in cpu tests
* Remove numpy hack
* Revert one line
Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-15 10:03:21 -08:00
matthewdeng
ed3cbe48f5
[train][xgboost][release] fix ml_user_tests using ray client ( #20345 )
2021-11-15 15:24:23 +00:00
matthewdeng
e22632dabc
[train] wrap BackendExecutor in ray.remote() ( #20123 )
...
* [train] wrap BackendExecutor in ray.remote()
* wip
* fix trainer tests
* move CheckpointManager to Trainer
* [tune] move force_on_current_node to ml_utils
* fix import
* force on head node
* init ray
* split test files
* update example
* move tests to ray client
* address comments
* move comment
* address comments
2021-11-13 15:30:44 -08:00
Amog Kamsetty
4396419a64
[Release] Fix tune_rllib connect test ( #20321 )
...
* [Release] Fix tune_rllib connect test
* use canonical app config
2021-11-13 10:11:20 -08:00
Amog Kamsetty
18dcf1ac25
[Release] Use nightly Docker images ( #20001 )
...
* use nightly
* switch ml cpu to ray cpu
* fix
* add pytest
* add more pytest
* add constraint
* add tensorflow
* fix merge conflict
* add tblib
* fix
* add back uninstall
2021-11-10 18:00:16 -08:00
Amog Kamsetty
f164f3a8b5
[Release] Increase Placement Group timeout ( #20224 )
2021-11-10 13:02:38 -08:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests ( #20028 )
...
* wip
* add directory
* wip
* try again
* Revert "try again"
This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.
* finish
* formatting
* fix merge
* fix path
* chmod
* check
* sudo
* wip
* update
* fix horovod
* try
* typo
* reduce num workers
2021-11-05 17:28:37 -07:00