matthewdeng
ed3cbe48f5
[train][xgboost][release] fix ml_user_tests using ray client (#20345)
2021-11-15 15:24:23 +00:00
matthewdeng
e22632dabc
[train] wrap BackendExecutor in ray.remote() (#20123)
* [train] wrap BackendExecutor in ray.remote()
* wip
* fix trainer tests
* move CheckpointManager to Trainer
* [tune] move force_on_current_node to ml_utils
* fix import
* force on head node
* init ray
* split test files
* update example
* move tests to ray client
* address comments
* move comment
* address comments
2021-11-13 15:30:44 -08:00
Amog Kamsetty
4396419a64
[Release] Fix tune_rllib connect test (#20321)
* [Release] Fix tune_rllib connect test
* use canonical app config
2021-11-13 10:11:20 -08:00
Amog Kamsetty
18dcf1ac25
[Release] Use nightly Docker images (#20001)
* use nightly
* switch ml cpu to ray cpu
* fix
* add pytest
* add more pytest
* add constraint
* add tensorflow
* fix merge conflict
* add tblib
* fix
* add back uninstall
2021-11-10 18:00:16 -08:00
Amog Kamsetty
f164f3a8b5
[Release] Increase Placement Group timeout (#20224)
2021-11-10 13:02:38 -08:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests (#20028)
* wip
* add directory
* wip
* try again
* Revert "try again"
This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.
* finish
* formatting
* fix merge
* fix path
* chmod
* check
* sudo
* wip
* update
* fix horovod
* try
* typo
* reduce num workers
2021-11-05 17:28:37 -07:00