SangBin Cho
140a180ebb
[xgboost] Fix flaky train_small test ( #20529 )
...
XGBoost's train_small test timed out because of a CPU-borrowing feature related to placement groups. The root bug will be fixed in the coming weeks, but this PR makes the release test pass consistently by requesting 0 CPUs for the remote wrapper script.
2021-11-18 10:20:08 +00:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests ( #20028 )
...
* wip
* add directory
* wip
* try again
* Revert "try again"
This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.
* finish
* formatting
* fix merge
* fix path
* chmod
* check
* sudo
* wip
* update
* fix horovod
* try
* typo
* reduce num workers
2021-11-05 17:28:37 -07:00
Kai Fricke
f96078687f
[xgboost/release] Xgboost/connect gpu test ( #19838 )
...
* [xgboost/release] Add GPU connect user test
* Use scaling cluster
* typo
* Increase xgboost placement group timeout
* Much higher timeout
* Move os environment timeout
* Move os environ
* [dev] install xgboost-ray from master
* GPU xgboost master
* Remove master install after new xgboost release
* Install latest
* Add master test
2021-11-02 08:40:48 -07:00
Antoni Baum
2c0dcec18f
[test] Fix golden notebook tests always failing ( #17873 )
2021-08-31 17:07:47 +02:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. ( #17885 )
...
* gcs_utils
* resource_spec
* profiling
* ray_perf and ray_cluster_perf
* test_utils
2021-08-18 20:56:33 -07:00
mwtian
7669708237
Create a wait_for_num_nodes() function, and use it in train_small ( #16784 )
2021-07-01 10:17:53 +01:00
mwtian
48599aef9e
Roll forward to run train_small in client mode. ( #16610 )
2021-06-23 08:52:08 +01:00
Kai Fricke
aecc4c8d28
[release] fix sgd base image, microbenchmark timeout, revert xgboost train_small to not use connect ( #16532 )
2021-06-18 11:40:04 +01:00
mwtian
2f7d535253
[Test] Use Ray client in XGBoost train_small release test ( #16319 )
2021-06-16 14:39:32 +01:00
Kai Fricke
153a8b8fec
[release] convert tune release tests ( #15913 )
2021-06-01 11:19:15 -07:00
Kai Fricke
8db2e5c23a
[release] Move xgboost tune small + microbenchmark release test to new release automation ( #15619 )
2021-05-08 20:38:39 +01:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates ( #15366 )
...
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Kai Fricke
7364a7a327
[tune] Move Optuna to ask(fixed_distributions) interface ( #14731 )
...
Adjusting to changes in Optuna 2.6.0. Old interface was marked as deprecated.
2021-03-22 12:25:37 +01:00
Ian Rodney
eb12033612
[Code Cleanup] Switch to use ray.util.get_node_ip_address() ( #14741 )
...
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2021-03-18 13:10:57 -07:00
Kai Fricke
a0f73cf3f7
[xgboost] Update XGBoost release test configs ( #13941 )
...
* Update XGBoost release test configs
* Use GPU containers
* Fix elastic check
* Use spot instances for GPU
* Add debugging output
* Fix success check, failure checking, outputs, sync behavior
* Update release checklist, rename mounts
2021-02-17 23:00:49 +01:00
Kai Fricke
8804758409
[xgboost] Add XGBoost release tests ( #13456 )
...
* Add XGBoost release tests
* Add more xgboost release tests
* Use failure state manager
* Add release test documentation
* Fix wording
* Automate fault tolerance tests
2021-01-20 18:40:23 +01:00