Chen Shen
d856abb70d
[Test] increase memory for 5000 partitions shuffle ( #17429 )
2021-07-29 21:56:16 -07:00
Kai Fricke
2ae6b944a2
[release tests] limit number of results fetched for alerting ( #17430 )
2021-07-29 18:43:44 +01:00
Jiao
3dc49c0b79
[serve] Add multi deployment to serve nightly tests ( #17411 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-29 11:47:58 -05:00
SangBin Cho
65c0c3f3a4
[Test] Fix releaser bug ( #17418 )
...
* Fix a bug
* done
2021-07-28 18:15:00 -07:00
Jiao
2618236167
[serve] Fix single deployment nightly test ( #17368 )
2021-07-28 11:38:06 -05:00
SangBin Cho
e1cd8580a0
[Test] Add various fixes to the nightly dashboard to improve signals ( #17351 )
...
* Add various fixes to the nightly dashboard to improve signals
* Fix issues
2021-07-27 12:37:11 -07:00
Jiao
9eb1bcd061
[serve] Multi & single deployment large scale test ( #17310 )
2021-07-27 10:46:45 -05:00
Edward Oakes
58423e6018
[serve] Improve nightly release test ( #17277 )
2021-07-26 11:15:46 -05:00
Jiao
9b6be6f1c8
update dask compatibility for 1.5.0 ( #17302 )
...
* update dask compatibility for 1.5.0
* change to right file
* add pip install pytest
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-23 17:31:42 -07:00
Jiao
f4f702c595
[Release] change default expiration to 2 days in order to prevent custodian kill it early morning ( #17215 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-20 17:03:14 -07:00
Jiao
7473f663ef
[Release] change replica to 100 to collect signals now ( #17214 )
...
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-20 12:27:56 -07:00
Jiao
994ff3ce21
[Serve] Add initial large scale tests ( #17026 )
2021-07-20 08:56:29 -07:00
Antoni Baum
5e9b680e39
[docs] Add LightGBM-Ray docs, update XGBoost-Ray docs ( #17188 )
2021-07-20 16:06:47 +01:00
Chen Shen
fe9a6b669c
[nightly-test] add 4-nodes shuffle-data-loader test ( #17155 )
2021-07-19 17:46:22 -07:00
SangBin Cho
561dcbd99c
[Test] Fix the permission issue for Dask on Ray multi node sort #17189 ( #17189 )
2021-07-19 14:42:39 -07:00
Alex Wu
93c16346bf
[Dataset] imagenet nightly test ( #17069 )
2021-07-16 14:15:49 -07:00
SangBin Cho
ef1d9278b8
[Test] nightly test dask on ray multi node sort ( #17141 )
2021-07-15 23:13:35 -07:00
Chen Shen
2a53d22438
[nightly-test] add test shuffle_data_loader ( #16972 )
...
* test shuffle_data_loader
* address comments
* update
2021-07-15 20:03:35 -07:00
Kai Fricke
ed131f87da
[release] move release testing end to end script to main ray repo ( #17070 )
2021-07-14 12:39:07 -07:00
Antoni Baum
cfc5806c2d
[release] LightGBM release tests ( #17043 )
2021-07-14 08:38:55 +01:00
SangBin Cho
536537cd1a
[Test] Update large scale data processing tests ( #16967 )
...
* in progress
* in progress
2021-07-13 19:15:13 -07:00
Eric Liang
fa0ff057d6
Add a new autoscaling shuffle test ( #16948 )
2021-07-12 16:54:38 -07:00
Chen Shen
667f53a0a2
add stress test ( #16977 )
2021-07-11 09:59:41 -07:00
SangBin Cho
33e319e9d7
[Tests] Remove app level error from nightly tests ( #16968 )
...
* Completed
* Fix tests
* increase the node wait timeout
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
2021-07-09 12:20:42 -07:00
matthewdeng
264e2df7e2
[release] update modin_xgboost_test to use anyscale connect ( #16942 )
2021-07-07 22:37:41 -07:00
SangBin Cho
33a2213c6f
Add another large scale shuffle test to verify stability ( #16902 )
2021-07-06 22:24:00 -07:00
Eric Liang
d956ca1b54
Add decision tree test to nightly builds ( #16912 )
2021-07-06 20:49:04 -07:00
matthewdeng
23088bd7ea
[release] update torch_tune_serve_test to use anyscale connect ( #16754 )
...
* [release] update torch_tune_serve_test to use anyscale connect
* use download_results to download model checkpoint
* clean up code to support both OSS and Anyscale
2021-07-06 19:02:50 -07:00
SangBin Cho
7bd3138227
[Test] Support stress test smoke test ( #16827 )
...
* Support smoke test
* lint
2021-07-02 09:50:26 -07:00
matthewdeng
a3f89d9f53
[release] write output for golden notebook tests ( #16825 )
2021-07-01 16:10:58 -07:00
Dmitri Gekhtman
096559f679
[release] Update instructions (minor) ( #16812 )
2021-07-01 09:42:51 -07:00
mwtian
7669708237
Create a wait_for_num_nodes() function, and use it in train_small ( #16784 )
2021-07-01 10:17:53 +01:00
Amog Kamsetty
c0560dadef
[Docker] Pin Tensorflow ( #16741 )
2021-06-29 11:14:46 -07:00
Dmitri Gekhtman
257d072d13
[kubernetes][release] K8s release test instructions ( #16662 )
2021-06-29 10:57:35 -07:00
matthewdeng
b0f304a1b5
[release] add golden notebook release test for torch/tune/serve ( #16619 )
...
* [release] add golden notebook release test for torch/tune/serve
* start serve on all nodes so remote localhost works
2021-06-29 09:13:23 -07:00
Jiao
6aeda62d40
[Serve] Add serve test config files and wrk dependency ( #16631 )
2021-06-28 10:01:55 -07:00
Chen Shen
c4d7b31a79
[Test] Placement group stress test ( #16633 )
2021-06-24 21:35:55 -07:00
Amog Kamsetty
53d16365b0
[Release] Convert Horovod and SGD release tests ( #15999 )
2021-06-24 15:56:02 +01:00
Kai Fricke
ef97bdd407
[release] Fix app config: Install latest releases. Bump xgboost-ray version ( #16581 )
2021-06-24 12:56:21 +01:00
mwtian
48599aef9e
Roll forward to run train_small in client mode. ( #16610 )
2021-06-23 08:52:08 +01:00
mwtian
f5f23448fc
Support downloading and testing wheels for Python 3.9. ( #16586 )
2021-06-21 12:02:22 -07:00
Chen Shen
853caea146
[tests]migrate test-many-tasks/test-dead-actors to nightly tests ( #16469 )
...
* init commit
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
* Update release/nightly_tests/nightly_tests.yaml
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-18 18:43:25 -07:00
Kai Fricke
aecc4c8d28
[release] fix sgd base image, microbenchmark timeout, revert xgboost train_small to not use connect ( #16532 )
2021-06-18 11:40:04 +01:00
SangBin Cho
6dc4032d19
Set the 500GB block device for a single node test ( #16493 )
2021-06-16 22:37:30 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests ( #16355 )
2021-06-16 21:37:17 +01:00
mwtian
2f7d535253
[Test] Use Ray client in XGBoost train_small release test ( #16319 )
2021-06-16 14:39:32 +01:00
Antoni Baum
2fb10e6730
[SGD] Add support for native Torch AMP in SGD ( #16382 )
...
* SGD native AMP initial commit
* SGD native amp second pass
* Update docs
* Update TorchTrainer doc
* Temp fix release test
* Update release/sgd_tests/sgd_gpu/sgd_gpu_app_config.yaml
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-06-15 17:48:21 -07:00
Amog Kamsetty
f3ad50fe6a
[SGD] Rename release tests ( #16410 )
...
Test failures unrelated
2021-06-15 17:16:40 +01:00
SangBin Cho
f3ab162c5e
Fix nightly release test issues. ( #16419 )
2021-06-15 00:43:08 -07:00
Eric Liang
f93ca2b673
Make it much simpler to turn on event stats ( #16401 )
2021-06-14 09:51:24 -07:00