Commit graph

99 commits

Author SHA1 Message Date
matthewdeng
264e2df7e2
[release] update modin_xgboost_test to use anyscale connect (#16942) 2021-07-07 22:37:41 -07:00
SangBin Cho
33a2213c6f
Add another large scale shuffle test to verify stability (#16902) 2021-07-06 22:24:00 -07:00
Eric Liang
d956ca1b54
Add decision tree test to nightly builds (#16912) 2021-07-06 20:49:04 -07:00
matthewdeng
23088bd7ea
[release] update torch_tune_serve_test to use anyscale connect (#16754)
* [release] update torch_tune_serve_test to use anyscale connect

* use download_results to download model checkpoint

* clean up code to support both OSS and Anyscale
2021-07-06 19:02:50 -07:00
SangBin Cho
7bd3138227
[Test] Support stress test smoke test (#16827)
* Support smoke test

* lint
2021-07-02 09:50:26 -07:00
matthewdeng
a3f89d9f53
[release] write output for golden notebook tests (#16825) 2021-07-01 16:10:58 -07:00
Dmitri Gekhtman
096559f679
[release] Update instructions (minor) (#16812) 2021-07-01 09:42:51 -07:00
mwtian
7669708237
Create a wait_for_num_nodes() function, and use it in train_small (#16784) 2021-07-01 10:17:53 +01:00
Amog Kamsetty
c0560dadef
[Docker] Pin Tensorflow (#16741) 2021-06-29 11:14:46 -07:00
Dmitri Gekhtman
257d072d13
[kubernetes][release] K8s release test instructions (#16662) 2021-06-29 10:57:35 -07:00
matthewdeng
b0f304a1b5
[release] add golden notebook release test for torch/tune/serve (#16619)
* [release] add golden notebook release test for torch/tune/serve

* start serve on all nodes so remote localhost works
2021-06-29 09:13:23 -07:00
Jiao
6aeda62d40
[Serve] Add serve test config files and wrk dependency (#16631) 2021-06-28 10:01:55 -07:00
Chen Shen
c4d7b31a79
[Test] Placement group stress test (#16633) 2021-06-24 21:35:55 -07:00
Amog Kamsetty
53d16365b0
[Release] Convert Horovod and SGD release tests (#15999) 2021-06-24 15:56:02 +01:00
Kai Fricke
ef97bdd407
[release] Fix app config: Install latest releases. Bump xgboost-ray version (#16581) 2021-06-24 12:56:21 +01:00
mwtian
48599aef9e
Roll forward to run train_small in client mode. (#16610) 2021-06-23 08:52:08 +01:00
mwtian
f5f23448fc
Support downloading and testing wheels for Python 3.9. (#16586) 2021-06-21 12:02:22 -07:00
Chen Shen
853caea146
[tests] migrate test-many-tasks/test-dead-actors to nightly tests (#16469)
* init commit

* Update release/nightly_tests/nightly_tests.yaml

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

* Update release/nightly_tests/nightly_tests.yaml

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>

Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-18 18:43:25 -07:00
Kai Fricke
aecc4c8d28
[release] fix sgd base image, microbenchmark timeout, revert xgboost train_small to not use connect (#16532) 2021-06-18 11:40:04 +01:00
SangBin Cho
6dc4032d19
Set the 500GB block device for a single node test (#16493) 2021-06-16 22:37:30 -07:00
Kai Fricke
9352cb781c
[release tests] Fix microbenchmark base image, network overhead cluster wait time, add long running tests (#16355) 2021-06-16 21:37:17 +01:00
mwtian
2f7d535253
[Test] Use Ray client in XGBoost train_small release test (#16319) 2021-06-16 14:39:32 +01:00
Antoni Baum
2fb10e6730
[SGD] Add support for native Torch AMP in SGD (#16382)
* SGD native AMP initial commit

* SGD native amp second pass

* Update docs

* Update TorchTrainer doc

* Temp fix release test

* Update release/sgd_tests/sgd_gpu/sgd_gpu_app_config.yaml

Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
2021-06-15 17:48:21 -07:00
Amog Kamsetty
f3ad50fe6a
[SGD] Rename release tests (#16410)
Test failures unrelated
2021-06-15 17:16:40 +01:00
SangBin Cho
f3ab162c5e
Fix nightly release test issues. (#16419) 2021-06-15 00:43:08 -07:00
Eric Liang
f93ca2b673
Make it much simpler to turn on event stats (#16401) 2021-06-14 09:51:24 -07:00
SangBin Cho
eb7344069b
[Test] Improving tests (#16368)
* Improve testing

* Fix tests.
2021-06-11 18:29:22 -07:00
matthewdeng
9c36ff81fa
[release] add golden notebook tests for dask/xgboost and modin/xgboost (#16231)
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
2021-06-11 10:03:04 +01:00
Eric Liang
ae0e38b86d
Remove legacy feature flags / features (#16349) 2021-06-10 09:31:38 -07:00
SangBin Cho
c8a5d7ba85
[TEST] Additional data processing nightly test (#16078)
* in progress

* in progress

* almost done

* Lint

* almost done

* All tests are available now

* Make the test a little more stressful

* Modify parameter to make tests a little more stressful
2021-06-09 22:38:53 -07:00
Clark Zinzow
ca68bf1e93
[Release] Update release test configs for 1.4 release. (#16292)
* Updated scalability envelope tests for 1.4.

* Update data processing release test for 1.4.
2021-06-08 00:15:25 -07:00
mwtian
c2a2a6f7c3
Make it easier to run asan and wheel release tests (#16242) 2021-06-07 22:54:22 -07:00
SangBin Cho
3572d0837e
[Test] Dask on ray sort nightly (#16213)
* Make dask on ray sort work

* lint

* revert unrelated change
2021-06-06 15:58:48 -07:00
SangBin Cho
03c33cf443
add a streaming shuffle test (#16258) 2021-06-06 15:58:14 -07:00
Clark Zinzow
227f252c39
[Release] Release 1.4.0 stress tests, scalability envelope, and microbenchmark release logs (#16228) 2021-06-04 16:36:41 -07:00
Kai Fricke
153a8b8fec
[release] convert tune release tests (#15913) 2021-06-01 11:19:15 -07:00
Sven Mika
c9d220bcda
[RLlib] Upgrade RLlib regression test scripts to new testing tool - RLlib release logs for 1.4. (#16080) 2021-06-01 17:39:18 +02:00
Amog Kamsetty
da6f28d777
[Release] Add multi-node, multi-GPU SGD release test (#16046) 2021-05-31 16:23:04 -07:00
SangBin Cho
9fa3b9f6f3
[Nightly test] Test non streaming shuffle (#16150) 2021-05-31 15:28:02 -07:00
SangBin Cho
94dc06d852
[Nightly test] improve error detection (#16102)
* improve error detection

* improve gitignore

* fix
2021-05-27 00:33:21 -07:00
SangBin Cho
ee1ccb569d
[Test] Nightly shuffle test (#15998)
* shuffle daily test update.

* lint

* Improve testing.

* Download the real nightly.

* Addressed code review.

* fix typo

* fix issue

* fix the broken release test

* Updated the test.
2021-05-24 15:33:31 -07:00
mwtian
5462c6e7de
Fix link to release checklist from release process doc. (#15793) 2021-05-13 13:34:54 -07:00
SangBin Cho
259fcbd5bd
[Pubsub] Generalize the pubsub interface and adapt it for ref counting protocol (#15446)
* Add mock code first

* In the initial progress.

* Fix the number error

* In progress.

* More progress.

* in progress.

* lint.

* Prototype done.

* Fix compilation bug.

* Now it is working with reference counting.

* Remove template.

* lint.

* Fixed issues.

* Fix reference count test.

* Reference count test passes now.

* Fixed the test array problem

* Addressed code review.

* lint.

* Addressed half of code review.

* Fix tests.

* Addressed the most critical issue.

* Make subscriber thread-safe.

* Revert "Make subscriber thread-safe."

This reverts commit 9a6a52197cfa8463ab60dfaae9530ad3c0ed8790.

* Fixed test failures. The only failure now is the asan failure.

* Reset test suites and see if it fixes the issue.

* Fix a flaky test

* Addressed code review.
2021-05-13 09:29:02 -07:00
Eric Liang
0dfd43c61b
Add nightly release test directory and add shuffle release test (#15671)
* update

* update

* update

* update

* update

* Adjust script/release test json

* remove

* update

* lint

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-05-08 14:21:55 -07:00
Kai Fricke
8db2e5c23a
[release] Move xgboost tune small + microbenchmark release test to new release automation (#15619) 2021-05-08 20:38:39 +01:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates (#15366)
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Jenna Kwon
15da948214
Support object spilling mode and data load failure mode in dask_on_ra… (#15601)
* Support object spilling mode and data load failure mode in dask_on_ray_large_scale_test.py

* Remove freq and time decimation

Co-authored-by: Jenna Kwon <jkkwon@amazon.com>
2021-05-04 10:57:49 -07:00
Amog Kamsetty
ebc44c3d76
[CI] Upgrade flake8 to 3.9.1 (#15527)
* formatting

* format util

* format release

* format rllib/agents

* format rllib/env

* format rllib/execution

* format rllib/evaluation

* format rllib/examples

* format rllib/policy

* format rllib utils and tests

* format streaming

* more formatting

* update requirements files

* fix rllib type checking

* updates

* update

* fix circular import

* Update python/ray/tests/test_runtime_env.py

* noqa
2021-05-03 14:23:28 -07:00
SangBin Cho
df9329160e
[Tests] Dask on ray release test (#15256)
* done.

* Linting.

* Update readme

* Update.

* Fix issues.
2021-04-15 10:30:17 -07:00
SangBin Cho
d0e83c43ca
[Release Test] Modify parameter to reduce stress (#15048)
* Fix.

* Fix.
2021-04-14 18:27:20 -07:00