Commit graph

287 commits

Author SHA1 Message Date
Amog Kamsetty
3a52187da8
[Release/Lightning] Add Ray lightning user test (#19812)
* wip

* wip

* add ray lightning test

* fix

* update

* merge and add

* fix

* fix

* rename

* autoscale

* add tblib

* gloo backend

* typo

* upgrade torch

* latest and master
2021-11-01 18:29:48 -07:00
Amog Kamsetty
474e44f7e0
[Release/Horovod] Add user test for Horovod (#19661)
* infra

* wip

* add test

* typo

* typo

* update

* rename

* fix

* full path

* formatting

* reorder

* update

* update

* Update release/horovod_tests/workloads/horovod_user_test.py

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>

* bump num_workers

* update installs

* try

* add pip_packages

* min_workers

* fix

* bump pg timeout

* Fix symlink

* fix

* fix

* cmake

* fix

* pin filelock

* final

* update

* fix

* Update release/horovod_tests/workloads/horovod_user_test.py

* fix

* fix

* separate compute template

* test latest and master

Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
2021-11-01 18:28:07 -07:00
matthewdeng
e1e4a45b8d
[train] add simple Ray Train release tests (#19817)
* [train] add simple Ray Train release tests

* simplify tests

* update

* driver requirements

* move to test

* remove connect

* fix

* fix

* fix torch

* gpu

* add assert

* remove assert

* use gloo backend

* fix

* finish

Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>
2021-11-01 18:25:19 -07:00
xwjiang2010
1803ca13b6
Adding release logs for 1.8.0. (#19867) 2021-11-01 10:26:04 -07:00
architkulkarni
702bffe072
[runtime env] [test] Enable runtime env nightly test with working_dir reconnection (#19906) 2021-10-31 10:48:48 -05:00
xwjiang2010
4d293c4cee
Increase horovod_test disk space. (#19917) 2021-10-30 14:41:31 -07:00
Lixin Wei
1fe9f3372e
[Nightly Test] Remove duplicate printing code (#19874)
## Why are these changes needed?

Remove duplicate printing code
2021-10-29 10:19:19 -07:00
Kai Fricke
fa0158abe5
[tune] Cloud checkpointing release tests (#19638) 2021-10-29 12:12:01 +02:00
Kai Fricke
a13f738a10
[ci/release] Fix cloud search query (#19876) 2021-10-29 11:30:34 +02:00
Kai Fricke
564d8551ed
[ci/release] only check alert if test succeeded before (#19857) 2021-10-28 16:09:10 -07:00
Simon Mo
3e038aebb2
[CI] Allow release tests infra to accept buildkite artifacts (#19803) 2021-10-27 13:04:01 -07:00
Yi Cheng
abec07700a
[nightly] Adding more tests related to grpc broadcasting to staging mode (#19779)
## Why are these changes needed?
We are concerned that grpc-based broadcasting might have a negative impact on pg-related workloads. This test ensures it runs well before merging.

## Related issue number
#19438
2021-10-27 10:46:13 -07:00
Jiao
3f628d4f6b
increase long poll timeout and wrk trial cpu resource (#19768) 2021-10-26 21:31:39 -07:00
SangBin Cho
bcd27b708f
[Test] Mark many ppo as unstable (#19769) 2021-10-26 21:27:43 -07:00
xwjiang2010
ab15dfd478
[Tune release test] Set 500G disk space for rllib_tests. (#19730) 2021-10-26 10:12:03 -07:00
Jiao
aaef82920d
[serve] Add periodic timeouts to long poll client to avoid accumulating concurrent tasks in the controller (#19728) 2021-10-26 09:44:00 -05:00
Kai Fricke
98244ad130
[ci/release] Report error to database on alert (#19743) 2021-10-26 10:48:02 +01:00
Kai Fricke
96ddf5b9ac
[ci/release] Choose cloud by name or ID (#19742) 2021-10-26 10:21:54 +01:00
Amog Kamsetty
6e61ca623d
[CI] Infra for "user" tests (#19662) 2021-10-26 08:47:22 +01:00
SangBin Cho
ecd5a622ef
[Tests] Add a memory usage on dask on ray tests (#19674) 2021-10-25 14:58:26 -07:00
architkulkarni
414910b7fc
[test] [runtime env] Add release test with Ray Client and local pip files (#19026) 2021-10-25 11:49:27 -05:00
xwjiang2010
a632cb439f
[Tune] Remove queue_trials. (#19472) 2021-10-22 09:24:54 +01:00
SangBin Cho
9000f41aa6
[Nightly Test] Support memory profiling on Ray + implement memory monitor for nightly tests (#19539)
* random fixes

* Done

* done

* update the doc

* doc lint fix

* .

* .
2021-10-21 07:37:05 -07:00
Yi Cheng
7a7b356899
[Nightly test] add test for grpc broadcasting (#19579) 2021-10-21 07:01:41 -07:00
Kai Fricke
71564040ec
[ci/release] Unwrap after installing pip packages (#19552) 2021-10-20 13:41:16 +01:00
Yi Cheng
01b899dafb
[nightly] Fix broken test due to bad syntax #19536 (#19536) 2021-10-19 21:43:46 -07:00
Yi Cheng
7a9cedfc5c
[nightly] Add grpc based broadcasting into nightly test for decision_tree (#19531)
* dbg

* up

* check

* up

* up

* put grpc based one into nightly test

* up
2021-10-19 19:59:39 -07:00
Kai Fricke
3e8587644b
[ci/release] wrap all release test pip github installs in quotation marks (#19521) 2021-10-19 20:55:02 +01:00
Chen Shen
b38ebd368c
[Dataset][nightly-test] spend less money #19488
Reduce the epoch and ensure everything runs in the same datacenter.
2021-10-18 18:53:50 -07:00
gjoliver
e9f66cc394
Reduce success criteria for a few learning tests. (#19484) 2021-10-18 15:44:38 -07:00
Jiajun Yao
4d9585773f
[Release] Remove release process doc (#19312) 2021-10-18 11:24:03 -07:00
Yi Cheng
f47f69d31e
[nightly] Add decision_tree_autoscaling_20_runs to nightly test 2021-10-18 11:19:40 -07:00
Kai Fricke
ad94eb03c6
[ci/release] wrap pip github installs in quotation marks to prevent comment errors (#19464) 2021-10-18 18:55:56 +01:00
Kai Fricke
eee05505b1
[ci/release] Add separate timeout parameter for prepare commands (#19459) 2021-10-18 16:29:25 +01:00
Kai Fricke
57fe405120
[ci/release] Bump long running release test timeouts to 6 minutes (#19458) 2021-10-18 16:27:53 +01:00
Chen Shen
9dba5e0ead
[dataset][nightly-test] fix pipeline ingest test (#19437) 2021-10-18 11:31:24 +01:00
Kai Fricke
6c6639a0d7
[ci/release] hotfix for undefined local variable (#19460) 2021-10-18 11:28:33 +01:00
matthewdeng
caa42d753c
[release] pin modin>=0.11.0 due to ray.services being removed (#19446) 2021-10-18 11:23:05 +01:00
Kai Fricke
c10d434713
[release] Allow commit hashes instead of URLs, add bisection utility (#19398) 2021-10-18 10:44:29 +01:00
Kai Fricke
e17b23fa5b
[ci/release] Add support for RAY_WHEELS url (#19364) 2021-10-14 21:40:01 +01:00
Kai Fricke
e07d0953ea
[ci/release] Undo faulty change to many_ppo num_samples (#19388) 2021-10-14 10:27:31 -07:00
Antoni Baum
e9df253f5d
[CI/docs] Remove [default] from xgboost-ray (#19186)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-14 16:29:55 +01:00
Kai Fricke
9cee83c919
[tune] PBT: Add burn-in period (#19321) 2021-10-14 16:28:29 +01:00
Carlo Grisetti
5cee8a1985
[release tests] Switch from yaml.load to yaml.safe_load (#19365) 2021-10-13 17:27:25 -07:00
Yi Cheng
1dc03cd49d
[nightly] Put many nodes actor test back (#19313)
## Why are these changes needed?
There are two issues fixed in this PR:
- make sure wait for session count alive node
- upgrade the machine to match what's tested in oss ray.

## Related issue number
https://github.com/ray-project/ray/issues/19084
2021-10-13 15:51:12 -07:00
matthewdeng
d998373968
[release] fix test by pinning filelock (#19334)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-13 22:27:04 +01:00
Jiao
893f76daf9
[serve] Add serve FT nightly test to buildkite (#19361) 2021-10-13 13:56:55 -07:00
Jiao
85b8a6de5f
[Serve] Add nightly test for Serve failure recovery (#19125) 2021-10-11 18:33:20 -07:00
SangBin Cho
dd1c1f9787
[Nightly test] remove env vars from tests (#19221)
When testing, we should minimize unnecessary env vars (it is better to work with the default config). This PR removes the unnecessary env vars that are set.
2021-10-08 06:53:23 -07:00
Clark Zinzow
ca731d7c86
[Datasets] Fix API breakage in Datasets nightly test. 2021-10-07 15:07:19 -07:00