70 commits

Author SHA1 Message Date
Jiajun Yao
04a1a19f6b
[Release Test] Send release test result to db pipeline (#22667)
Send release test result to db pipeline
Add perf metrics for microbenchmark so that we can alert on them
2022-03-02 06:19:31 -08:00
SangBin Cho
5e847f7e09
[Usage Stats] Usage stats only enabled on nightly test infra (#22591)
This PR **enables usage stats only on the release test infrastructure** (the large-scale tests that Ray runs daily on private infrastructure). Note that usage stats are still disabled by default in Ray.
2022-02-23 22:11:48 -08:00
Stephanie Wang
03a5589591
[core] Enable lineage reconstruction in CI (#21519)
Enables lineage reconstruction in all CI and release tests.
2022-02-18 11:04:20 -08:00
Jun Gong
a9147bb62c
[Release Test] Fix AnyscaleSDK construction so we can run CI on staging instance. (#22325)
2022-02-16 09:56:02 -08:00
SangBin Cho
2ed5bb7a5f
[Nightly Test] Addressed client failure properly (#22438)
When the client returns a nonzero exit code, we should raise a RuntimeError so that the error propagates properly.
2022-02-16 09:03:17 -08:00
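A minimal sketch of the behavior described in #22438, assuming the client command is launched with `subprocess.run`. The helper name `run_client_command` is hypothetical and is not taken from e2e.py.

```python
import subprocess
import sys


def run_client_command(cmd):
    """Run cmd and raise RuntimeError when it exits with a nonzero code.

    Hypothetical helper for illustration; not the actual e2e.py code.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure instead of silently continuing.
        raise RuntimeError(
            f"Command {cmd!r} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Calling `run_client_command([sys.executable, "-c", "import sys; sys.exit(3)"])` raises `RuntimeError`, so a failing client can no longer be mistaken for a passing test.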
Yi Cheng
8b1bbfe8e4
[e2e] Fix an error when "env_vars" is not set. (#22234)
To fix error in session https://buildkite.com/ray-project/periodic-ci/builds/2699#c532ed2b-ee89-48ad-a7db-fd4211ef8bd9
2022-02-08 22:05:53 -08:00
Yi Cheng
d8ac01bd5c
[e2e] Update e2e test to use redisless ray by default. (#22189)
As the title says: now that the infra has been updated, this PR needs to be merged so that the tests can run Ray without Redis.
2022-02-08 19:46:48 -08:00
SangBin Cho
ac00389cbe
[Nightly test] Bring back the old way of running commands. (#22209)
Bring back the old way of running commands for non-k8s tests.

This also fixes the regression from many_drivers.py
2022-02-08 01:44:07 -08:00
Kai Fricke
dd935874ee
[ci/release] Fix job submission command (#22093)
Ray job submission does not accept quoted commands anymore (#22011). This PR updates the command to fix job submission within e2e tests.
2022-02-04 00:05:52 +01:00
mwtian
b528bf9202
Revert "[e2e] Remove unnecessary logic around copying results (#22034)" (#22088)
This reverts commit 92d7e9bf98.
2022-02-03 13:42:40 -08:00
mwtian
92d7e9bf98
[e2e] Remove unnecessary logic around copying results (#22034)
After #21905, some of the logic around handling result artifacts became unnecessary or incorrect (in generating error logs). This logic is removed.
2022-02-03 12:15:06 -08:00
SangBin Cho
3c056a6b92
Revert "[Nightly Test] Add more metadata to test result (#21990)" (#22052)
This reverts commit fd20cf3239.
2022-02-02 12:56:42 -08:00
SangBin Cho
fd20cf3239
[Nightly Test] Add more metadata to test result (#21990)
Add columns for error code, commit URL, stable, session URL, and runtime.
2022-01-31 22:33:30 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
mwtian
634f897cb6
[e2e] improve output dir handling (#21906)
Try to clear the result dir before running the e2e.py script, to avoid failures where the directory already exists or a file cannot be overwritten due to a permission issue.
2022-01-26 23:56:08 -08:00
mwtian
1674a17e6f
[e2e] use alternative copy tree function to tolerate output directory that already exists (#21869)
Many release tests have error messages when copying results with `shutil.copytree()`. e.g.
https://buildkite.com/ray-project/periodic-ci/builds/2511#131c0d22-61a3-4dcf-b80a-de37b68ec591/139-450

This PR tries to make the copying process tolerate an existing destination directory. There is logic to remove the destination directory, but I'm not sure why it failed.

This error should not be failing the tests though.
2022-01-26 05:10:22 -08:00
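One way to tolerate an existing destination, sketched under the assumption of Python 3.8+, is `shutil.copytree(..., dirs_exist_ok=True)`. This is not necessarily the alternative function used in #21869, and `copy_results` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def copy_results(src, dst):
    """Copy src into dst, merging into dst if it already exists."""
    # Without dirs_exist_ok=True, copytree raises FileExistsError
    # when dst is already present. Requires Python 3.8 or newer.
    shutil.copytree(src, dst, dirs_exist_ok=True)


# Usage: the destination exists before the copy and keeps its old files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "results"
    dst = Path(tmp) / "artifacts"
    src.mkdir()
    (src / "result.json").write_text("{}")
    dst.mkdir()  # destination already exists
    (dst / "old.log").write_text("previous run")
    copy_results(src, dst)
    copied = sorted(p.name for p in dst.iterdir())
```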
Clark Zinzow
2cd3045b16
[Test Infra] Fix e2e.py help info for --report (#21757)
This momentarily confused me as to whether --report would enable or disable reporting.
2022-01-21 03:29:50 -08:00
SangBin Cho
b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the previous problems from team column revert.

This has 2 additional changes:

The alert handler now receives the team argument; its absence was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289

Previously, tests without a team column raised an exception. I have weakened the condition to a warning log. I will eventually change it back to raising an exception, but for a smoother transition we will log a warning for a short time.
2022-01-19 13:29:53 -08:00
Kai Fricke
e233f8172d
[ci/release] Terminate session on session startup timeout (#21703)
When a session startup times out due to resources not being available, the session may still come up after that timeout. At that time the control script (e2e.py) is already terminated, so the session runs until the autosuspend limit is hit, incurring unnecessary costs. Instead, we should always trigger session termination on session timeout.
2022-01-19 10:01:03 -08:00
Kai Fricke
0e9e8824e4
[ci/release] use s3 sync (#21626)
Previous changes failed because of a) permission errors and b) unzip being unavailable on remote nodes. Instead, we now use tar gzip archives.

This reverts commit 42bcab27e8.
2022-01-15 17:53:19 -08:00
Kai Fricke
42bcab27e8
Revert "[Release Test] Opt-in tests to use K8s based cloud. (#21583)" (#21605)
This reverts commit 0d5fbcc7bb.
2022-01-14 11:46:52 -08:00
Simon Mo
0d5fbcc7bb
[Release Test] Opt-in tests to use K8s based cloud. (#21583)
2022-01-13 17:20:36 -08:00
Kai Fricke
aa35045b6f
[ci/release] Update to recent anyscale API changes (#21149)
Recent changes in the anyscale API rendered the current e2e script incompatible. This PR adapts the script to these subtle API changes.
2022-01-04 11:21:47 +00:00
mwtian
0b3fed5ef3
Revert "[Nightly Test] Add a team column to each test config. (#21198)" (#21289)
This reverts commit b5b11b2d06.
2021-12-30 06:44:51 +09:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all test configs into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
architkulkarni
2489b17634
[release] Uninstall old ray in all release test app configs to fix commit mismatch error (#21175)
* uninstall old ray in all release test app configs

* add instruction to e2e.py docstring
2021-12-18 16:58:49 -08:00
Yi Cheng
4e0de0053d
[nightly] Add staging nightly test for gcs ha (#21004)
This PR adds four staging nightly tests for GCS:
- many_actors
- many_tasks
- many_pgs
- many_nodes

These are benchmark tests that are highly related to GCS HA.

To make it easier to add tests, this PR also changes e2e.py slightly to include testing flags in the app config.
2021-12-09 23:07:23 -08:00
Kai Fricke
b3a9d4d87d
[ci/release] Remove quotation marks from pip installs (#20638)
Quotation marks were needed in Anyscale app configs to avoid install errors when `#` was used, e.g. in URLs.
Since this has been fixed on the Anyscale side, we can get rid of these.
2021-12-05 17:57:08 -08:00
Kai Fricke
6b683ec8dc
[ci] Retry release tests on infra error (#20478)
This PR introduces proper exit codes for release tests. These are used to automatically retry tests that fail with a certain set of infrastructure-related errors.
2021-12-02 10:34:40 -08:00
Simon Mo
d7f208dea4
[Release] Make e2e.py link clickable on buildkite (#20436)
Adds log formatting to output clickable links to buildkite console logs
2021-11-18 12:45:59 +00:00
Kai Fricke
693063d6f8
[ci/release] fix exit code (use value, not object) (#20427)
2021-11-16 15:15:39 +00:00
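The "use value, not object" fix can be illustrated with a small sketch. The `ExitCode` members, their numbers, and the mapping from exceptions to codes below are all hypothetical and are not the codes used by e2e.py.

```python
import enum
import sys


class ExitCode(enum.Enum):
    # Hypothetical codes for illustration only.
    SUCCESS = 0
    INFRA_ERROR = 10
    TEST_ERROR = 20


def exit_code_for(error):
    """Map an exception (or None) to an ExitCode member."""
    if error is None:
        return ExitCode.SUCCESS
    if isinstance(error, TimeoutError):
        # Assumption: timeouts are treated as infrastructure failures.
        return ExitCode.INFRA_ERROR
    return ExitCode.TEST_ERROR


def main(error=None):
    code = exit_code_for(error)
    # Pass the integer value. sys.exit(code) with a plain Enum member
    # would print the member to stderr and exit with status 1.
    sys.exit(code.value)
```

With distinct integer exit statuses, a CI wrapper can tell an infrastructure failure from a test failure and decide whether to retry.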
Kai Fricke
d191ad2de8
[ci/release] Return exit codes based on different errors (#20289)
2021-11-15 19:41:00 +00:00
Jiajun Yao
992ab3e098
[Release] Commit sanity check when a url is provided (#20255)
2021-11-11 13:33:58 -08:00
SangBin Cho
f3e3c04469
[Nightly test] Make report False by default. (#20238)
* Make report False by default.

* fix
2021-11-11 04:58:23 -08:00
Jiajun Yao
e110d958a1
Support different s3 url formats (#20133)
2021-11-07 14:58:51 -08:00
Amog Kamsetty
3408b60d2b
[Release] Refactor User Tests (#20028)
* wip

* add directory

* wip

* try again

* Revert "try again"

This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d.

* finish

* formatting

* fix merge

* fix path

* chmod

* check

* sudo

* wip

* update

* fix horovod

* try

* typo

* reduce num workers
2021-11-05 17:28:37 -07:00
Kai Fricke
a13f738a10
[ci/release] Fix cloud search query (#19876)
2021-10-29 11:30:34 +02:00
Kai Fricke
564d8551ed
[ci/release] only check alert if test succeeded before (#19857)
2021-10-28 16:09:10 -07:00
Simon Mo
3e038aebb2
[CI] Allow release tests infra to accept buildkite artifacts (#19803)
2021-10-27 13:04:01 -07:00
Kai Fricke
98244ad130
[ci/release] Report error to database on alert (#19743)
2021-10-26 10:48:02 +01:00
Kai Fricke
96ddf5b9ac
[ci/release] Choose cloud by name or ID (#19742)
2021-10-26 10:21:54 +01:00
Kai Fricke
71564040ec
[ci/release] Unwrap after installing pip packages (#19552)
2021-10-20 13:41:16 +01:00
Kai Fricke
3e8587644b
[ci/release] wrap all release test pip github installs in quotation marks (#19521)
2021-10-19 20:55:02 +01:00
Kai Fricke
eee05505b1
[ci/release] Add separate timeout parameter for prepare commands (#19459)
2021-10-18 16:29:25 +01:00
Kai Fricke
c10d434713
[release] Allow commit hashes instead of URLs, add bisection utility (#19398)
2021-10-18 10:44:29 +01:00
Kai Fricke
e17b23fa5b
[ci/release] Add support for RAY_WHEELS url (#19364)
2021-10-14 21:40:01 +01:00
Carlo Grisetti
5cee8a1985
[release tests] Switch from yaml.load to yaml.safe_load (#19365)
2021-10-13 17:27:25 -07:00
Kai Fricke
42116badba
[ci/release] Check test result alerts after test finished (#19105)
2021-10-05 21:27:27 +01:00
Jiajun Yao
b8ef4f0a34
[CI] Add a retry helper to e2e.py (#19045)
2021-10-02 09:54:41 -07:00
SangBin Cho
55227a15b9
Handle retry to avoid statement timeout exception (#18968)
2021-09-29 23:04:35 -07:00