Commit graph

43 commits

Author SHA1 Message Date
Simon Mo
438b6c78c8
[Release Tests] Add memory monitoring for Serve release test (#25868) 2022-06-17 11:11:56 -07:00
Jiao
f6735f90c7
[Ray DAG] Move dag project folder out of experimental (#25532) 2022-06-16 19:15:39 -07:00
Sihan Wang
b024a9543e
[Serve] Support scale replica down to 0 (#24892) 2022-06-02 08:06:46 -07:00
SangBin Cho
ec653e3196
[Nightly test] Move two line downloads to one line. (#25061)
Fixes the mysterious error where every cluster env build fails when pip uninstall / pip install are written on two lines. The root cause will be fixed later.
2022-05-22 00:07:03 -07:00
Kai Fricke
6c5229295e
[ci/release] Support running tests with different python versions (#24843)
OSS release tests currently run with a hardcoded Python 3.7 base image. In the future we will want to run tests on different Python versions.
This PR adds support for a new `python` field in the test configuration. The `python` field determines both the base image used in the Buildkite runner docker container (for Ray client compatibility) and the base image for the Anyscale cluster environments.

Note that in Buildkite, we will still only wait for the Python 3.7 base image before kicking off tests. That is acceptable: we can assume that most wheels finish in a similar time, so even if we wait for the 3.7 image and kick off a 3.8 test, that runner will wait perhaps 5-10 more minutes.
2022-05-17 17:03:12 +01:00
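The `python` field described in #24843 could be resolved along the following lines. This is a minimal sketch, not the release tooling's actual code: the function names, the `DEFAULT_PYTHON` constant, and the `py38`-style tag format are all assumptions made for illustration. Only the `python` field name and the 3.7 default come from the commit message.

```python
# Hypothetical sketch: helper names and the tag format are assumptions,
# not taken from the Ray release tooling.
DEFAULT_PYTHON = "3.7"  # the commit message says 3.7 was the hardcoded base


def resolve_python_version(test_config: dict) -> str:
    """Return the test's `python` field as a string, or the default if unset.

    Values are expected as strings; an unquoted YAML value such as 3.10
    would already have been parsed as the float 3.1 before reaching here.
    """
    return str(test_config.get("python", DEFAULT_PYTHON))


def base_image_tag(python_version: str) -> str:
    """Build an illustrative image tag suffix, e.g. '3.8' -> 'py38'."""
    major, minor = python_version.split(".")[:2]
    return f"py{major}{minor}"
```

Under these assumptions, a test config without a `python` field resolves to the 3.7 tag, and one that sets `python: "3.8"` selects the 3.8 tag for both the runner container and the cluster environment.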
Sihan Wang
3f5da8af7a
[Serve] Add serve handle graph workload nightly tests (#24435) 2022-05-04 09:07:50 -07:00
Jiao
9d31f5f7b2
[Serve] Change deployment graph long chain test (#24418) 2022-05-03 10:38:47 -07:00
Jiao
ba7cc1803a
[Deployment Graph] Add release test for long chain & wide fanout pattern (#24246) 2022-04-29 17:03:33 -07:00
shrekris-anyscale
b51d0aa8b1
[serve] Introduce context.py and client.py (#24067)
Serve stores context state, including the `_INTERNAL_REPLICA_CONTEXT` and the `_global_client` in `api.py`. However, these data structures are referenced throughout the codebase, causing circular dependencies. This change introduces two new files:

* `context.py`
    * Intended to expose process-wide state to internal Serve code as well as `api.py`
    * Stores the `_INTERNAL_REPLICA_CONTEXT` and the `_global_client` global variables
* `client.py`
    * Stores the definition for the Serve `Client` object, now called the `ServeControllerClient`
2022-04-21 18:35:09 -05:00
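The idea behind `context.py` in #24067 can be illustrated with a small module that holds process-wide state and imports nothing from `api.py`, so other modules can depend on it without creating a cycle. This is a sketch under assumptions: only the `_global_client` name comes from the commit message, and the accessor functions below are hypothetical and may not match Serve's real code.

```python
# Sketch of a context-style module. The accessor names are hypothetical;
# only `_global_client` is named in the commit message.
from typing import Any, Optional

# Process-wide state lives here rather than in api.py, so internal modules
# can import this file without importing api.py (and vice versa).
_global_client: Optional[Any] = None


def set_global_client(client: Any) -> None:
    """Store the process-wide client."""
    global _global_client
    _global_client = client


def get_global_client() -> Any:
    """Return the stored client, or raise if none has been set."""
    if _global_client is None:
        raise RuntimeError("No global client has been set in this process.")
    return _global_client
```

Because the state module has no imports from the rest of the package, both `api.py` and internal code can import it freely, which is what breaks the circular dependency.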
Edward Oakes
de227ac407
[serve] Add component logger + basic access logging (#23558)
Adds a "component logger" to standardize logging across the HTTP proxy, controller, and deployment replicas.
2022-04-12 18:16:58 -05:00
Archit Kulkarni
db2c37c760
[serve] [release] Disable smoke test by default (#23334) 2022-03-18 18:40:48 -05:00
Kai Fricke
8608b64885
[ci/release] Remove old OSS release test infrastructure (#23134)
Now that we've migrated all OSS release tests to the new infrastructure, we can remove old config files and infra scripts.
2022-03-14 15:10:52 +00:00
Edward Oakes
135cd121b9
[release tests] Fix minor bug in multi-deployment serve test (#22961) 2022-03-09 14:37:27 -06:00
Edward Oakes
aa907987bf
[serve][release tests] Use m5.8xlarge instance types for 1k replica tests (#22918) 2022-03-08 21:34:01 -06:00
Archit Kulkarni
31332f8930
[serve] [release tests] Add health check grace period for 1k deployment (#22651) 2022-02-25 12:13:44 -06:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
shrekris-anyscale
e4370720cc
[Serve] Add "Serve" team tag to untagged release tests (#21861) 2022-01-25 11:46:03 -08:00
shrekris-anyscale
75b3080834
[Serve] Serve Autoscaling Release tests (#21208) 2022-01-21 12:08:25 -08:00
SangBin Cho
b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the previous problems from the team column revert.

This has two additional changes:

The alert handler now receives the team argument, which was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289

Previously, tests without a team column raised an exception, but I made the condition weaker (warning logs). I will eventually change it to raise an exception, but for a smoother transition we will log a warning instead for a short time.
2022-01-19 13:29:53 -08:00
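The warn-now, raise-later transition described in #21700 might look like the sketch below. The function name, the `strict` flag, and the config keys are assumptions for illustration. The `unspecified` fallback follows the wording of #21198.

```python
# Hypothetical sketch: `get_team`, `strict`, and the config keys are
# illustrative names, not the release tooling's actual API.
import logging

logger = logging.getLogger(__name__)


def get_team(test_config: dict, strict: bool = False) -> str:
    """Return the test's team, handling a missing team column.

    With strict=False (the transitional behavior) a missing team only logs
    a warning and is reported as "unspecified". With strict=True it raises.
    """
    team = test_config.get("team")
    if team:
        return team

    name = test_config.get("name", "<unnamed>")
    if strict:
        raise ValueError(f"Test {name!r} has no team specified.")
    logger.warning("Test %r has no team specified.", name)
    return "unspecified"
```

Flipping the default of `strict` to `True` later would restore the stricter behavior without changing any callers.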
Kai Fricke
0e9e8824e4
[ci/release] use s3 sync (#21626)
Previous changes failed because of a) permission errors and b) unzip being unavailable on remote nodes. We now use tar gzip archives instead.

This reverts commit 42bcab27e8.
2022-01-15 17:53:19 -08:00
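To illustrate the switch to tar gzip archives in #21626, a directory can be packed and unpacked with Python's standard `tarfile` module, with no `unzip` binary needed on the remote node. This is a generic sketch, not the code from the PR: the function names are hypothetical and the S3 transfer step is omitted.

```python
# Generic sketch using only the standard library; function names are
# hypothetical and no S3 upload or download is shown.
import tarfile
from pathlib import Path


def pack_dir(src_dir: str, archive_path: str) -> None:
    """Write the contents of src_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname="." stores paths relative to src_dir, not absolute paths.
        tar.add(src_dir, arcname=".")


def unpack_dir(archive_path: str, dest_dir: str) -> None:
    """Extract a gzip-compressed tar archive into dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives from a trusted source.
        tar.extractall(dest_dir)
```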
Kai Fricke
42bcab27e8
Revert "[Release Test] Opt-in tests to use K8s based cloud. (#21583)" (#21605)
This reverts commit 0d5fbcc7bb.
2022-01-14 11:46:52 -08:00
Simon Mo
0d5fbcc7bb
[Release Test] Opt-in tests to use K8s based cloud. (#21583) 2022-01-13 17:20:36 -08:00
mwtian
0b3fed5ef3
Revert "[Nightly Test] Add a team column to each test config. (#21198)" (#21289)
This reverts commit b5b11b2d06.
2021-12-30 06:44:51 +09:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all test configs into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
Simon Mo
c85e9e69b3
[Serve] Change multi_deployment_1k_noop_replica threshold (#20514) 2021-11-17 17:25:54 -08:00
Simon Mo
b6bd4fd5f3
[Serve] Don't recover from current state checkpoint (#19998) 2021-11-12 09:02:27 -08:00
Tobias Kaymak
893f57591d
[serve] Add Google Cloud Storage as a backend (#20104) 2021-11-10 19:45:19 -08:00
Amog Kamsetty
18dcf1ac25
[Release] Use nightly Docker images (#20001)
* use nightly
* switch ml cpu to ray cpu
* fix
* add pytest
* add more pytest
* add constraint
* add tensorflow
* fix merge conflict
* add tblib
* fix
* add back uninstall
2021-11-10 18:00:16 -08:00
Simon Mo
4d583da7d5
[Serve] Add verbose log for nightly test only (#20088) 2021-11-04 16:15:22 -07:00
Jiao
3f628d4f6b
increase long poll timeout and wrk trial cpu resource (#19768) 2021-10-26 21:31:39 -07:00
Jiao
85b8a6de5f
[Serve] Add nightly test for Serve failure recovery (#19125) 2021-10-11 18:33:20 -07:00
Jiao
ca3be60291
[Release] change headnode type for serve benchmark (#18672)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-09-16 10:57:36 -07:00
Jiao
d3734d803d
[serve] Change nightly test docker image and enable micro benchmark (#18566) 2021-09-14 09:41:21 -05:00
Jiajun Yao
ec6f5ae9ab
Upgrade serve_tests and runtime_env_tests base image to 1.6.0 (#18563) 2021-09-13 12:47:06 -07:00
Kai Fricke
7d1e6d3129
[ci/release] Add sanity check for ray wheels hash to release tests (#18489) 2021-09-10 17:50:31 +01:00
Jiao
b52c873027
[serve] Use list_deployments in benchmark (#18050) 2021-08-25 12:26:46 -05:00
Kai Fricke
8580e450cb
[release] update/unify base images (#17859) 2021-08-16 12:44:25 +02:00
Jiao
3c64a1a3c1
Add micro benchmark to releaser repo (#17727) 2021-08-11 15:15:33 -07:00
Jiao
2618236167
[serve] Fix single deployment nightly test (#17368) 2021-07-28 11:38:06 -05:00
Jiao
9eb1bcd061
[serve] Multi & single deployment large scale test (#17310) 2021-07-27 10:46:45 -05:00
Edward Oakes
58423e6018
[serve] Improve nightly release test (#17277) 2021-07-26 11:15:46 -05:00
Jiao
7473f663ef
[Release] change replica to 100 to collect signals now (#17214)
Co-authored-by: Jiao Dong <jiaodong@anyscale.com>
2021-07-20 12:27:56 -07:00
Jiao
994ff3ce21
[Serve] Add initial large scale tests (#17026) 2021-07-20 08:56:29 -07:00