Commit graph

38 commits

Author SHA1 Message Date
Yi Cheng
de76d86bcb
[nightly] Stop GCS HA related nightly test (#22636)
Since we've already turned it on on master, we should stop these tests for now.
2022-02-24 16:40:08 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
SangBin Cho
6b4aac7a08
Promote unstable tests to stable (#21811)
Promote tests that have passed 100% of the time over the last week to stable.
2022-01-24 02:10:37 -08:00
Yi Cheng
90093769df
[nightly] Add more many tasks tests (#21727)
This PR adds four tests for many tasks:

- many short tasks sent from a single node
- many short tasks sent from multiple nodes
- many long tasks sent from multiple nodes
- many long tasks sent from a single node

TODO: migrate the many-nodes actor tests to this one.

The scheduling envelope should contain:

- (tasks): scheduling_test_many_xx_tasks_yy_nodes
- (actors): many_nodes_actor_test (to be combined with this one)
- (shuffle): pipelined_ingestion_1500_gb_15_windows
- (shuffle): dask_on_ray_1tb_sort
2022-01-20 14:52:26 -08:00
SangBin Cho
b1308b1c8c
[Test Infra] Unrevert team col (#21700)
This fixes the problems that led to the earlier revert of the team column.

It includes two additional changes:

- The alert handler now receives the team argument; the alert handler not receiving it was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289
- Previously, tests without a team column raised an exception. I weakened the check so that it logs a warning instead. I will eventually change it back to raising an exception, but for a smoother transition we will only log a warning for a short time.
2022-01-19 13:29:53 -08:00
Yi Cheng
a6e76c2803
[nightly] Disable bootstrapping from gcs (#21570)
Right now, the testing infra doesn't support running ray without redis. Disable it for the time being so that we can still test the rest of the functionality.
2022-01-12 23:02:42 -08:00
Yi Cheng
72c9fef5f3
[nightly] Enable GCS HA nightly test with bootstrap (#21389)
After https://github.com/ray-project/ray/pull/21232 we are able to start ray without redis. We need to bake the test for a while before turning on the flag by default.
This PR adds tests for this.
2022-01-05 10:53:07 -08:00
mwtian
0b3fed5ef3
Revert "[Nightly Test] Add a team column to each test config. (#21198)" (#21289)
This reverts commit b5b11b2d06.
2021-12-30 06:44:51 +09:00
SangBin Cho
b5b11b2d06
[Nightly Test] Add a team column to each test config. (#21198)
Please review **e2e.py and test_suite belonging to your team**! 

This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#

This PR adds a team name to each test suite.

If the name is not specified, it will be reported as unspecified. 

If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future).

Note that we will aggregate all of the test configs into a single file, nightly_test.yaml.
2021-12-27 14:42:41 -08:00
SangBin Cho
44320aba3b
[Nightly Test] Fix broken scalability test #21201
I added a memory monitor to the scalability tests. This broke the tests because creating a memory monitor requires node resources (so that it is scheduled on the head node), and that broke the "resource leak" check. Ideally, the resource leak check should be more robust, but I fixed the issue in an easier way for now. In the near future, the memory monitor will become a fixture, and at that point we should fix the resource leak check code.
2021-12-20 14:58:39 -08:00
Yi Cheng
abdf9b5f3c
[nightly] Fix benchmark commit check failure (#21119)
It looks like somehow `pip3 install -U` won't update ray anymore, and we need to uninstall before installing.
2021-12-15 14:54:03 -08:00
SangBin Cho
1c1430ff5c
Add memory monitor to scalability tests. (#21102)
This adds memory monitoring to the scalability envelope tests so that we can compare peak memory usage between non-HA and HA.

NOTE: the current way of adding the memory monitor is not great, and we should implement a fixture to support this better, but that is not in progress yet.
2021-12-15 01:31:38 -08:00
Kai Fricke
b58f839534
[ci/release] Remove hard numpy removal from app configs (#21005)
2021-12-13 15:22:02 +00:00
Yi Cheng
4e0de0053d
[nightly] Add staging nightly test for gcs ha (#21004)
This PR adds four staging nightly tests for gcs:
- many_actors
- many_tasks
- many_pgs
- many_nodes

These are benchmark tests that are highly related to gcs ha. 

To make it easier to add tests, this PR also changes e2e.py slightly to include testing flags in the app config.
2021-12-09 23:07:23 -08:00
SangBin Cho
2e1482c38a
[Nightly Test] Fix a wrong prepare script for object store nightly test (#20739)
By mistake, we were running `sleep 0` instead of wait_cluster.py.
2021-11-28 20:40:59 -08:00
SangBin Cho
97b4490401
[Nightly Test] Readjust nightly test schedule (#20717)
- Remove the scale_to logic from object store. We don't need to scale during tests, and dropping it makes it easier to tell infra failures from app failures.
- Run microbenchmark in core nightly, meaning it will run even more often
- Run weekly scalability tests daily instead. (They are not too expensive).
- Run some core daily tests separately to avoid infra failures.
2021-11-26 06:59:16 -08:00
Yi Cheng
b6b4d4cf57
[test] Update base image for nightly testing (#20680)
## Why are these changes needed?

`base_image: "anyscale/ray-ml:pinned-nightly-py37"` no longer exists, which causes many nightly tests to fail. Change it to `base_image: "anyscale/ray-ml:nightly-py37-gpu"`.
2021-11-23 11:06:44 -08:00
Jiajun Yao
3cb2b3e23a
Fix test_single_node json report (#19075)
2021-10-04 13:05:32 -07:00
Jiajun Yao
be29d27e8a
[Scalability Envelope] Include broadcast time in test_object_store result json (#18974)
2021-09-29 13:49:16 -07:00
Kai Fricke
7d1e6d3129
[ci/release] Add sanity check for ray wheels hash to release tests (#18489)
2021-09-10 17:50:31 +01:00
Alex Wu
ca86098680
Revert "[core] Refactor test_many_tasks (#18169)" (#18216)
This reverts commit eb6fd20d53.
2021-08-30 10:35:23 -07:00
Stephanie Wang
eb6fd20d53
[core] Refactor test_many_tasks (#18169)
* Improve test

test

* lint
2021-08-30 10:33:23 -07:00
Kai Fricke
089dd9b949
[release] Add release logs for 1.6.0 (#18067)
2021-08-26 12:13:15 +02:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. (#17885)
* gcs_utils

* resource_spec

* profiling

* ray_perf and ray_cluster_perf

* test_utils
2021-08-18 20:56:33 -07:00
Alex Wu
af880378da
Lower threshold on scalability envelope many tasks (#17511)
2021-08-02 11:50:08 -07:00
Alex Wu
9e79301d35
Split scalability envelope + smoke tests (#17455)
* .

* done?

* done?

* sang comments

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-30 10:20:19 -07:00
Chen Shen
02f58a5c6b
[nightly-test] increase timeout to 1 hour (#17125)
2021-07-15 12:30:08 -07:00
SangBin Cho
63ebfe2f2d
Revert back to ray.init (#17047)
2021-07-13 14:36:27 -07:00
Alex Wu
b08795582b
Disable runtime envs in scalability envelope (#16978)
Co-authored-by: Alex Wu <alex@anyscale.com>
2021-07-11 09:53:15 -07:00
Alex Wu
ba9fd06f87
Integrate scalability envelope with releaser (#16417)
* .

* .

* .

* .

* .

* .

* .

* success

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-06-15 10:42:55 -07:00
Clark Zinzow
ca68bf1e93
[Release] Update release test configs for 1.4 release. (#16292)
* Updated scalability envelope tests for 1.4.

* Update data processing release test for 1.4.
2021-06-08 00:15:25 -07:00
Kai Fricke
1d52ab819f
[release] release 1.3.0 results and test updates (#15366)
Convert a number of release tests and add logs for release 1.3.0
2021-05-04 22:10:04 +01:00
Alex Wu
805b8a10a3
Move scalability envelope back down to 250 nodes (#15381)
* .

* done?

* .

Co-authored-by: Alex Wu <alex@anyscale.com>
2021-04-16 19:39:24 -07:00
Dmitri Gekhtman
e6864523cf
[autoscaler] Do not divide by zero in resource demand scheduler (#15323)
* Do not divide by zero

* Don't take min or mean of an empty list

* max workers 0 for head node in distributed benchmark

* test

* Correct the type annotation

* comment grammar tweak

* message

* docs

* test

* Move test cli to large tests.
2021-04-16 10:20:05 -07:00
Alex Wu
62214f1b80
Delete WIP in scalability envelope (#14791)
2021-03-18 17:53:53 -07:00
SangBin Cho
b1e0409447
[Test] Improve scalability envelope (#14406)
* fixed.

* fix.

* Update the result.

* Addressed code review.
2021-03-01 18:36:52 -08:00
Alex Wu
a13208f113
Scalability envelope readme typo (#13874)
2021-02-03 21:43:45 -08:00
Alex Wu
840987c7af
Scalability Envelope Tests (#13464)
2021-01-25 18:48:31 -08:00