
hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
SangBin Cho	b350fe9ee8	[Nightly test] Fix additional k8s issues + add new tests (#23231 ) Fix bug from the previous fixes. Add more tests. Stop using m5.xlarge (not supported now). There are 2 hard blockers from the infra: 1. Large size disk is not supported. 2. m5.xlarge is not supported. Both are considered high priority to be fixed soon.	2022-03-16 16:37:29 -07:00
Kai Fricke	8608b64885	[ci/release] Remove old OSS release test infrastructure (#23134 ) Now that we've migrated all OSS release tests to the new infrastructure, we can remove old config files and infra scripts.	2022-03-14 15:10:52 +00:00
SangBin Cho	2c2d96eeb1	[Nightly tests] Improve k8s testing (#23108 ) This PR improves broken k8s tests. Use exponential backoff on the unstable HTTP path (getting job status sometimes hits a broken connection from the server; I couldn't find the relevant logs to figure out why this is happening, unfortunately). Fix the benchmark tests' resource leak check. The existing one was broken because the job submission uses 0.001 node IP resource, which means the cluster_resources can never be the same as available resources. I fixed the issue by not checking node IP resources. K8s infra doesn't support instances < 8 CPUs. I used m5.2xlarge instead of xlarge. It will increase the cost a bit, but not by much.	2022-03-14 03:49:15 -07:00
Yi Cheng	de76d86bcb	[nightly] Stop GCS HA related nightly test (#22636 ) Since we've already turned it on on master, we should stop these tests for now.	2022-02-24 16:40:08 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
SangBin Cho	6b4aac7a08	Promote unstable tests to stable (#21811 ) Promote tests that have passed 100% last 1 week to stable	2022-01-24 02:10:37 -08:00
Yi Cheng	90093769df	[nightly] Add more many tasks tests (#21727 ) This PR adds four tests for many tasks: many short tasks sent from a single node; many short tasks sent from multiple nodes; many long tasks sent from multiple nodes; many long tasks sent from a single node. TODO: migrate many nodes actor tests to this one. The scheduling envelope should contain: (tasks): scheduling_test_many_xx_tasks_yy_nodes; (actors): many_nodes_actor_test (to be combined with this one); (shuffle): pipelined_ingestion_1500_gb_15_windows; (shuffle): dask_on_ray_1tb_sort	2022-01-20 14:52:26 -08:00
SangBin Cho	b1308b1c8c	[Test Infra] Unrevert team col (#21700 ) This fixes the previous problems from the team column revert. This has 2 additional changes: The alert handler now receives the team argument, which was the root cause of the breakage; https://github.com/ray-project/ray/pull/21289 Previously, tests without a team column raised an exception, but I weakened the condition to log a warning. I will eventually change it to raise an exception, but for a smoother transition, we will log a warning instead for a short time.	2022-01-19 13:29:53 -08:00
Yi Cheng	a6e76c2803	[nightly] Disable bootstrapping from gcs (#21570 ) Right now, the testing infra doesn't support running ray without redis. Disable it for now so that we can still test the rest of the functionality.	2022-01-12 23:02:42 -08:00
Yi Cheng	72c9fef5f3	[nightly] Enable GCS HA nightly test with bootstrap (#21389 ) After https://github.com/ray-project/ray/pull/21232 we are able to start ray without redis. We need to bake the test for a while before turning on the flag by default. This PR adds tests for this.	2022-01-05 10:53:07 -08:00
mwtian	0b3fed5ef3	Revert "[Nightly Test] Add a team column to each test config. (#21198 )" (#21289 ) This reverts commit `b5b11b2d06`.	2021-12-30 06:44:51 +09:00
SangBin Cho	b5b11b2d06	[Nightly Test] Add a team column to each test config. (#21198 ) Please review e2e.py and test_suite belonging to your team! This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit# This PR adds a team name to each test suite. If the name is not specified, it will be reported as unspecified. If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future). Note that we will aggregate all of test config into a single file, nightly_test.yaml.	2021-12-27 14:42:41 -08:00
SangBin Cho	44320aba3b	[Nightly Test] Fix broken scalability test #21201 I added a memory monitor to the scalability tests. This broke the tests because creating a memory monitor requires the node resources (to be scheduled on a head node), and that broke the "resource leak" check. Ideally, this resource leak check should be more robust, but I fixed the issue in an easier way for now. In the near future, the memory monitor will become a fixture, and in that case, we should fix the resource leak function code.	2021-12-20 14:58:39 -08:00
Yi Cheng	abdf9b5f3c	[nightly] Fix benchmark commit check failure (#21119 ) It looks like somehow `pip3 install -U` won't update ray anymore, and we need to uninstall before installing.	2021-12-15 14:54:03 -08:00
SangBin Cho	1c1430ff5c	Add memory monitor to scalability tests. (#21102 ) This adds memory monitoring to scalability envelope tests so that we can compare the peak memory usage for both nonHA & HA. NOTE: the current way of adding memory monitor is not great, and we should implement fixture to support this better, but that's not in progress yet.	2021-12-15 01:31:38 -08:00
Kai Fricke	b58f839534	[ci/release] Remove hard numpy removal from app configs (#21005 )	2021-12-13 15:22:02 +00:00
Yi Cheng	4e0de0053d	[nightly] Add staging nightly test for gcs ha (#21004 ) This PR adds four staging nightly tests for gcs: - many_actors - many_tasks - many_pgs - many_nodes These are benchmark tests that are highly related to gcs ha. To make it easier to add tests, this PR also changes e2e.py a little bit to include testing flags in the app config.	2021-12-09 23:07:23 -08:00
SangBin Cho	2e1482c38a	[Nightly Test] Fix a wrong prepare script for object store nightly test (#20739 ) By mistake, we are running sleep 0 instead of wait_cluster.py	2021-11-28 20:40:59 -08:00
SangBin Cho	97b4490401	[Nightly Test] Readjust nightly test schedule (#20717 ) - Removing scale_to logic from object store. We don't need to scale during tests, which will disambiguate infra failures vs app failures. - Run microbenchmark in core nightly, meaning it will run even more often - Run weekly scalability tests daily instead. (They are not too expensive). - Run some core daily tests separately to avoid infra failures.	2021-11-26 06:59:16 -08:00
Yi Cheng	b6b4d4cf57	[test] Update base image for nightly testing (#20680 ) `base_image: "anyscale/ray-ml:pinned-nightly-py37"` doesn't exist anymore, which fails a lot of nightly tests. Change it to `base_image: "anyscale/ray-ml:nightly-py37-gpu"`.	2021-11-23 11:06:44 -08:00
Jiajun Yao	3cb2b3e23a	Fix test_single_node json report (#19075 )	2021-10-04 13:05:32 -07:00
Jiajun Yao	be29d27e8a	[Scalability Envelope] Include broadcast time in test_object_store result json (#18974 )	2021-09-29 13:49:16 -07:00
Kai Fricke	7d1e6d3129	[ci/release] Add sanity check for ray wheels hash to release tests (#18489 )	2021-09-10 17:50:31 +01:00
Alex Wu	ca86098680	Revert "[core] Refactor test_many_tasks (#18169 )" (#18216 ) This reverts commit `eb6fd20d53`.	2021-08-30 10:35:23 -07:00
Stephanie Wang	eb6fd20d53	[core] Refactor test_many_tasks (#18169 ) * Improve test test * lint	2021-08-30 10:33:23 -07:00
Kai Fricke	089dd9b949	[release] Add release logs for 1.6.0 (#18067 )	2021-08-26 12:13:15 +02:00
Clark Zinzow	d958457d07	[Core] Second pass at privatizing APIs. (#17885 ) * gcs_utils * resource_spec * profiling * ray_perf and ray_cluster_perf * test_utils	2021-08-18 20:56:33 -07:00
Alex Wu	af880378da	Lower threshold on scalability envelope many tasks (#17511 )	2021-08-02 11:50:08 -07:00
Alex Wu	9e79301d35	Split scalability envelope + smoke tests (#17455 ) * . * done? * done? * sang comments * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-07-30 10:20:19 -07:00
Chen Shen	02f58a5c6b	[nightly-test] increase timeout to 1 hour (#17125 )	2021-07-15 12:30:08 -07:00
SangBin Cho	63ebfe2f2d	Revert back to ray.init (#17047 )	2021-07-13 14:36:27 -07:00
Alex Wu	b08795582b	Disable runtime envs in scalability envelope (#16978 ) Co-authored-by: Alex Wu <alex@anyscale.com>	2021-07-11 09:53:15 -07:00
Alex Wu	ba9fd06f87	Integrate scalability envelope with releaser (#16417 ) * . * . * . * . * . * . * . * success Co-authored-by: Alex Wu <alex@anyscale.com>	2021-06-15 10:42:55 -07:00
Clark Zinzow	ca68bf1e93	[Release] Update release test configs for 1.4 release. (#16292 ) * Updated scalability envelope tests for 1.4. * Update data processing release test for 1.4.	2021-06-08 00:15:25 -07:00
Kai Fricke	1d52ab819f	[release] release 1.3.0 results and test updates (#15366 ) Convert a number of release tests and add logs for release 1.3.0	2021-05-04 22:10:04 +01:00
Alex Wu	805b8a10a3	Move scalability envelope back down to 250 nodes (#15381 ) * . * done? * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-04-16 19:39:24 -07:00
Dmitri Gekhtman	e6864523cf	[autoscaler] Do not divide by zero in resource demand scheduler (#15323 ) * Do not divide by zero * Don't take min or mean of an empty list * max workers 0 for head node in distributed benchmark * test * Correct the type annotation * comment grammar tweak * message * docs * test * Move test cli to large tests.	2021-04-16 10:20:05 -07:00
Alex Wu	62214f1b80	Delete WIP in scalability envelope (#14791 )	2021-03-18 17:53:53 -07:00
SangBin Cho	b1e0409447	[Test] Improve scalability envelope (#14406 ) * fixed. * fix. * Update the result. * Addressed code review.	2021-03-01 18:36:52 -08:00
Alex Wu	a13208f113	Scalability envelope readme typo (#13874 )	2021-02-03 21:43:45 -08:00
Alex Wu	840987c7af	Scalability Envelope Tests (#13464 )	2021-01-25 18:48:31 -08:00

41 commits