hiro/ray - Forgejo: Beyond coding. We Forge.

hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Jiajun Yao	56c7b74072	Delete nightly shuffle_data_loader (#22185 )	2022-02-07 15:23:34 -08:00
Jiajun Yao	355ee4a02c	Fix nightly shuffle_data_loader by pinning down dependencies versions (#22183 )	2022-02-07 11:25:30 -08:00
Chen Shen	13819304d4	[Core][nightly-test] better way of calculating num features (#22158 ) * better filter of column length * address comments * more	2022-02-07 02:13:40 -08:00
Balaji Veeramani	7f1bacc7dc	[CI] Format Python code with Black (#21975 ) See #21316 and #21311 for the motivation behind these changes.	2022-01-29 18:41:57 -08:00
Chen Shen	bfe3e5f4a8	add check on shape (#21947 )	2022-01-28 12:27:43 -08:00
Jiajun Yao	cea80b1a5b	Don't advertise cpus on gpu nodes for pipelined ingestion tests (#21899 ) * Don't advertise cpus on gpu nodes for pipelined ingestion tests * Don't advertise cpus on gpu nodes for pipelined ingestion tests * Don't advertise cpus on gpu nodes for pipelined ingestion tests	2022-01-27 09:17:01 -08:00
SangBin Cho	ac5f38d7fd	[Test] Fix dask on ray test on K8s (#21816 ) Fix dash on ray large scale test on K8s. Basically, chmod requires a root access, which we don't have it by default in the k8s cluster. We don't need chmod I think (I verified the test passes without it).	2022-01-24 15:09:22 -08:00
SangBin Cho	6b4aac7a08	Promote unstable tests to stable (#21811 ) Promote tests that have passed 100% last 1 week to stable	2022-01-24 02:10:37 -08:00
SangBin Cho	babc03edf2	Add a threaded actor k8s test (#21739 ) Add threaded actor flaky test to k8s.	2022-01-23 20:12:57 -08:00
SangBin Cho	02af73a571	[Test] First core nightly test migration to k8s (#21698 ) The first migration of test into k8s. We are adopting a conservative approach (migrate slowly while we keep existing test suites). Once things are confirmed to be stable, we will migrate with more speed.	2022-01-19 13:31:49 -08:00
SangBin Cho	b1308b1c8c	[Test Infra] Unrevert team col (#21700 ) This fixes the previous problems from team column revert. This has 2 additional changes; alert handler receives the team argument, which was the root cause of breakage; https://github.com/ray-project/ray/pull/21289 Previously, tests without a team column were raising an exception, but I made the condition weaker (warning logs). I will eventually change it to raise an exception, but for smoother transition, we will log warning instead for a short time	2022-01-19 13:29:53 -08:00
Chen Shen	74d4e7c20c	install botocore with s3fs to ensure no confliction (#21680 )	2022-01-18 23:09:16 -08:00
Jiajun Yao	bb04cc9d80	Use latest cmake for pipelined_ingestion and pipelined_training tests (#21674 )	2022-01-18 12:03:43 -08:00
Jiajun Yao	76b91efd9b	Fix wrong many_nodes_actor_test app config (#21404 ) RAY_GCS_ACTOR_SCHEDULING_ENABLED is wrong should be RAY_gcs_actor_scheduling_enabled. Since gcs based actor scheduling is not enabled yet so I just removed this flag.	2022-01-05 11:52:13 -08:00
Chen Shen	704404d408	[BigDataTraining] Fix test script introduced by API change (#21347 ) * fix * fix test failure * Update release/nightly_tests/dataset/ray_sgd_training.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-01-03 12:14:36 -08:00
Antoni Baum	7ce22b72ed	[datasets] Expand `to_torch`'s functionality (#21117 ) Expands the `to_torch` method for Datasets with: * An ability to choose to output a list/dict of feature tensors instead of just one (through setting `feature_columns` to be a list of lists or a dict of lists) * An ability to choose whether the label should be unsqueezed or not * An ability to pass `None` as the label (for prediction). Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes for each feature. However, as the tensor was concatenated in the end, only one dtype mattered (the biggest one). Now, this argument expects a single dtype which will be applied to the features tensor (or a list/dict if `feature_columns` is a list of list/dict of lists). Unit tests for all cases are included. Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2022-01-03 09:03:50 -08:00
Jiajun Yao	9776e21842	Revert "Round robin during spread scheduling (#19968 )" (#21293 ) This reverts commit `60388b2834`.	2021-12-30 10:33:06 +09:00
mwtian	0b3fed5ef3	Revert "[Nightly Test] Add a team column to each test config. (#21198 )" (#21289 ) This reverts commit `b5b11b2d06`.	2021-12-30 06:44:51 +09:00
SangBin Cho	b5b11b2d06	[Nightly Test] Add a team column to each test config. (#21198 ) Please review e2e.py and test_suite belonging to your team! This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit# This PR adds a team name to each test suite. If the name is not specified, it will be reported as unspecified. If you are running a local test, and if the new test suite doesn't have a team name specified, it will raise an exception (in this way, we can avoid missing team names in the future). Note that we will aggregate all of test config into a single file, nightly_test.yaml.	2021-12-27 14:42:41 -08:00
Akash Patel	cbcd03b779	Upgrade cython to 0.29.26 for py310 (#21244 )	2021-12-26 20:26:08 -08:00
Jiajun Yao	60388b2834	Round robin during spread scheduling (#19968 )	2021-12-22 20:27:34 -08:00
Jiajun Yao	7d861a2c58	[Test] Add ray wheel sanity check (#21223 )	2021-12-21 14:24:02 -08:00
architkulkarni	2489b17634	[release] Uninstall old ray in all release test app configs to fix commit mismatch error (#21175 ) * uninstall old ray in all release test app configs * add instruction to e2e.py dosctring	2021-12-18 16:58:49 -08:00
Chen Shen	c9c3f0745a	[Dataset][nighlytest] use latest ray for running test #21148 We are actually using the ray comes with the image, which is on a very old version of Ray. (suprised this actually works)	2021-12-17 23:48:44 -08:00
Chen Shen	80eb00f525	[Chaos] fix dataset chaos test #21113	2021-12-15 20:13:38 -08:00
Clark Zinzow	ec06a1f65e	[CUJ#2] Update nightly test for CUJ#2 #21064	2021-12-15 13:19:59 -08:00
SangBin Cho	1c1430ff5c	Add memory monitor to scalability tests. (#21102 ) This adds memory monitoring to scalability envelope tests so that we can compare the peak memory usage for both nonHA & HA. NOTE: the current way of adding memory monitor is not great, and we should implement fixture to support this better, but that's not in progress yet.	2021-12-15 01:31:38 -08:00
Chen Shen	3c426ed7b5	[nighly-test] fix dataset nightly test reporting #21061	2021-12-14 00:05:40 -08:00
Kai Fricke	b58f839534	[ci/release] Remove hard numpy removal from app configs (#21005 )	2021-12-13 15:22:02 +00:00
Chen Shen	d0e79a36f9	[chaos-test] chaos test pipeline ingestion (#20929 ) since it has been passing my test run; i'll land it and mark it as unstable.	2021-12-09 13:43:00 -08:00
Chen Shen	6a274dfd76	CI][Chaos-test] chaos test now can set max-nodes-to-kill #20962	2021-12-09 13:41:46 -08:00
Chen Shen	aca954e8dd	[dataset][cuj2] add another single node ingestion example (#20754 ) * add runner * fix bugs * add configs * add time	2021-12-07 02:50:17 -08:00
Chen Shen	a628182cf5	[nighly-test] update cuj2 to reflect latest change #20889 we fixed groupby issue in cuj2; sync the change into nightly test. this test doesn't need to use gpu at all. it returns soon after data ingestion finishes.	2021-12-06 09:59:21 -08:00
Chen Shen	6d17fe5fc5	[cuj2] merge latest change to cuj2 (groupby based filtering) and add a debug mode. (#20742 ) This PR does two things: merge latest groupby based filtering to CUJ2 add a debug mode so we only run dummy trainer for measure data processing performance.	2021-11-29 19:10:17 -08:00
SangBin Cho	6fc6ebb43e	Promote some tests stable. (#20740 ) Mark staging tests that pass 10+ time in a row as stable tests	2021-11-28 18:43:39 -08:00
SangBin Cho	cd7a32f1a5	[Nightly test] Chaos test fixture (#20277 ) This PR is mostly for implementing "fixture" for nightly test. Note that the current fixture implementation is not that great, and we can probably improve this in the future after refactoring e2e.py.	2021-11-24 17:13:29 -08:00
Alex Wu	63969c9a5c	[nigthly-tests][dataset] Use actor compute model for GPU inference (#20689 ) ## Why are these changes needed? Fix nightly tests to avoid oom ## Checks	2021-11-24 11:03:23 -08:00
SangBin Cho	ca092fd032	[Nightly test] Fix broken pg long running test master (#20674 ) * Fixed. * Fix trial	2021-11-23 21:24:00 -08:00
Chen Shen	107aef89a8	[CUJ2] add nightly tests for running 500GB ray train (#20195 ) * add * update cluster env * fix build Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>	2021-11-21 20:04:45 -08:00
Alex Wu	24f27203ba	[hotfix] Fix inference nightly test by upgrading numpy (#20546 ) The ray-ml image depends on numpy ~=1.19.2 via the tensorflow==2.6 requirement. Unfortunately that's incompatible with Dataset (see here #20258 (comment)). This PR upgrades the numpy dependency only for the nightly test.	2021-11-19 08:15:23 -08:00
Amog Kamsetty	9796ae56d5	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487 )	2021-11-17 18:05:51 -08:00
SangBin Cho	5ec63ccc5f	[Regresion test] Placement group long running test (#20251 ) Why are these changes needed? In the past, there was a regression the placement group creation time gets slower as time goes. I believe the issue is fixed in the master, but this PR verifies if that's actually fixed. This PR adds a long running test for the placement group. There are 2 purposes of the test. Make sure the placement group creation / removal doesn't get slower as time goes. The test basically measure the first 20 iteration P50 creation time and run very long iteration. After all iteration, it checks if the p50 creation time is not too slow compared to the initial round. Make sure placement group removal / creation works consistently for a long time without an issue. Q: Should we make it a real long running test? (that runs for a day?)	2021-11-16 04:21:18 -08:00
SangBin Cho	a4f72c6606	[nightly] Fix pg stress test (#20362 ) ## Why are these changes needed? This was mistakenly added to the nightly. Fixing it. ## Related issue number	2021-11-15 00:17:18 -08:00
SangBin Cho	6cc493079b	[Core] Add Placement group performance test (#20218 ) * in progress * ip * Fix issues * done * Address code review.	2021-11-14 09:17:54 +09:00
SangBin Cho	9fd8c6648c	[Test] Fix newly added nightly tests, threaded actor + chaos testing (#20220 ) * Fix nightly tests * done * done	2021-11-11 05:01:19 -08:00
Amog Kamsetty	18dcf1ac25	[Release] Use nightly Docker images (#20001 ) * use nightly * switch ml cpu to ray cpu * fix * add pytest * add more pytest * add constraint * add tensorflow * fix merge conflict * add tblib * fix * add back uninstall	2021-11-10 18:00:16 -08:00
SangBin Cho	90fd38c64a	[Test] Large scale threaded actor workload (#20105 ) * Done * Addressed code review. * lint * Update release/nightly_tests/stress_tests/test_threaded_actors.py Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>	2021-11-09 02:28:48 -08:00
SangBin Cho	5c4fb4dc91	[Core]Chaos testing nightly (#20059 ) * Done initial stage. * lint * . * Finished. * Fix lint	2021-11-08 21:57:53 -08:00
Yi Cheng	6a6cc434ba	[nightly] Remove grpc staging test since nightly is stable #20119 (#20119 )	2021-11-05 21:36:58 -07:00
Yi Cheng	04f60c998e	[nightly] Fix pytest missing in nightly test (#20076 ) ## Why are these changes needed? In the nightly test we see ``` Command returned non-success status: 1; Command logs:Traceback (most recent call last): File "dask_on_ray/large_scale_test.py", line 17, in from ray._private.test_utils import monitor_memory_usage File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/test_utils.py", line 18, in import pytest ModuleNotFoundError: No module named 'pytest' ``` This PR fixes this error. ## Related issue number	2021-11-04 13:38:05 -07:00

1 2 3

113 commits