hiro/ray

mirror of https://github.com/vale981/ray synced 2025-03-06 10:31:39 -05:00

Author	SHA1	Message	Date
Chen Shen	a628182cf5	[nightly-test] update cuj2 to reflect latest change #20889 We fixed a groupby issue in cuj2; this syncs the change into the nightly test. The test doesn't need a GPU at all: it returns soon after data ingestion finishes.	2021-12-06 09:59:21 -08:00
Chen Shen	6d17fe5fc5	[cuj2] merge latest change to cuj2 (groupby based filtering) and add a debug mode. (#20742 ) This PR does two things: (1) merges the latest groupby-based filtering into CUJ2; (2) adds a debug mode that runs only a dummy trainer, for measuring data processing performance.	2021-11-29 19:10:17 -08:00
SangBin Cho	6fc6ebb43e	Promote some tests stable. (#20740 ) Mark staging tests that pass 10+ times in a row as stable tests	2021-11-28 18:43:39 -08:00
SangBin Cho	cd7a32f1a5	[Nightly test] Chaos test fixture (#20277 ) This PR is mostly for implementing "fixture" for nightly test. Note that the current fixture implementation is not that great, and we can probably improve this in the future after refactoring e2e.py.	2021-11-24 17:13:29 -08:00
Alex Wu	63969c9a5c	[nightly-tests][dataset] Use actor compute model for GPU inference (#20689 ) ## Why are these changes needed? Fix nightly tests to avoid OOM ## Checks	2021-11-24 11:03:23 -08:00
SangBin Cho	ca092fd032	[Nightly test] Fix broken pg long running test master (#20674 ) * Fixed. * Fix trial	2021-11-23 21:24:00 -08:00
Chen Shen	107aef89a8	[CUJ2] add nightly tests for running 500GB ray train (#20195 ) * add * update cluster env * fix build Co-authored-by: Matthew Deng <matthew.j.deng@gmail.com>	2021-11-21 20:04:45 -08:00
Alex Wu	24f27203ba	[hotfix] Fix inference nightly test by upgrading numpy (#20546 ) The ray-ml image depends on numpy ~=1.19.2 via the tensorflow==2.6 requirement. Unfortunately that's incompatible with Dataset (see here #20258 (comment)). This PR upgrades the numpy dependency only for the nightly test.	2021-11-19 08:15:23 -08:00
Amog Kamsetty	9796ae56d5	[Train][Data] Change usages of `iter_datasets` to `iter_epochs` (#20487 )	2021-11-17 18:05:51 -08:00
SangBin Cho	5ec63ccc5f	[Regression test] Placement group long running test (#20251 ) Why are these changes needed? In the past, there was a regression in which placement group creation time got slower over time. I believe the issue is fixed in master, but this PR verifies that it is actually fixed. This PR adds a long-running test for placement groups. The test has two purposes: (1) Make sure placement group creation / removal doesn't get slower over time. The test measures the P50 creation time of the first 20 iterations, then runs a very large number of iterations. After all iterations, it checks that the P50 creation time is not too slow compared to the initial round. (2) Make sure placement group removal / creation works consistently for a long time without issues. Q: Should we make it a real long-running test (one that runs for a day)?	2021-11-16 04:21:18 -08:00
SangBin Cho	a4f72c6606	[nightly] Fix pg stress test (#20362 ) ## Why are these changes needed? This was mistakenly added to the nightly. Fixing it. ## Related issue number	2021-11-15 00:17:18 -08:00
SangBin Cho	6cc493079b	[Core] Add Placement group performance test (#20218 ) * in progress * ip * Fix issues * done * Address code review.	2021-11-14 09:17:54 +09:00
SangBin Cho	9fd8c6648c	[Test] Fix newly added nightly tests, threaded actor + chaos testing (#20220 ) * Fix nightly tests * done * done	2021-11-11 05:01:19 -08:00
Amog Kamsetty	18dcf1ac25	[Release] Use nightly Docker images (#20001 ) * use nightly * switch ml cpu to ray cpu * fix * add pytest * add more pytest * add constraint * add tensorflow * fix merge conflict * add tblib * fix * add back uninstall	2021-11-10 18:00:16 -08:00
SangBin Cho	90fd38c64a	[Test] Large scale threaded actor workload (#20105 ) * Done * Addressed code review. * lint * Update release/nightly_tests/stress_tests/test_threaded_actors.py Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>	2021-11-09 02:28:48 -08:00
SangBin Cho	5c4fb4dc91	[Core]Chaos testing nightly (#20059 ) * Done initial stage. * lint * . * Finished. * Fix lint	2021-11-08 21:57:53 -08:00
Yi Cheng	6a6cc434ba	[nightly] Remove grpc staging test since nightly is stable #20119 (#20119 )	2021-11-05 21:36:58 -07:00
Yi Cheng	04f60c998e	[nightly] Fix pytest missing in nightly test (#20076 ) ## Why are these changes needed? In the nightly test we see ``` Command returned non-success status: 1; Command logs: Traceback (most recent call last): File "dask_on_ray/large_scale_test.py", line 17, in <module> from ray._private.test_utils import monitor_memory_usage File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/_private/test_utils.py", line 18, in <module> import pytest ModuleNotFoundError: No module named 'pytest' ``` This PR fixes this error. ## Related issue number	2021-11-04 13:38:05 -07:00
Lixin Wei	1fe9f3372e	[Nightly Test] Remove duplicate printing code (#19874 ) ## Why are these changes needed? Remove duplicate printing code	2021-10-29 10:19:19 -07:00
Yi Cheng	abec07700a	[nightly] Adding more tests related to grpc broadcasting to staging mode (#19779 ) ## Why are these changes needed? We are concerned that grpc-based broadcasting might have a negative impact on pg-related workloads. This test is to ensure it runs well before merging. ## Related issue number #19438	2021-10-27 10:46:13 -07:00
SangBin Cho	ecd5a622ef	[Tests] Add a memory usage on dask on ray tests (#19674 )	2021-10-25 14:58:26 -07:00
Yi Cheng	7a7b356899	[Nightly test] add test for grpc broadcasting (#19579 )	2021-10-21 07:01:41 -07:00
Yi Cheng	01b899dafb	[nightly] Fix broken test due to bad syntax #19536 (#19536 )	2021-10-19 21:43:46 -07:00
Yi Cheng	7a9cedfc5c	[nightly] Add grpc based broadcasting into nightly test for decision_tree (#19531 ) * dbg * up * check * up * up * put grpc based one into nightly test * up	2021-10-19 19:59:39 -07:00
Chen Shen	b38ebd368c	[Dataset][nightly-test] spend less money #19488 Reduce the epoch and ensure everything runs in the same datacenter.	2021-10-18 18:53:50 -07:00
Kai Fricke	ad94eb03c6	[ci/release] wrap pip github installs in quotation marks to prevent comment errors (#19464 )	2021-10-18 18:55:56 +01:00
Chen Shen	9dba5e0ead	[dataset][nightly-test] fix pipeline ingest test (#19437 )	2021-10-18 11:31:24 +01:00
Yi Cheng	1dc03cd49d	[nightly] Put many nodes actor test back (#19313 ) ## Why are these changes needed? There are two issues fixed in this PR: - make sure wait for session count alive node - upgrade the machine to match what's tested in oss ray. ## Related issue number https://github.com/ray-project/ray/issues/19084	2021-10-13 15:51:12 -07:00
SangBin Cho	dd1c1f9787	[Nightly test] remove env vars from tests (#19221 ) When testing it we should minimize unnecessary env vars (and it's better working with the default config). This PR removes unnecessary env vars that are set.	2021-10-08 06:53:23 -07:00
Clark Zinzow	ca731d7c86	[Datasets] Fix API breakage in Datasets nightly test.	2021-10-07 15:07:19 -07:00
SangBin Cho	22f4ffed08	Disable cpu-only-nodes preferred scheduling that breaks placement groups. (#19129 ) * Add a regression test for the short term * done * address code review * lint	2021-10-07 05:34:04 -07:00
Eric Liang	86cbe3e833	[data] Add support for repeating and re-windowing a DatasetPipeline (#19091 )	2021-10-06 20:13:43 -07:00
Yi Cheng	1eecb7d80b	up (#19092 )	2021-10-04 23:54:31 -07:00
SangBin Cho	55227a15b9	Handle retry to avoid statement timeout exception/ (#18968 )	2021-09-29 23:04:35 -07:00
Yi Cheng	a993f3a262	[nightly] update nightly test for many node test	2021-09-29 17:28:44 -07:00
Dmitri Gekhtman	944309c017	Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582 )" (#18954 ) * Revert "[nightly] Deflaky nightly test many_nodes_actor_test (#18582)" This reverts commit `fc6a739e4b`. * move to large test Co-authored-by: Yi Cheng <chengyidna@gmail.com>	2021-09-29 11:02:14 -04:00
Chen Shen	62a73f4ce8	[nightly test][event] enable event logs in nightly tests (#18936 )	2021-09-28 01:29:26 -07:00
Chen Shen	7c99aae033	[dataset][nightly-test] add pipelined ingestion/training nightly test	2021-09-23 20:39:03 -07:00
Yi Cheng	fc6a739e4b	[nightly] Deflaky nightly test many_nodes_actor_test (#18582 )	2021-09-20 22:43:48 -07:00
Kai Fricke	7d1e6d3129	[ci/release] Add sanity check for ray wheels hash to release tests (#18489 )	2021-09-10 17:50:31 +01:00
Yi Cheng	23e9af0601	[test] Add x nodes y actors test to nightly tests (#18291 )	2021-09-03 18:54:23 -07:00
SangBin Cho	814095add6	Revert "Change instance type for some tests (#18248 )" (#18320 ) This reverts commit `34026a7bd5`.	2021-09-02 17:45:02 -07:00
SangBin Cho	34026a7bd5	Change instance type for some tests (#18248 )	2021-08-31 10:10:46 -07:00
SangBin Cho	eab506cc37	[Test] Disable non streaming shuffle 5000 partitions (#18224 ) * Disable non streaming shuffle 5000 partitions * increase timeout for 5000 partition shuffle	2021-08-31 00:28:15 -07:00
SangBin Cho	dfbad8668a	Support better infra failure detection + stable flag (#18202 )	2021-08-30 10:51:03 -07:00
SangBin Cho	43da68e657	Fix a nightly dask on ray test (#18060 )	2021-08-24 22:15:34 -07:00
Chen Shen	89f988e9cc	add dataset shuffle data loader (#17917 )	2021-08-20 11:26:01 -07:00
SangBin Cho	4971e13941	[Build] Asan wheel test (#17685 ) * in progress * ASAN tests. * d * in progress * in progress without the asan wheel * Support the asan wheel. * Support the asan wheels * Not build a binary for asan * Fix issues * Remove a wrong build * Separate out asan wheel build * Try preparing more deps. * ip * Try different version * done * d * Trial * Another try * Another try * skip cpp build to see what happens * add more des * ip * abc * Try next * completed * try * Try without static libasan * dbg * Try static link * Fix issues * abc	2021-08-17 10:21:41 -07:00
Eric Liang	ce171f10a1	Remove legacy plasma unlimited and pull manager pinning flag (#17753 )	2021-08-11 20:19:12 -07:00
SangBin Cho	a3c5cce834	Add prepare for dask on ray 1tb sort. (#17708 )	2021-08-10 16:26:05 -07:00

81 commits