hiro/ray - Forgejo

mirror of https://github.com/vale981/ray synced 2025-03-06 02:21:39 -05:00

Author	SHA1	Message	Date
SangBin Cho	6cc493079b	[Core] Add Placement group performance test (#20218) * in progress * ip * Fix issues * done * Address code review.	2021-11-14 09:17:54 +09:00
SangBin Cho	b2acfd6ff4	[Test] Change the frequency of many nodes actor test (#20232)	2021-11-10 21:12:22 -08:00
Simon Mo	215f47bc53	[CI] Move Serve nightly tests to a separate suite (#20194) So we can run them via separate cronjobs	2021-11-09 13:22:50 -08:00
SangBin Cho	90fd38c64a	[Test] Large scale threaded actor workload (#20105) * Done * Addressed code review. * lint * Update release/nightly_tests/stress_tests/test_threaded_actors.py Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>	2021-11-09 02:28:48 -08:00
SangBin Cho	5c4fb4dc91	[Core]Chaos testing nightly (#20059) * Done initial stage. * lint * . * Finished. * Fix lint	2021-11-08 21:57:53 -08:00
gjoliver	d8a61f801f	[RLlib] Create a set of performance benchmark tests to run nightly. (#19945) * Create a core set of algorithms tests to run nightly. * Run release tests under tf, tf2, and torch frameworks. * Fix * Add eager_tracing option for tf2 framework. * make sure core tests can run in parallel. * cql * Report progress while running nightly/weekly tests. * Innclude SAC in nightly lineup. * Revert changes to learning_tests * rebrand to performance test. * update build_pipeline.py with new performance_tests name. * Record stats. * bug fix, need to populate experiments dict. * Alphabetize yaml files. * Allow specifying frameworks. And do not run tf2 by default. * remove some debugging code. * fix * Undo testing changes. * Do not run CQL regression for now. * LINT. Co-authored-by: sven1977 <svenmika1977@gmail.com>	2021-11-08 18:15:13 +01:00
Yi Cheng	6a6cc434ba	[nightly] Remove grpc staging test since nightly is stable #20119 (#20119)	2021-11-05 21:36:58 -07:00
Amog Kamsetty	3408b60d2b	[Release] Refactor User Tests (#20028) * wip * add directory * wip * try again * Revert "try again" This reverts commit 82d33ccea6f92848df025e019b87df73cea49e5d. * finish * formatting * fix merge * fix path * chmod * check * sudo * wip * update * fix horovod * try * typo * reduce num workers	2021-11-05 17:28:37 -07:00
gjoliver	2c1fa459d4	[RLlib] Add an RLlib Tune experiment to UserTest suite. (#19807) * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * Add an RLlib Tune experiment to UserTest suite. * Add ray.init() * Move example script to example/tune/, so it can be imported as module. * add __init__.py so our new module will get included in python wheel. * Add block device to RLlib test instances. * Reduce disk size a little bit. * Add metrics reporting * Allow max of 5 workers to accomodate all the worker tasks. * revert disk size change. * Minor updates * Trigger build * set max num workers * Add a compute cfg for autoscaled cpu and gpu nodes. * use 1gpu instance. * install tblib for debugging worker crashes. * Manually upgrade to pytorch 1.9.0 * -y * torch=1.9.0 * install torch on driver * bump timeout * Write a more informational result dict. * Revert changes to compute config files that are not used. * add smoke test * update * reduce timeout * Reduce the # of env per worker to 1. * Small fix for getting trial_states * Trigger build * simply result dict * lint * more lint * fix smoke test Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-11-03 17:04:27 -07:00
Kai Fricke	f96078687f	[xgboost/release] Xgboost/connect gpu test (#19838) * [xgboost/release] Add GPU connect user test * Use scaling cluster * typo * Increase xgboost placement group timeout * Much higher timeout * Move os environment timeout * Move os environ * [dev] install xgboost-ray from master * GPU xgboost master * Remove master install after new xgboost release * Install latest * Add master test	2021-11-02 08:40:48 -07:00
Amog Kamsetty	3a52187da8	[Release/Lightning] Add Ray lightning user test (#19812) * wip * wip * add ray lightning test * fix * update * merge and add * fix * fix * rename * autoscale * add tblib * gloo backend * typo * upgrade torch * latest and master	2021-11-01 18:29:48 -07:00
Amog Kamsetty	474e44f7e0	[Release/Horovod] Add user test for Horovod (#19661) * infra * wip * add test * typo * typo * update * rename * fix * full path * formatting * reorder * update * update * Update release/horovod_tests/workloads/horovod_user_test.py Co-authored-by: matthewdeng <matthew.j.deng@gmail.com> * bump num_workers * update installs * try * add pip_packages * min_workers * fix * bump pg timeout * Fix symlink * fix * fix * cmake * fix * pin filelock * final * update * fix * Update release/horovod_tests/workloads/horovod_user_test.py * fix * fix * separate compute template * test latest and master Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>	2021-11-01 18:28:07 -07:00
matthewdeng	e1e4a45b8d	[train] add simple Ray Train release tests (#19817) * [train] add simple Ray Train release tests * simplify tests * update * driver requirements * move to test * remove connect * fix * fix * fix torch * gpu * add assert * remove assert * use gloo backend * fix * finish Co-authored-by: Amog Kamsetty <amogkamsetty@yahoo.com>	2021-11-01 18:25:19 -07:00
architkulkarni	702bffe072	[runtime env] [test] Enable runtime env nightly test with working_dir reconnection (#19906)	2021-10-31 10:48:48 -05:00
Kai Fricke	fa0158abe5	[tune] Cloud checkpointing release tests (#19638)	2021-10-29 12:12:01 +02:00
Simon Mo	3e038aebb2	[CI] Allow release tests infra to accept buildkite artifacts (#19803)	2021-10-27 13:04:01 -07:00
Yi Cheng	abec07700a	[nightly] Adding more tests related to grpc broadcasting to staging mode (#19779) ## Why are these changes needed? We have concern that grpc based broadcasting might have negative impact on pg related workload. This test is to ensure it's running well before merging. ## Related issue number #19438	2021-10-27 10:46:13 -07:00
Amog Kamsetty	6e61ca623d	[CI] Infra for "user" tests (#19662)	2021-10-26 08:47:22 +01:00
Yi Cheng	7a7b356899	[Nightly test] add test for grpc broadcasting (#19579)	2021-10-21 07:01:41 -07:00
Yi Cheng	7a9cedfc5c	[nightly] Add grpc based broadcasting into nightly test for decision_tree (#19531) * dbg * up * check * up * up * put grpc based one into nightly test * up	2021-10-19 19:59:39 -07:00
Yi Cheng	f47f69d31e	[nightly] Add decision_tree_autoscaling_20_runs to nightly test	2021-10-18 11:19:40 -07:00
Kai Fricke	6c6639a0d7	[ci/release] hotfix for undefined local variable (#19460)	2021-10-18 11:28:33 +01:00
Kai Fricke	c10d434713	[release] Allow commit hashes instead of URLs, add bisection utility (#19398)	2021-10-18 10:44:29 +01:00
Kai Fricke	e17b23fa5b	[ci/release] Add support for RAY_WHEELS url (#19364)	2021-10-14 21:40:01 +01:00
Jiao	893f76daf9	[serve] Add serve FT nightly test to buildkite (#19361)	2021-10-13 13:56:55 -07:00
SangBin Cho	22f4ffed08	Disable cpu-only-nodes preferred scheduling that breaks placement groups. (#19129) * Add a regression test for the short term * done * address code review * lint	2021-10-07 05:34:04 -07:00
Chen Shen	7c99aae033	[dataset][nightly-test] add pipelined ingestion/training nightly test	2021-09-23 20:39:03 -07:00
Kai Fricke	2cbf326410	[ci/release] store buildkite artifacts on buildkite (#18712)	2021-09-22 11:35:59 +01:00
SangBin Cho	51d94ebee0	[Tests] Make nightly test work + Remove work stealing logs (#18300) * make tests work * .	2021-09-14 09:52:58 -07:00
Jiao	d3734d803d	[serve] Change nightly test docker image and enable micro benchmark (#18566)	2021-09-14 09:41:21 -05:00
Yi Cheng	6011d4197f	[nightly] Add many_nodes_actor_test to nightly test (#18406)	2021-09-08 11:15:48 -07:00
Sven Mika	5292b70fc6	[RLlib] Add multi-GPU attention net tests to nightly test suite (+ R2D2 tests for LSTM and attention nets). (#18368)	2021-09-06 17:48:05 +02:00
Kai Fricke	4c3276644e	[release] After buildkite ask step, use RAY_TEST_REPO pipeline (#18074)	2021-08-25 15:58:38 +02:00
Sven Mika	9883505e84	[RLlib] Add [LSTM=True + multi-GPU]-tests to nightly RLlib testing suite (for all algos supporting RNNs, except R2D2, RNNSAC, and DDPPO). (#18017)	2021-08-24 21:55:27 +02:00
Kai Fricke	fca8af88d2	[release] Fix e2e environment variable passing from pipeline (#18000)	2021-08-23 09:26:37 +02:00
Chen Shen	89f988e9cc	add dataset shuffle data loader (#17917)	2021-08-20 11:26:01 -07:00
architkulkarni	36c26578a7	[runtime env] [test] Add nightly test to verify Ray wheel URLs are valid (#17938)	2021-08-19 15:48:37 -07:00
Kai Fricke	651aae76b9	[release] Ask for configuration in buildkite (#17948)	2021-08-19 17:51:05 +02:00
Sven Mika	a428f10ebe	[RLlib] Add multi-GPU learning tests to nightly. (#17778)	2021-08-18 17:21:01 +02:00
architkulkarni	b173b33934	[tests] Add runtime envs release test to nightly build script (#17638)	2021-08-06 13:18:25 -05:00
Sven Mika	a708cca4bc	[RLlib, Testing] Add RLlib tests to nightly/weekly release test automation. (#17543)	2021-08-03 13:44:00 -04:00
Alex Wu	63e335caf2	Update build_pipeline.py (#17544)	2021-08-03 10:40:29 -07:00
Alex Wu	9e79301d35	Split scalability envelope + smoke tests (#17455) * . * done? * done? * sang comments * . Co-authored-by: Alex Wu <alex@anyscale.com>	2021-07-30 10:20:19 -07:00
Jiao	3dc49c0b79	[serve] Add multi deployment to serve nightly tests (#17411) Co-authored-by: Jiao Dong <jiaodong@anyscale.com>	2021-07-29 11:47:58 -05:00
Jiao	994ff3ce21	[Serve] Add initial large scale tests (#17026)	2021-07-20 08:56:29 -07:00
Alex Wu	93c16346bf	[Dataset] imagenet nightly test (#17069)	2021-07-16 14:15:49 -07:00
SangBin Cho	ef1d9278b8	[Test] nightly test dask on ray multi node sort (#17141)	2021-07-15 23:13:35 -07:00
Kai Fricke	ed131f87da	[release] move release testing end to end script to main ray repo (#17070)	2021-07-14 12:39:07 -07:00

48 commits