Long-running tests are cheap and low-overhead (they use only a small number of nodes). We should promote them to run every day so we can catch regressions quickly.
Fix the Dask-on-Ray large-scale test on K8s. Basically, `chmod` requires root access, which we don't have by default in the K8s cluster. We don't need `chmod` anyway (I verified the test passes without it).
This PR consolidates both #21667 and #21759 (look there for features), but improves on them in the following ways:
- [x] We reverted the renaming of the existing projects `tune`, `rllib`, `train`, `cluster`, `serve`, `raysgd` and `data` so that links won't break. My consolidation efforts with the `ray-` prefix were a little overeager in that regard; it's better like this. Only the creation of `ray-core` was a necessity, and some files moved into the `rllib` folder, so the change should be relatively benign.
- [x] Additionally, we added Algolia `docsearch`, screenshot below. This is _much_ better than our current search. Caveat: one Sphinx dependency (`sphinx-tabs`) needs to be replaced by a newer one (`sphinx-panels`), as the former prevents loading of the `algolia.js` library. Will follow up in the next PR (hoping this one doesn't get re-re-re-re-reverted).
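For reference, the planned swap would look roughly like this in the docs' `conf.py` (a hedged sketch; the surrounding extension list is illustrative, not from this PR):

```python
# docs conf.py -- planned follow-up: swap sphinx-tabs for sphinx-panels,
# since sphinx-tabs prevents algolia.js from loading.
extensions = [
    # "sphinx_tabs.tabs",  # old: conflicts with the Algolia docsearch script
    "sphinx_panels",       # new: provides tabbed content without the conflict
    # ... remaining extensions unchanged ...
]
```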
This PR adds four many-task tests (a minimal sketch of the pattern follows the list):
- many short tasks sent from a single node
- many short tasks sent from multiple nodes
- many long tasks sent from multiple nodes
- many long tasks sent from a single node
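A minimal sketch of the single-node short-task pattern (the task count and task body are illustrative, not the actual benchmark values):

```python
import ray

# Assumes an already-running cluster; the driver sits on a single node.
ray.init(address="auto")

@ray.remote
def short_task() -> int:
    # Trivial task body; the benchmark measures scheduling throughput.
    return 1

# Submit many short tasks from one driver and wait for all of them.
refs = [short_task.remote() for _ in range(10_000)]
ray.get(refs)
```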
TODO: migrate the many-nodes actor tests into this suite.
The scheduling envelope should contain:
(tasks): scheduling_test_many_xx_tasks_yy_nodes
(actors): many_nodes_actor_test (to be combined with this one)
(shuffle): pipelined_ingestion_1500_gb_15_windows
(shuffle): dask_on_ray_1tb_sort
The first migration of tests into K8s. We are taking a conservative approach (migrating slowly while keeping the existing test suites). Once things are confirmed to be stable, we will migrate faster.
This fixes the problems from the previous team-column revert.
It includes two additional changes:
- The alert handler now receives the `team` argument; its absence was the root cause of the breakage: https://github.com/ray-project/ray/pull/21289
- Previously, tests without a team column raised an exception; I weakened the condition to a warning log. I will eventually change it back to raising an exception, but for a smoother transition we will log a warning instead for a short time.
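A hedged sketch of the weakened check (function and field names are assumptions, not the actual release-test code):

```python
import logging

logger = logging.getLogger(__name__)

def check_team(test: dict) -> None:
    # Transitional behavior: warn instead of raising so existing configs
    # keep running while team names are being filled in.
    if "team" not in test:
        logger.warning(
            "Test %s has no team specified; please add a 'team' field.",
            test.get("name", "<unknown>"),
        )
```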
When a session startup times out because resources are not available, the session may still come up after the timeout. By then the control script (e2e.py) has already terminated, so the session runs until the autosuspend limit is hit, incurring unnecessary costs. Instead, we should always trigger session termination on session timeout.
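Roughly, the intended control flow (the helper names here are hypothetical stand-ins, not the actual e2e.py functions):

```python
# Hypothetical helpers standing in for the real session-control calls.
def wait_until_session_running(session_id: str, timeout: float) -> None: ...
def terminate_session(session_id: str) -> None: ...

def start_session_or_terminate(session_id: str, timeout_s: float) -> None:
    try:
        wait_until_session_running(session_id, timeout=timeout_s)
    except TimeoutError:
        # Always tear the session down on timeout so it cannot come up later
        # and idle until the autosuspend limit is reached.
        terminate_session(session_id)
        raise
```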
The previous changes failed because of a) permission errors and b) `unzip` being unavailable on remote nodes. We now use gzipped tar archives instead.
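The switch is straightforward with the standard library; a sketch (paths are illustrative):

```python
import tarfile

# Create a gzipped tar archive (no root access or external tools needed).
with tarfile.open("payload.tar.gz", "w:gz") as tar:
    tar.add("workdir", arcname=".")

# Extract on the remote node without relying on an installed `unzip`.
with tarfile.open("payload.tar.gz", "r:gz") as tar:
    tar.extractall("dest")
```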
This reverts commit 42bcab27e8.
`RAY_GCS_ACTOR_SCHEDULING_ENABLED` is wrong; it should be `RAY_gcs_actor_scheduling_enabled`. Since GCS-based actor scheduling is not enabled yet, I just removed this flag.
Expands the `to_torch` method for Datasets with:
* An ability to output a list/dict of feature tensors instead of just one (by setting `feature_columns` to a list of lists or a dict of lists)
* An ability to choose whether the label should be unsqueezed or not
* An ability to pass `None` as the label (for prediction).
Furthermore, this changes how the `feature_column_dtypes` argument works. Previously, it took a list of dtypes, one per feature. However, since the features were concatenated into a single tensor in the end, only one dtype mattered (the widest one). Now, this argument expects a single dtype, which is applied to the feature tensor (or to each tensor in the list/dict if `feature_columns` is a list of lists or a dict of lists).
Unit tests for all cases are included.
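A hedged usage sketch of the expanded API (column names and data are illustrative; check the exact signature in your Ray version):

```python
import ray

ds = ray.data.from_items(
    [{"a": i, "b": 2 * i, "c": 3 * i, "label": i % 2} for i in range(8)]
)

# Dict of feature tensors: each key yields one tensor built from its columns.
train_ds = ds.to_torch(
    label_column="label",
    feature_columns={"group1": ["a", "b"], "group2": ["c"]},
    batch_size=4,
)

# Prediction mode: no label column, so batches carry features only.
pred_ds = ds.to_torch(label_column=None, feature_columns=["a", "b", "c"])
```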
Co-authored-by: matthewdeng <matthew.j.deng@gmail.com>
Please review **e2e.py and the test suites belonging to your team**!
This is the first part of https://docs.google.com/document/d/16IrwerYi2oJugnRf5hvzukgpJ6FAVEpB6stH_CiNMjY/edit#
This PR adds a team name to each test suite.
If the name is not specified, it will be reported as "unspecified".
If you run a test locally and the new test suite doesn't have a team name specified, it will raise an exception (this way, we can avoid missing team names in the future).
Note that we will aggregate all of the test configs into a single file, nightly_test.yaml.
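For illustration, an entry might carry the team like this (a hypothetical shape shown as the Python dict the YAML parses into; fields other than `team` are assumptions):

```python
test_entry = {
    "name": "many_nodes_actor_test",
    "team": "core",  # new field; reported as "unspecified" when absent
    # ... remaining test configuration unchanged ...
}
```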
This adds memory monitoring to the scalability envelope tests so that we can compare the peak memory usage for both non-HA and HA.
NOTE: the current way of adding the memory monitor is not great; we should implement a fixture to support this better, but that work is not in progress yet.
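A hedged sketch of the fixture we'd like to have eventually (names and the sampling approach are assumptions, not the current code):

```python
import threading
import time

import psutil
import pytest


class PeakRssMonitor:
    """Samples this process's RSS in a background thread and tracks the peak."""

    def __init__(self, interval_s: float = 0.5):
        self.interval_s = interval_s
        self.peak_bytes = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        proc = psutil.Process()
        while not self._stop.is_set():
            self.peak_bytes = max(self.peak_bytes, proc.memory_info().rss)
            time.sleep(self.interval_s)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


@pytest.fixture
def memory_monitor():
    # Wraps each test with peak-memory tracking; read peak_bytes afterwards.
    monitor = PeakRssMonitor()
    monitor.start()
    yield monitor
    monitor.stop()
```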
This PR adds four staging nightly tests for GCS:
- many_actors
- many_tasks
- many_pgs
- many_nodes
These are benchmark tests that are highly relevant to GCS HA.
To make it easier to add tests, this PR also changes e2e.py a little to include testing flags in the app config.
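A hedged sketch of how such flags could be threaded into an app config (the `env_vars` key and flag names are assumptions, not necessarily what e2e.py does):

```python
def apply_test_flags(app_config: dict, flags: dict) -> dict:
    # Merge per-test flags into the app config's environment so the
    # cluster comes up with them set, e.g. GCS feature toggles.
    env = app_config.setdefault("env_vars", {})
    env.update(flags)
    return app_config

config = apply_test_flags(
    {"base_image": "anyscale/ray:nightly"},
    {"RAY_gcs_actor_scheduling_enabled": "true"},
)
```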
We fixed the groupby issue in cuj2; this syncs the change into the nightly test. The test doesn't need to use GPUs at all; it returns soon after data ingestion finishes.
Quotation marks were needed in Anyscale app configs to avoid install errors when `#` was used, e.g. in URLs.
Since this has been fixed on the Anyscale side, we can get rid of them.