Kai Fricke
d06c3ffd6f
[release] Migrate Tune + XGBoost tests to new infrastructure ( #22705 )
...
Migrate XGBoost and Tune tests to new release testing infrastructure.
https://buildkite.com/ray-project/release-tests-branch/builds/50
2022-03-01 08:10:06 +01:00
SangBin Cho
2c1184592e
mark threaded actor test unstable ( #22696 )
2022-02-28 15:25:14 -08:00
Clark Zinzow
cf3577f0ee
[Datasets] Patch Parquet file fragment serialization to prevent metadata fetching. ( #22665 )
2022-02-28 15:15:30 -08:00
Chen Shen
7e90700521
[Dataset][nighly-test] promote data ingestion test to stable #22702
2022-02-28 14:00:18 -08:00
Kai Fricke
3695408a85
[release] Fix special cases in release test package (e.g. smoke test) ( #22442 )
...
Fixing special cases (e.g. smoke tests, long running tests) in the release test package infrastructure. Prepare migration of Tune and XGBoost tests.
2022-02-28 21:05:01 +01:00
SangBin Cho
1cedb1b6e4
[Test] Increase timeout for microbenchmark ( #22655 )
2022-02-25 17:29:12 -08:00
Sven Mika
7b687e6cd8
[RLlib] SlateQ: Add a hard-task learning test to weekly regression suite. ( #22544 )
2022-02-25 21:58:16 +01:00
Archit Kulkarni
31332f8930
[serve] [release tests] Add health check grace period for 1k deployment ( #22651 )
2022-02-25 12:13:44 -06:00
Archit Kulkarni
1165f99b0b
[CI] disable Serve microbenchmark k8s ( #22631 )
2022-02-24 16:50:06 -08:00
Yi Cheng
de76d86bcb
[nightly] Stop GCS HA related nightly test ( #22636 )
...
Since we've already turned it on on master, we should stop these tests for now.
2022-02-24 16:40:08 -08:00
Jun Gong
99b7be5e22
[rllib] Fix impala long running test ( #22619 )
...
Fix the IMPALA long-running test.
Bandits is the first agent that requires torch import at registration time.
2022-02-24 09:03:55 -08:00
SangBin Cho
5e847f7e09
[Usage Stats] Usage stats only enabled on nightly test infra ( #22591 )
...
This PR **enables the usage stats only on the release test infrastructure** (large scale tests Ray runs on a daily basis in a private infra). Note it is still disabled by default in Ray.
2022-02-23 22:11:48 -08:00
Eric Liang
e15a419028
Enable stage fusion by default for dataset pipelines ( #22476 )
...
This PR enables stage fusion for dataset pipelines. This also requires:
1. Removing the num_cpus=0.5 default for the read stage, to enable fusion of the read stage.
2. Removing spread_resource_prefix (not supported for now).
2022-02-23 17:34:05 -08:00
Max Pumperla
29d94a2211
[docs] sphinx gallery removal, migrate to ipynb ( #22467 )
2022-02-19 01:19:07 -08:00
Jiajun Yao
baa14d695a
Round robin during spread scheduling ( #21303 )
...
- Separate spread scheduling and default hybrid scheduling (i.e. SpreadScheduling != HybridScheduling(threshold=0)): they are already separated in the API layer and they have different end goals, so it makes sense to separate their implementations and evolve them independently.
- Simple round robin for spread scheduling: this is just a starting implementation, can be optimized later.
- Prefer not to spill back tasks that are waiting for args since the pull is already in progress.
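
A minimal sketch of the "simple round robin" idea from the list above, not the actual scheduler code. The class name `RoundRobinSpreadPicker`, the CPU-only feasibility check, and the node IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


class RoundRobinSpreadPicker:
    """Hypothetical helper: picks feasible nodes in round-robin order."""

    def __init__(self, node_ids: List[str]) -> None:
        self._node_ids = list(node_ids)
        # Index of the node to try first on the next pick.
        self._next_index = 0

    def pick(self, available_cpus: Dict[str, float], required_cpus: float) -> Optional[str]:
        """Return the next node with enough CPUs, or None if no node fits."""
        n = len(self._node_ids)
        for offset in range(n):
            index = (self._next_index + offset) % n
            node_id = self._node_ids[index]
            if available_cpus.get(node_id, 0.0) >= required_cpus:
                # Resume after the chosen node so consecutive tasks spread out.
                self._next_index = (index + 1) % n
                return node_id
        return None
```

Starting each search just after the previously chosen node is what spreads consecutive tasks across nodes, instead of packing them onto the first feasible one.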
2022-02-18 15:05:35 -08:00
Stephanie Wang
03a5589591
[core] Enable lineage reconstruction in CI ( #21519 )
...
Enables lineage reconstruction in all CI and release tests.
2022-02-18 11:04:20 -08:00
Chen Shen
17f589a05d
[Dataset][nighlty-test] use 2 instead of 15 windows for 1.5TB data ingestion #22479
2022-02-17 15:20:39 -08:00
mwtian
05dd72101b
[Release 1.11.0] Release logs for 1.11.0rc1 ( #22443 )
...
This is the release log for 1.11.0rc1, with GCS-Ray enabled. The diff is against 1.11.0rc0, without GCS-Ray.
2022-02-16 17:03:49 -08:00
Chen Shen
30ec0df9cc
[placement group] fix pg benchmark regression #22441
...
We added a warmup time to timeit, which affects the pg benchmark time accounting. This adds an option to skip the warmup.
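
A rough sketch of what an optional warmup in a timing helper could look like. The signature below, including the `warmup_time_sec` and `measure_time_sec` parameters, is an assumption for illustration and not necessarily the helper used in the benchmark.

```python
import time
from typing import Callable


def timeit(fn: Callable[[], object],
           warmup_time_sec: float = 1.0,
           measure_time_sec: float = 1.0) -> float:
    """Return calls per second of fn, excluding an optional warmup phase."""
    # Warmup: run fn without counting. Pass 0 to skip the warmup entirely.
    if warmup_time_sec > 0:
        deadline = time.perf_counter() + warmup_time_sec
        while time.perf_counter() < deadline:
            fn()

    # Measurement: count calls until the measurement window has elapsed.
    calls = 0
    start = time.perf_counter()
    while True:
        fn()
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= measure_time_sec:
            return calls / elapsed
```

Calling `timeit(fn, warmup_time_sec=0)` keeps the warmup calls from happening at all, which matters when `fn` has side effects that the benchmark accounts for.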
2022-02-16 16:24:51 -08:00
Jun Gong
a9147bb62c
[Release Test] Fix AnyscaleSDK construction so we can run CI on staging instance. ( #22325 )
2022-02-16 09:56:02 -08:00
SangBin Cho
42361a1801
[Test] Fix Dask on Ray 1 TB bug #22431
...
Fixes a bug. It seems that `not df` does not work with a DataFrame.
2022-02-17 02:44:36 +09:00
Kai Fricke
331b71ea8d
[ci/release] Refactor release test e2e into package ( #22351 )
...
Adds a unit-tested and restructured ray_release package for running release tests.
Relevant changes in behavior:
By default, Buildkite waits for the wheels of the current commit to become available. Alternatively, users can specify a) a different commit hash, b) a wheels URL (which we will also wait for to be available), or c) a branch (or user/branch combination), in which case the latest available wheels are used (e.g. if master is passed, the behavior matches the old default).
The main subpackages are:
Cluster manager: Creates cluster envs/computes, starts cluster, terminates cluster
Command runner: Runs commands, e.g. as client command or sdk command
File manager: Uploads/downloads files to/from session
Reporter: Reports results (e.g. to database)
Much of the code base is unit tested, but there are probably some pieces missing.
Example build (waited for wheels to be built): https://buildkite.com/ray-project/kf-dev/builds/51#_
Wheel build: https://buildkite.com/ray-project/ray-builders-branch/builds/6023
2022-02-16 17:35:02 +00:00
SangBin Cho
2ed5bb7a5f
[Nightly Test] Addressed client failure properly ( #22438 )
...
When the client returns a non-zero exit code, we should raise a RuntimeError so that errors propagate properly.
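
A minimal sketch of this pattern using the standard library. The function name `run_client_command` is hypothetical, and the real release test code may invoke the client differently.

```python
import subprocess
from typing import List


def run_client_command(command: List[str]) -> str:
    """Run a command and raise RuntimeError if it exits with a non-zero code."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the failure instead of silently continuing.
        raise RuntimeError(
            f"Command {command!r} failed with exit code "
            f"{completed.returncode}: {completed.stderr.strip()}"
        )
    return completed.stdout
```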
2022-02-16 09:03:17 -08:00
Jun Gong
04dd536987
[Release tests] Disable A3C CI tests on torch for now. Also extend performance_test deadline to 3hrs. ( #22426 )
2022-02-16 13:06:09 +01:00
Kai Fricke
c866131cc0
[tune] Retry cloud sync up/down/delete on fail ( #22029 )
2022-02-15 12:27:29 +00:00
SangBin Cho
640d92c385
It seems like the S3 read sometimes fails; #22214 . I found out the file actually does exist in S3, so it is highly likely a transient error. This PR adds a retry mechanism to avoid the issue.
...
It seems like the S3 read sometimes fails; #22214 . I found out the file actually does exist in S3, so it is highly likely a transient error. This PR adds a retry mechanism to avoid the issue.
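
A generic retry helper in the spirit of this change, shown as a sketch only. The name `retry`, its parameters, and the linear backoff are assumptions. The commit does not say how many attempts or what delay the actual fix uses.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(fn: Callable[[], T],
          max_attempts: int = 3,
          delay_sec: float = 1.0,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call fn until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: re-raise the last error.
                raise
            # Linear backoff before the next attempt.
            time.sleep(delay_sec * attempt)
    raise AssertionError("unreachable")
```

A transient read failure would then be wrapped as `retry(lambda: read_file(path))`, where `read_file` stands in for whatever function performs the S3 read.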
2022-02-12 11:58:58 +09:00
Jun Gong
cbd24503b6
[RLlib] Add A3C to RLlib performance regression tests. ( #22316 )
2022-02-11 21:18:53 +01:00
Archit Kulkarni
da57012cbc
Add comment to periodic CI pipeline to update release process doc when updating test suites ( #22037 )
...
This PR adds a comment to build_pipeline.py reminding anyone who makes changes to the test suites to also update the release process doc if necessary.
This is an action item from the Ray 1.10.0 release retrospective.
2022-02-11 11:14:24 -06:00
Chen Shen
0866a5558f
[Dataset][nighlyt-test] pin pyarrow==4.0.1 for dataset related tests ( #22277 )
...
* pin pyarrow==4.0.1
* address comments
2022-02-10 14:22:41 -08:00
Sven Mika
04a5c72ea3
Revert "Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test."" ( #18708 )
2022-02-10 13:44:22 +01:00
mwtian
47a56ca062
[Release] Add release logs for 1.11.0rc0 (GCS KV & pubsub not enabled) ( #22041 )
2022-02-10 00:03:31 -08:00
SangBin Cho
30000ff8ae
Fix a bug from many drivers. ( #22248 )
...
After this PR (https://github.com/ray-project/ray/pull/22156 ), for some reason the driver script contains a string that cannot be encoded as ASCII. Using UTF-8 appears to solve the problem.
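
The difference between the two encodings can be illustrated with a small, self-contained check. The helper `can_encode` and the sample string are hypothetical. The commit does not show which string in the driver script caused the failure.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A string with a non-ASCII character (hypothetical example).
script = "print('caf\u00e9')"

ascii_ok = can_encode(script, "ascii")  # False: the accented e is outside ASCII
utf8_ok = can_encode(script, "utf-8")   # True: UTF-8 covers all of Unicode
```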
2022-02-09 15:17:15 -08:00
Alex Wu
b122f093c1
Revert "[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test." ( #22250 )
...
Reverts ray-project/ray#22126
Breaks rllib:tests/test_io
2022-02-09 09:26:36 -08:00
Yi Cheng
8b1bbfe8e4
[e2e] Fix an error when "env_vars" is not set. ( #22234 )
...
To fix error in session https://buildkite.com/ray-project/periodic-ci/builds/2699#c532ed2b-ee89-48ad-a7db-fd4211ef8bd9
2022-02-08 22:05:53 -08:00
Yi Cheng
d8ac01bd5c
[e2e] Update e2e test to use redisless ray by default. ( #22189 )
...
As the title says: now that the infra has been updated, this PR needs to be merged so that the tests can run Ray without Redis.
2022-02-08 19:46:48 -08:00
Sven Mika
ac3e6ab411
[RLlib] Speedup A3C up to 3x (new training_iteration function instead of execution_plan) and re-instate Pong learning test. ( #22126 )
2022-02-08 19:04:13 +01:00
SangBin Cho
ac00389cbe
[Nightly test] Bring back the old way of running commands. ( #22209 )
...
Bring back the old way of running commands for non-k8s tests.
This also fixes the regression from many_drivers.py
2022-02-08 01:44:07 -08:00
Jiajun Yao
56c7b74072
Delete nightly shuffle_data_loader ( #22185 )
2022-02-07 15:23:34 -08:00
Eric Liang
00b5801d71
Fix datasets leaking worker processes due to closure capture of stats actor handle ( #22156 )
2022-02-07 14:05:44 -08:00
Jiajun Yao
355ee4a02c
Fix nightly shuffle_data_loader by pinning down dependencies versions ( #22183 )
2022-02-07 11:25:30 -08:00
Chen Shen
13819304d4
[Core][nightly-test] better way of calculating num features ( #22158 )
...
* better filter of column length
* address comments
* more
2022-02-07 02:13:40 -08:00
Kai Fricke
dd935874ee
[ci/release] Fix job submission command ( #22093 )
...
Ray job submission does not accept quoted commands anymore (#22011 ). This PR updates the command to fix job submission within e2e tests.
2022-02-04 00:05:52 +01:00
mwtian
b528bf9202
Revert "[e2e] Remove unnecessary logic around copying results ( #22034 )" ( #22088 )
...
This reverts commit 92d7e9bf98.
2022-02-03 13:42:40 -08:00
mwtian
92d7e9bf98
[e2e] Remove unnecessary logic around copying results ( #22034 )
...
After #21905 , some of the logic around handling result artifacts became unnecessary or incorrect (in generating error logs). It is removed.
2022-02-03 12:15:06 -08:00
SangBin Cho
3c056a6b92
Revert "[Nightly Test] Add more metadata to test result ( #21990 )" ( #22052 )
...
This reverts commit fd20cf3239.
2022-02-02 12:56:42 -08:00
SangBin Cho
fd20cf3239
[Nightly Test] Add more metadata to test result ( #21990 )
...
Add columns for error code, commit URL, stable, session URL, and runtime.
2022-01-31 22:33:30 -08:00
Yi Cheng
0659d4a472
[nightly] Limit many drivers iteration to 4000 iterations ( #21958 )
...
Because the many drivers test now runs faster, we limit it to 4k iterations.
2022-01-31 13:26:02 -08:00
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black ( #21975 )
...
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Yi Cheng
570f67798a
[nightly] Move scheduling tests into one suite ( #21959 )
...
For future convenience, we are moving scheduling-related tests into one suite for easier monitoring and benchmarking.
2022-01-28 13:32:34 -08:00
Chen Shen
bfe3e5f4a8
add check on shape ( #21947 )
2022-01-28 12:27:43 -08:00