ray/release
Stephanie Wang c62e00ed6d
[dataset] Use polars for sorting (#24523)
Polars is significantly faster than the current pyarrow-based sort. This PR uses polars for the internal sort implementation when polars is available. No API changes are needed.
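
The "use polars if available" behavior described above can be illustrated with an optional-import fallback. This is a minimal sketch, not the code from this PR: the function name `sort_rows` and the row-of-dicts input are hypothetical, and the built-in `sorted` stands in for the existing pyarrow-based path so that the example runs without extra dependencies.

```python
from typing import Any, Dict, List

try:
    # Optional dependency: the fast path is only taken when polars is installed.
    import polars as pl
except ImportError:
    pl = None


def sort_rows(rows: List[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Sort rows ascending by `key`, preferring polars when it is importable.

    Hypothetical helper for illustration; not part of the Ray API.
    """
    if pl is not None and rows:
        # Fast path: build a DataFrame, sort it, convert back to plain dicts.
        return pl.DataFrame(rows).sort(key).to_dicts()
    # Fallback path: built-in sort, standing in for the pyarrow-based sort.
    return sorted(rows, key=lambda row: row[key])


rows = [{"id": 3, "v": "c"}, {"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
print(sort_rows(rows, "id"))
```

Because the choice of backend happens inside the helper, callers see the same signature either way, which is consistent with the note that no API changes are needed.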

On my laptop, this makes sorting 1GB about 2x faster:

without polars

$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 50.23415923118591
...
Stage 2 sort: executed in 38.59s

        Substage 0 sort_map: 100/100 blocks executed
        * Remote wall time: 864.21ms min, 1.94s max, 1.4s mean, 140.39s total
        * Remote cpu time: 634.07ms min, 825.47ms max, 719.87ms mean, 71.99s total
        * Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
        * Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
        * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used

        Substage 1 sort_reduce: 100/100 blocks executed
        * Remote wall time: 125.66ms min, 2.3s max, 1.09s mean, 109.26s total
        * Remote cpu time: 96.17ms min, 1.34s max, 725.43ms mean, 72.54s total
        * Output num rows: 178073 min, 2313038 max, 1250000 mean, 125000000 total
        * Output size bytes: 1446844 min, 18793434 max, 10156250 mean, 1015625046 total
        * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used

with polars

$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 24.097432136535645
...
Stage 2 sort: executed in 14.02s

        Substage 0 sort_map: 100/100 blocks executed
        * Remote wall time: 165.15ms min, 595.46ms max, 398.01ms mean, 39.8s total
        * Remote cpu time: 349.75ms min, 423.81ms max, 383.29ms mean, 38.33s total
        * Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
        * Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
        * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used

        Substage 1 sort_reduce: 100/100 blocks executed
        * Remote wall time: 21.21ms min, 472.34ms max, 232.1ms mean, 23.21s total
        * Remote cpu time: 29.81ms min, 460.67ms max, 238.1ms mean, 23.81s total
        * Output num rows: 114079 min, 2591410 max, 1250000 mean, 125000000 total
        * Output size bytes: 912632 min, 20731280 max, 10000000 mean, 1000000000 total
        * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
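
As a quick check of the "about 2x" claim, the ratios can be computed directly from the timings printed in the two runs above. This is only a sketch: the dictionary keys are labels chosen here for illustration, and the values are copied from the log lines ("Finished in", "Stage 2 sort: executed in", and the "Remote wall time ... total" entries).

```python
# Timings in seconds, copied from the two benchmark runs above.
without_polars = {
    "end_to_end": 50.23415923118591,
    "stage_2_sort": 38.59,
    "sort_map_wall_total": 140.39,
    "sort_reduce_wall_total": 109.26,
}
with_polars = {
    "end_to_end": 24.097432136535645,
    "stage_2_sort": 14.02,
    "sort_map_wall_total": 39.8,
    "sort_reduce_wall_total": 23.21,
}


def speedups(before: dict, after: dict) -> dict:
    """Return before/after ratios for every timing present in `before`."""
    return {name: before[name] / after[name] for name in before}


ratios = speedups(without_polars, with_polars)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The end-to-end ratio comes out slightly above 2x, while the sort stage itself and its two substages improve by larger factors, since the end-to-end time also includes work outside the sort.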

Related issue number

Closes #23612.
2022-05-12 18:35:50 -07:00
air_tests/horovod [Tune] Deprecate DistributedTrainableCreator (#24453) 2022-05-10 11:06:43 -07:00
benchmarks [Test]Add a time check for task benchmark (#23170) 2022-04-11 06:27:04 -07:00
golden_notebook_tests [Train] Fully deprecate Ray SGD v1 (#24038) 2022-04-25 16:12:57 -07:00
jobs_tests Add basic jobs release test with Tune script (#23474) 2022-04-05 13:31:11 -05:00
kubernetes_manual_tests [minor] Fix legacy OSS operator test (#23540) 2022-04-18 17:47:42 -07:00
lightgbm_tests [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
long_running_distributed_tests [Train] Fully deprecate Ray SGD v1 (#24038) 2022-04-25 16:12:57 -07:00
long_running_tests [Test] Add grace period to long running actor test failure (#24469) 2022-05-04 16:00:22 -07:00
microbenchmark [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
ml_user_tests [ci/release] Re-install anyscale package after local env setup (#24373) 2022-05-01 16:51:55 +01:00
nightly_tests [dataset] Use polars for sorting (#24523) 2022-05-12 18:35:50 -07:00
ray_release [ci/release] Fix ray version from init test (#24510) 2022-05-05 16:05:23 +01:00
release_logs [ci] Fix automatic buildkite token fetching in fetch_release_logs.py (#24606) 2022-05-10 09:24:10 +02:00
rllib_tests [RLlib] Provide more time for APPO Pong release and performance tests. (#24503) 2022-05-05 18:19:38 +02:00
runtime_env_tests [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
serve_tests [Serve] Add serve handle graph workload nightly tests (#24435) 2022-05-04 09:07:50 -07:00
train_tests/horovod [Train] Fix multi node horovod bug (#22564) 2022-03-22 16:22:53 -07:00
tune_tests Revert "Revert "[tune] Also interrupt training when SIGUSR1 received"" (#24101) 2022-04-22 11:27:38 +01:00
util [ci/release] Remove old OSS release test infrastructure (#23134) 2022-03-14 15:10:52 +00:00
xgboost_tests [Release] Upgrade instance types for xgboost gpu release tests (#24002) 2022-04-20 15:18:22 -07:00
__init__.py [release] move release testing end to end script to main ray repo (#17070) 2021-07-14 12:39:07 -07:00
BUILD [Serve] Add serve handle graph workload nightly tests (#24435) 2022-05-04 09:07:50 -07:00
README.md [Release] Remove release process doc (#19312) 2021-10-18 11:24:03 -07:00
release_tests.yaml [Tune] Deprecate DistributedTrainableCreator (#24453) 2022-05-10 11:06:43 -07:00
requirements.txt [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
requirements_buildkite.txt [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00
run_release_test.sh [ci/release] Disable infra retries for now (#23132) 2022-03-14 11:51:11 +00:00
setup.py [ci/release] Refactor release test e2e into package (#22351) 2022-02-16 17:35:02 +00:00

Release Tests

While the exact release process relies on Anyscale internal tooling, the tests we run during releases are located at https://github.com/ray-project/ray/blob/master/release/.buildkite/build_pipeline.py