mirror of
https://github.com/vale981/ray
synced 2025-03-06 02:21:39 -05:00
Polars is significantly faster than the current pyarrow-based sort. This PR uses polars for the internal sort implementation if it is available. No API changes are needed. On my laptop, this makes sorting 1GB about 2x faster:

**Without polars**

```
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 50.23415923118591
...
Stage 2 sort: executed in 38.59s

    Substage 0 sort_map: 100/100 blocks executed
    * Remote wall time: 864.21ms min, 1.94s max, 1.4s mean, 140.39s total
    * Remote cpu time: 634.07ms min, 825.47ms max, 719.87ms mean, 71.99s total
    * Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
    * Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
    * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used

    Substage 1 sort_reduce: 100/100 blocks executed
    * Remote wall time: 125.66ms min, 2.3s max, 1.09s mean, 109.26s total
    * Remote cpu time: 96.17ms min, 1.34s max, 725.43ms mean, 72.54s total
    * Output num rows: 178073 min, 2313038 max, 1250000 mean, 125000000 total
    * Output size bytes: 1446844 min, 18793434 max, 10156250 mean, 1015625046 total
    * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
```

**With polars**

```
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 24.097432136535645
...
Stage 2 sort: executed in 14.02s

    Substage 0 sort_map: 100/100 blocks executed
    * Remote wall time: 165.15ms min, 595.46ms max, 398.01ms mean, 39.8s total
    * Remote cpu time: 349.75ms min, 423.81ms max, 383.29ms mean, 38.33s total
    * Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
    * Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
    * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used

    Substage 1 sort_reduce: 100/100 blocks executed
    * Remote wall time: 21.21ms min, 472.34ms max, 232.1ms mean, 23.21s total
    * Remote cpu time: 29.81ms min, 460.67ms max, 238.1ms mean, 23.81s total
    * Output num rows: 114079 min, 2591410 max, 1250000 mean, 125000000 total
    * Output size bytes: 912632 min, 20731280 max, 10000000 mean, 1000000000 total
    * Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
```

**Related issue number**

Closes #23612.
# Release Tests
While the exact release process relies on Anyscale internal tooling, the tests we run during a release are located at https://github.com/ray-project/ray/blob/master/release/.buildkite/build_pipeline.py