Stephanie Wang
61676f26d3
Revert "Revert "[dataset] Use polars for sorting (#24523)" (#24781)" (#25173)
Polars is significantly faster than the current pyarrow-based sort. This PR uses polars for the internal sort implementation if available. No API changes needed.
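The "use polars if available" pattern can be sketched as below. This is a minimal illustration, not the code from this PR: the function name `sort_columns` and the `HAS_POLARS` flag are hypothetical, and the sketch sorts a plain dict of column lists so that it runs without pyarrow, whereas the PR itself sorts pyarrow-based blocks. The fallback branch here is pure Python rather than the pyarrow sort.

```python
import importlib.util
from typing import Dict, List

# Detect the optional dependency once, without importing it eagerly.
HAS_POLARS = importlib.util.find_spec("polars") is not None


def sort_columns(columns: Dict[str, List], key: str) -> Dict[str, List]:
    """Sort a column-oriented table by `key` in ascending order.

    Hypothetical helper: uses polars when it is installed and falls back
    to a pure-Python argsort otherwise. Callers see the same result
    either way, so no API change is needed.
    """
    if HAS_POLARS:
        import polars as pl

        # Build a DataFrame, sort by the key column, convert back to lists.
        return pl.DataFrame(columns).sort(key).to_dict(as_series=False)

    # Fallback: compute the row order once, then apply it to every column.
    order = sorted(range(len(columns[key])), key=lambda i: columns[key][i])
    return {name: [values[i] for i in order] for name, values in columns.items()}


if __name__ == "__main__":
    table = {"id": [3, 1, 2], "name": ["c", "a", "b"]}
    print(sort_columns(table, "id"))
```

Keeping the dependency check behind a single flag means the faster path is picked up automatically when polars is installed, and nothing breaks when it is not.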
On my laptop, this makes sorting 1GB about 2x faster end to end (50.2s down to 24.1s):
without polars
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 50.23415923118591
...
Stage 2 sort: executed in 38.59s
Substage 0 sort_map: 100/100 blocks executed
* Remote wall time: 864.21ms min, 1.94s max, 1.4s mean, 140.39s total
* Remote cpu time: 634.07ms min, 825.47ms max, 719.87ms mean, 71.99s total
* Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
* Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
Substage 1 sort_reduce: 100/100 blocks executed
* Remote wall time: 125.66ms min, 2.3s max, 1.09s mean, 109.26s total
* Remote cpu time: 96.17ms min, 1.34s max, 725.43ms mean, 72.54s total
* Output num rows: 178073 min, 2313038 max, 1250000 mean, 125000000 total
* Output size bytes: 1446844 min, 18793434 max, 10156250 mean, 1015625046 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
with polars
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 24.097432136535645
...
Stage 2 sort: executed in 14.02s
Substage 0 sort_map: 100/100 blocks executed
* Remote wall time: 165.15ms min, 595.46ms max, 398.01ms mean, 39.8s total
* Remote cpu time: 349.75ms min, 423.81ms max, 383.29ms mean, 38.33s total
* Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
* Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
Substage 1 sort_reduce: 100/100 blocks executed
* Remote wall time: 21.21ms min, 472.34ms max, 232.1ms mean, 23.21s total
* Remote cpu time: 29.81ms min, 460.67ms max, 238.1ms mean, 23.81s total
* Output num rows: 114079 min, 2591410 max, 1250000 mean, 125000000 total
* Output size bytes: 912632 min, 20731280 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
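To make the comparison between the two runs explicit, the short script below computes speedup ratios from the timings reported above. The numbers are copied from the two outputs; the dictionary layout and the `speedup` helper are only illustrative.

```python
# (without polars, with polars) timings in seconds, copied from the runs above.
timings = {
    "end to end": (50.23415923118591, 24.097432136535645),
    "sort stage": (38.59, 14.02),
    "sort_map total wall time": (140.39, 39.8),
    "sort_reduce total wall time": (109.26, 23.21),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is compared to `before`."""
    return before / after


speedups = {name: speedup(before, after) for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The end-to-end ratio is a little over 2x, while the sort stage and its two substages improve by more than that, which suggests the rest of the benchmark (outside the sort stage) accounts for a fixed share of the runtime.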
Related issue number
Closes #23612.
2022-05-27 10:43:51 -07:00
Stephanie Wang
c62e00ed6d
[dataset] Use polars for sorting (#24523)
Polars is significantly faster than the current pyarrow-based sort. This PR uses polars for the internal sort implementation if available. No API changes needed.
On my laptop, this makes sorting 1GB about 2x faster end to end (50.2s down to 24.1s):
without polars
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 50.23415923118591
...
Stage 2 sort: executed in 38.59s
Substage 0 sort_map: 100/100 blocks executed
* Remote wall time: 864.21ms min, 1.94s max, 1.4s mean, 140.39s total
* Remote cpu time: 634.07ms min, 825.47ms max, 719.87ms mean, 71.99s total
* Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
* Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
Substage 1 sort_reduce: 100/100 blocks executed
* Remote wall time: 125.66ms min, 2.3s max, 1.09s mean, 109.26s total
* Remote cpu time: 96.17ms min, 1.34s max, 725.43ms mean, 72.54s total
* Output num rows: 178073 min, 2313038 max, 1250000 mean, 125000000 total
* Output size bytes: 1446844 min, 18793434 max, 10156250 mean, 1015625046 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
with polars
$ python release/nightly_tests/dataset/sort.py --partition-size=1e7 --num-partitions=100
Dataset size: 100 partitions, 0.01GB partition size, 1.0GB total
Finished in 24.097432136535645
...
Stage 2 sort: executed in 14.02s
Substage 0 sort_map: 100/100 blocks executed
* Remote wall time: 165.15ms min, 595.46ms max, 398.01ms mean, 39.8s total
* Remote cpu time: 349.75ms min, 423.81ms max, 383.29ms mean, 38.33s total
* Output num rows: 1250000 min, 1250000 max, 1250000 mean, 125000000 total
* Output size bytes: 10000000 min, 10000000 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
Substage 1 sort_reduce: 100/100 blocks executed
* Remote wall time: 21.21ms min, 472.34ms max, 232.1ms mean, 23.21s total
* Remote cpu time: 29.81ms min, 460.67ms max, 238.1ms mean, 23.81s total
* Output num rows: 114079 min, 2591410 max, 1250000 mean, 125000000 total
* Output size bytes: 912632 min, 20731280 max, 10000000 mean, 1000000000 total
* Tasks per node: 100 min, 100 max, 100 mean; 1 nodes used
Related issue number
Closes #23612.
2022-05-12 18:35:50 -07:00