siddgoel
0722cbb37e
Add support for snappy text decompression #22298 ( #22486 )
...
Adds a streaming-based reading option for Snappy-compressed files. Arrow doesn't support streaming Snappy decompression because the canonical C++ Snappy library doesn't natively support it. This PR works around the limitation by reading Snappy-compressed files through the streaming decompression API provided by the [python-snappy](https://github.com/andrix/python-snappy) package.
This commit supplies a custom datasource that uses Arrow + [python-snappy](https://github.com/andrix/python-snappy) to read and decompress Snappy-compressed files.
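The streaming read described above can be sketched roughly as follows. This is a minimal illustration of chunked decompression, not the code from this PR: `stream_decompress`, `make_decompressor`, and `chunk_size` are hypothetical names, and the standard-library `zlib.decompressobj` stands in for the Snappy decompressor so the sketch runs without extra dependencies. It assumes the decompressor object exposes `decompress(chunk)` and `flush()`, which is the shape python-snappy's `StreamDecompressor` is understood to have.

```python
import io
import zlib
from typing import BinaryIO, Callable, Iterator


def stream_decompress(
    src: BinaryIO,
    make_decompressor: Callable[[], object] = zlib.decompressobj,
    chunk_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield decompressed bytes without loading the whole file into memory.

    `make_decompressor` is a hypothetical hook: zlib is used here as a
    stand-in. With python-snappy installed, one could presumably pass
    `snappy.StreamDecompressor` instead (assumption, not verified here).
    """
    decompressor = make_decompressor()
    while True:
        # Read a bounded chunk of compressed bytes from the source stream.
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # The decompressor buffers partial frames internally and returns
        # whatever output is complete so far (possibly empty).
        out = decompressor.decompress(chunk)
        if out:
            yield out
    # Emit any output still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    payload = b"ray datasets\n" * 1000
    compressed = io.BytesIO(zlib.compress(payload))
    restored = b"".join(stream_decompress(compressed, chunk_size=16))
    assert restored == payload
```

Because the function is a generator, a caller can feed each yielded piece to a line or record parser as it arrives, which keeps memory use bounded by `chunk_size` rather than by the file size.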
Co-authored-by: siddharth.goel <siddharth.goel@bytedance.com>
Co-authored-by: Chen Shen <scv119@gmail.com>
2022-03-15 13:52:22 -07:00
dependabot[bot]
767b349b99
[data](deps): Bump dask[complete] ( #22334 )
...
Bumps [dask[complete]](https://github.com/dask/dask) from 2022.1.0 to 2022.2.0.
- [Release notes](https://github.com/dask/dask/releases)
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md)
- [Commits](https://github.com/dask/dask/compare/2022.01.0...2022.02.0)
---
updated-dependencies:
- dependency-name: dask[complete]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-14 12:44:20 -08:00
dependabot[bot]
1f563aaf9b
[data](deps): Bump dask[complete] from 2021.11.0 to 2022.1.0 in /python/requirements/data_processing ( #21621 )
...
Bumps [dask[complete]](https://github.com/dask/dask) from 2021.11.0 to 2022.1.0.
2022-01-18 15:32:07 -08:00
Yi Cheng
d2d749b6f9
[workflow] Fix test_serialization.py ( #21522 )
...
The new version of responses introduces errors in the test. This PR pins the responses version.
It also pins moto to guard against future upstream updates.
2022-01-11 11:45:18 -08:00
matthewdeng
caa4ff3783
[train][datasets] update example and remove dask ( #20592 )
2021-11-21 17:06:44 -08:00
dependabot[bot]
adf39941f4
[data](deps): Bump dask[complete] ( #20125 )
...
Bumps [dask[complete]](https://github.com/dask/dask) from 2021.9.1 to 2021.11.0.
- [Release notes](https://github.com/dask/dask/releases)
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md)
- [Commits](https://github.com/dask/dask/compare/2021.09.1...2021.11.0)
---
updated-dependencies:
- dependency-name: dask[complete]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-07 11:55:39 -08:00
matthewdeng
78e9ff7c91
[train][datasets] add example for big data training ( #20042 )
...
* [train][datasets] add example for big data training
* add title docstring
* lint and dependencies
* add dask_ml requirement
2021-11-05 09:28:48 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 ( #19592 )
...
* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
Edward Oakes
888fb24c25
Remove deprecated ray.services package ( #18475 )
...
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-14 16:28:16 +01:00
Zhi Lin
2fcd1bcb4b
[Dataset] implement from_spark, to_spark and some optimizations ( #17340 )
2021-09-09 11:43:47 -07:00
Wesley Gifford
6133a561e9
Dataset from modin ( #18122 )
2021-08-31 11:19:35 -07:00
Yi Cheng
f579822790
[workflow] Workflow inside virtual actor ( #18066 )
2021-08-30 10:40:22 -07:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. ( #17158 )
2021-08-12 08:39:31 -07:00
Yi Cheng
5f4d9085d2
[workflow] workflow ci enable ( #17255 )
...
* Enable workflow tests
* update
* Fix one bug
2021-07-22 17:59:24 -07:00
Yi Cheng
dc0f948cb9
[workflow] S3 support for workflow ( #16993 )
...
* up
* up
* up
* format
* up
* fix comment
* up
* update
* update
* move dep
* bump pytest version
* use lazy_fixture explicitly
* format
2021-07-13 19:14:41 -07:00
Yi Cheng
4bb3883a73
[dataset] deduce filesystem automatically ( #16762 )
2021-07-03 00:50:59 -07:00
Clark Zinzow
52da2cce68
[Dataset] Adds JSON, CSV, Pandas, and Dask IO layers, and adds the write side of the Parquet IO layer. ( #16724 )
2021-07-01 11:57:40 -07:00