
21 commits

Author SHA1 Message Date
Kai Fricke
9b49417a72
[ci/hotfix] Pin raydp-nightly (#26358)
Alternative to #26356 - here we just pin raydp-nightly and resolve the dependency issues in follow-up PRs.

This is to quickly unblock CI.

Signed-off-by: Kai Fricke <kai@anyscale.com>
2022-07-07 14:54:01 +01:00
Antoni Baum
668049492c
[Datasets] Add from_huggingface for Hugging Face datasets integration (#24464)
Adds a from_huggingface method to Datasets, which allows the conversion of a Hugging Face Dataset to a Ray Dataset. As a Hugging Face Dataset is backed by an Arrow table, the conversion is trivial.
2022-05-06 13:09:28 -07:00
Balaji Veeramani
2190f7ff25
[Datasets] Add SimpleTensorFlowDatasource (#24022)
This PR makes it easier to use TensorFlow datasets with Ray Datasets.
2022-04-29 12:15:30 -07:00
Shawn
43ed78f6fd
[Datasets] Integrate Mars-on-Ray with Datasets; improve docs and add tests (#23402)
Add Mars-on-Ray + Datasets integration; improve Mars-on-Ray docs and add tests.
2022-04-29 09:43:52 -07:00
siddgoel
0722cbb37e
Add support for snappy text decompression #22298 (#22486)
Adds a streaming-based reading option for Snappy-compressed files. Arrow doesn't support streaming Snappy decompression because the canonical C++ Snappy library doesn't natively support streaming decompression. This PR works around this by reading Snappy-compressed files in a streaming fashion, using the streaming decompression API provided by the [python-snappy](https://github.com/andrix/python-snappy) package.

This commit supplies a custom datasource that uses Arrow + [python-snappy](https://github.com/andrix/python-snappy) to read and decompress Snappy-compressed files.

Co-authored-by: siddharth.goel <siddharth.goel@bytedance.com>
Co-authored-by: Chen Shen <scv119@gmail.com>
2022-03-15 13:52:22 -07:00
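The workaround described in #22486 (reading the file block by block and feeding each block to an incremental decompressor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from that PR: it uses the standard library's zlib.decompressobj() as a stand-in codec because python-snappy is not in the standard library, and the stream_decompress helper and IncrementalDecompressor protocol are hypothetical names. To our understanding, python-snappy's snappy.StreamDecompressor exposes a similar decompress()/flush() pair and could be passed in place of the zlib object; wrapping the output for Arrow is omitted.

```python
import io
import zlib
from typing import BinaryIO, Iterator, Protocol


class IncrementalDecompressor(Protocol):
    """Hypothetical interface: anything with incremental decompress()/flush()."""

    def decompress(self, data: bytes) -> bytes: ...

    def flush(self) -> bytes: ...


def stream_decompress(
    src: BinaryIO,
    decompressor: IncrementalDecompressor,
    block_size: int = 64 * 1024,
) -> Iterator[bytes]:
    """Yield decompressed bytes while reading `src` one block at a time.

    Only one compressed block is read into memory per iteration, so the
    whole compressed file never has to be loaded up front.
    """
    while True:
        block = src.read(block_size)
        if not block:  # end of the compressed input
            break
        out = decompressor.decompress(block)
        if out:  # a block may not complete any output yet
            yield out
    # Emit anything the decompressor is still buffering.
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Stand-in codec: zlib, NOT Snappy (see the lead-in).
    raw = b"line one\nline two\n" * 1000
    compressed = io.BytesIO(zlib.compress(raw))
    decompressed = b"".join(
        stream_decompress(compressed, zlib.decompressobj(), block_size=256)
    )
    print(decompressed == raw)
```

A smaller block_size bounds memory use at the cost of more read calls; the 64 KiB default here is an arbitrary illustrative value.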
dependabot[bot]
767b349b99
[data](deps): Bump dask[complete] (#22334)
Bumps [dask[complete]](https://github.com/dask/dask) from 2022.1.0 to 2022.2.0.
- [Release notes](https://github.com/dask/dask/releases)
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md)
- [Commits](https://github.com/dask/dask/compare/2022.01.0...2022.02.0)

---
updated-dependencies:
- dependency-name: dask[complete]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-14 12:44:20 -08:00
dependabot[bot]
1f563aaf9b
[data](deps): Bump dask[complete] from 2021.11.0 to 2022.1.0 in /python/requirements/data_processing (#21621)
Bumps [dask[complete]](https://github.com/dask/dask) from 2021.11.0 to 2022.1.0.
2022-01-18 15:32:07 -08:00
Yi Cheng
d2d749b6f9
[workflow] Fix test_serialization.py (#21522)
A new version of responses introduces errors in the test. This PR pins the version of responses.

It also pins moto to guard against future upstream updates.
2022-01-11 11:45:18 -08:00
matthewdeng
caa4ff3783
[train][datasets] update example and remove dask (#20592)
2021-11-21 17:06:44 -08:00
dependabot[bot]
adf39941f4
[data](deps): Bump dask[complete] (#20125)
Bumps [dask[complete]](https://github.com/dask/dask) from 2021.9.1 to 2021.11.0.
- [Release notes](https://github.com/dask/dask/releases)
- [Changelog](https://github.com/dask/dask/blob/main/docs/release-procedure.md)
- [Commits](https://github.com/dask/dask/compare/2021.09.1...2021.11.0)

---
updated-dependencies:
- dependency-name: dask[complete]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-11-07 11:55:39 -08:00
matthewdeng
78e9ff7c91
[train][datasets] add example for big data training (#20042)
* [train][datasets] add example for big data training

* add title docstring

* lint and dependencies

* add dask_ml requirement
2021-11-05 09:28:48 -07:00
Jiajun Yao
256bf0bf3a
[Release] Bump up dask to latest compatible version 2021.9.1 (#19592)
* Bump up dask to latest compatible version 2021.9.1

* Bump up dask to latest compatible version 2021.9.1
2021-10-22 09:16:28 -07:00
Edward Oakes
888fb24c25
Remove deprecated ray.services package (#18475)
Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-10-14 16:28:16 +01:00
Zhi Lin
2fcd1bcb4b
[Dataset] implement from_spark, to_spark and some optimizations (#17340)
2021-09-09 11:43:47 -07:00
Wesley Gifford
6133a561e9
Dataset from modin (#18122)
2021-08-31 11:19:35 -07:00
Yi Cheng
f579822790
[workflow] Workflow inside virtual actor (#18066)
2021-08-30 10:40:22 -07:00
Clark Zinzow
d6eeb5dc70
[Datasets] Add local and S3 filesystem test coverage for file-based datasources. (#17158)
2021-08-12 08:39:31 -07:00
Yi Cheng
5f4d9085d2
[workflow] workflow ci enable (#17255)
* Enable workflow tests

* update

* Fix one bug
2021-07-22 17:59:24 -07:00
Yi Cheng
dc0f948cb9
[workflow] S3 support for workflow (#16993)
* up
* up
* up
* format
* up
* fix comment
* up
* update
* update
* move dep
* bump pytest version
* use lazy_fixture explicitly
* format
2021-07-13 19:14:41 -07:00
Yi Cheng
4bb3883a73
[dataset] deduct filesystem automatically (#16762)
2021-07-03 00:50:59 -07:00
Clark Zinzow
52da2cce68
[Dataset] Adds JSON, CSV, Pandas, and Dask IO layers, and adds the write side of the Parquet IO layer. (#16724)
2021-07-01 11:57:40 -07:00