Commit graph

9 commits

Author SHA1 Message Date
Clark Zinzow
ea635aecd2
[Datasets] Support tensor columns in to_tf and to_torch. (#24752)
This PR adds support for tensor columns in the to_tf() and to_torch() APIs.

For Torch, this involves an explicit extension array check and (zero-copy) conversion of the tensor column to a NumPy array before converting the column to a Torch tensor.

For TensorFlow, this involves bypassing df.values when converting tensor feature columns to NumPy arrays, instead manually creating a single NumPy array from the column Series.

In both cases, I think that the UX around heterogeneous feature columns and squeezing the column dimension could be improved, but I'm saving that for a future PR.
2022-05-17 01:11:00 -07:00
Eric Liang
52491c87e2
Make a pass fixing Dataset API issues (#22886) 2022-03-08 13:07:55 -08:00
Max Pumperla
4dd221f848
[Docs] Ray Data docs target state (#21931)
Preview: [docs](https://ray--21931.org.readthedocs.build/en/21931/data/dataset.html)

The Ray Data project's docs now have a clearer structure and have partly been rewritten/modified. In particular we have

- [x] A Getting Started Guide
- [x] An explicit User / How-To Guide
- [x] A dedicated Key Concepts page
- [x] A consistent naming convention in `Ray Data` whenever is is referred to the project.

This surfaces quite clearly that, apart from the "Getting Started" sections, we really only have one real example. Once we have more, we can create an "Example" section like many other sub-projects have. This will be addressed in https://github.com/ray-project/ray/issues/21838.
2022-01-27 13:14:36 -08:00
xwjiang2010
9af8f11191
Revert "[docs] Clean up doc structure (first part) (#21667)" (#21763)
This reverts commit 38e46c9fb3.
2022-01-20 15:30:56 -08:00
Max Pumperla
38e46c9fb3
[docs] Clean up doc structure (first part) (#21667) 2022-01-20 16:19:04 +01:00
Clark Zinzow
d22f838795
[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992) 2021-10-01 13:08:25 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867) 2021-09-27 16:24:09 -07:00
Eric Liang
4d2065352b
Increase dataset read parallelism by default (#18420) 2021-09-09 15:07:49 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301) 2021-09-08 10:09:01 -07:00