Commit graph

32 commits

Author SHA1 Message Date
Clark Zinzow
b872fdaaac
[Datasets] Last-mile preprocessing docs. (#20712)
Datasets docs for last-mile preprocessing, particularly geared towards ML ingest. This gives groupby, aggregations, and random shuffling examples in the overview page (not present previously), adds some concreteness to our last-mile preprocessing positioning, and provides some preprocessing recipes for a few common transformations.
2021-11-29 23:23:27 -08:00
Richard Liaw
cf357f6bce
[docs] Add a talks section for ray.data (#20444) 2021-11-16 14:30:08 -08:00
Eric Liang
6102912494
Dataset doc updates (#19815) 2021-11-04 18:13:40 -07:00
Philipp Moritz
0a5942d8b0
[Documentation] Fix quotes for windows installations (#19859)
* [Documentation] Fix quotes for windows installations

* update

* formatting
2021-10-29 10:54:38 -07:00
Eric Liang
27a5b546ad
Make ArrowRow less scary (#19686) 2021-10-25 12:18:42 -07:00
Eric Liang
875d19f838
[data] Fix inconsistent naming of to_refs() methods, remove to_arrow() (#19620) 2021-10-23 12:20:23 -07:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436) 2021-10-18 22:27:46 -07:00
Eric Liang
430a5f4a21
[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332) 2021-10-12 18:32:29 -07:00
Clark Zinzow
d22f838795
[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992) 2021-10-01 13:08:25 -07:00
Alex Wu
5709c6501b
[dataset][usability] Dataset dependencies (#18346) 2021-09-29 17:29:31 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867) 2021-09-27 16:24:09 -07:00
Eric Liang
4d2065352b
Increase dataset read parallelism by default (#18420) 2021-09-09 15:07:49 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301) 2021-09-08 10:09:01 -07:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs (#18357) 2021-09-06 01:39:08 -07:00
Eric Liang
7dcae690b9
Mark datasets as still in alpha for now (#18321) 2021-09-02 17:07:33 -07:00
Wesley Gifford
6133a561e9
Dataset from modin (#18122) 2021-08-31 11:19:35 -07:00
Eric Liang
95b5ad12ba
Initial version of workflow documentation (#18138) 2021-08-27 16:20:48 -07:00
Clark Zinzow
aee7ba2510
[Datasets] Add from_numpy() and to_numpy() APIs (#18146) 2021-08-27 13:33:11 -07:00
Eric Liang
71b3183038
Add implicit init note to Ray docs & dataset version note (#17751) 2021-08-11 13:13:22 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560) 2021-08-04 13:31:10 -07:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492) 2021-08-02 11:48:48 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset (#17439) 2021-08-01 22:45:21 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309) 2021-07-28 21:12:01 -07:00
Clark Zinzow
b5194ca9f9
Add imports to docs examples to make the code more runnable. (#17240) 2021-07-21 11:18:45 -07:00
Eric Liang
fabba96fad
Re-merge large function def, skipping test failing on Windows (#17191) 2021-07-19 18:03:26 -07:00
architkulkarni
4069686e0f
Revert "Improve error message for oversized function (#17133)" (#17184)
This reverts commit 3e53619d64.
2021-07-19 09:28:33 -07:00
Eric Liang
3e53619d64
Improve error message for oversized function (#17133) 2021-07-17 11:04:05 -07:00
Eric Liang
94f17ec099
[RFC] API stability annotations (#17100) 2021-07-16 17:09:20 -07:00
Eric Liang
26a286655b
Add link to datasets preview docs 2021-07-16 12:31:52 -07:00
Eric Liang
f03b43c532
[dataset] Support callable classes to simplify state initialization (#17136) 2021-07-15 23:06:14 -07:00
Eric Liang
3d764d7b4b
[data] Fix the ObjectRef type in the dataset docs (#17111)
* fix reft

* remove exp

* fix
2021-07-15 09:50:37 -07:00
Eric Liang
38bddc3f2b
First cut at dataset documentation (#16956) 2021-07-14 23:27:13 -07:00