Clark Zinzow
b872fdaaac
[Datasets] Last-mile preprocessing docs. (#20712)
Datasets docs for last-mile preprocessing, particularly geared towards ML ingest. This gives groupby, aggregations, and random shuffling examples in the overview page (not present previously), adds some concreteness to our last-mile preprocessing positioning, and provides some preprocessing recipes for a few common transformations.
2021-11-29 23:23:27 -08:00
Richard Liaw
cf357f6bce
[docs] Add a talks section for ray.data (#20444)
2021-11-16 14:30:08 -08:00
Eric Liang
6102912494
Dataset doc updates (#19815)
2021-11-04 18:13:40 -07:00
Philipp Moritz
0a5942d8b0
[Documentation] Fix quotes for windows installations (#19859)
* [Documentation] Fix quotes for windows installations
* update
* formatting
2021-10-29 10:54:38 -07:00
Eric Liang
27a5b546ad
Make ArrowRow less scary (#19686)
2021-10-25 12:18:42 -07:00
Eric Liang
875d19f838
[data] Fix inconsistent naming of to_refs() methods, remove to_arrow() (#19620)
2021-10-23 12:20:23 -07:00
matthewdeng
4674c78050
[Train] Rename Ray SGD v2 to Ray Train (#19436)
2021-10-18 22:27:46 -07:00
Eric Liang
430a5f4a21
[doc] Bump dataset to beta for 1.8 and add backlink to SGD (#19332)
2021-10-12 18:32:29 -07:00
Clark Zinzow
d22f838795
[Datasets] Delineate between ref and raw APIs for the Pandas/Arrow integrations. (#18992)
2021-10-01 13:08:25 -07:00
Alex Wu
5709c6501b
[dataset][usability] Dataset dependencies (#18346)
2021-09-29 17:29:31 -07:00
Eric Liang
caf34a452c
Unify ArrowTensorType tables and Tensor blocks (#18867)
2021-09-27 16:24:09 -07:00
Eric Liang
4d2065352b
Increase dataset read parallelism by default (#18420)
2021-09-09 15:07:49 -07:00
Clark Zinzow
b30c41759d
[Datasets] Adds tensor column support (tensors-in-tables) via Pandas/Arrow extension types/arrays. (#18301)
2021-09-08 10:09:01 -07:00
Eric Liang
cbdafa0b63
[doc] Fix various workflow doc bugs (#18357)
2021-09-06 01:39:08 -07:00
Eric Liang
7dcae690b9
Mark datasets as still in alpha for now (#18321)
2021-09-02 17:07:33 -07:00
Wesley Gifford
6133a561e9
Dataset from modin (#18122)
2021-08-31 11:19:35 -07:00
Eric Liang
95b5ad12ba
Initial version of workflow documentation (#18138)
2021-08-27 16:20:48 -07:00
Clark Zinzow
aee7ba2510
[Datasets] Add from_numpy() and to_numpy() APIs (#18146)
2021-08-27 13:33:11 -07:00
Eric Liang
71b3183038
Add implicit init note to Ray docs & dataset version note (#17751)
2021-08-11 13:13:22 -07:00
Eric Liang
d4f9d3620e
Move ray.data out of experimental (#17560)
2021-08-04 13:31:10 -07:00
Eric Liang
748cbbb23d
[hotfix] Parquet S3 reads broken due to pyarrow.lib.ArrowInvalid: S3 subsystem not initialized (#17492)
2021-08-02 11:48:48 -07:00
Eric Liang
e812691909
Support top-level tensor values in dataset (#17439)
2021-08-01 22:45:21 -07:00
Eric Liang
7ed62ea0ad
Initial implementation of Dataset pipelining and docs (#17309)
2021-07-28 21:12:01 -07:00
Clark Zinzow
b5194ca9f9
Add imports to docs examples to make the code more runnable. (#17240)
2021-07-21 11:18:45 -07:00
Eric Liang
fabba96fad
Re-merge large function def, skipping test failing on Windows (#17191)
2021-07-19 18:03:26 -07:00
architkulkarni
4069686e0f
Revert "Improve error message for oversized function (#17133)" (#17184)
This reverts commit 3e53619d64.
2021-07-19 09:28:33 -07:00
Eric Liang
3e53619d64
Improve error message for oversized function (#17133)
2021-07-17 11:04:05 -07:00
Eric Liang
94f17ec099
[RFC] API stability annotations (#17100)
2021-07-16 17:09:20 -07:00
Eric Liang
26a286655b
Add link to datasets preview docs
2021-07-16 12:31:52 -07:00
Eric Liang
f03b43c532
[dataset] Support callable classes to simplify state initialization (#17136)
2021-07-15 23:06:14 -07:00
Eric Liang
3d764d7b4b
[data] Fix the ObjectRef type in the dataset docs (#17111)
* fix reft
* remove exp
* fix
2021-07-15 09:50:37 -07:00
Eric Liang
38bddc3f2b
First cut at dataset documentation (#16956)
2021-07-14 23:27:13 -07:00