14 commits

Author SHA1 Message Date
Eric Liang
400330e9c0
[air] Add _max_cpu_fraction_per_node to ScalingConfig and documentation (#26634)
2022-07-16 21:55:51 -07:00
Cheng Su
4e674b6ad3
[Datasets] Update docs for drop_columns and fix typos (#26317)
We added the drop_columns() API to Datasets in #26200, so this updates the documentation (doc/source/data/examples/nyc_taxi_basic_processing.ipynb) to use the new API. It also fixes some minor typos found while proofreading the Datasets documentation.
2022-07-07 17:17:33 -07:00
Clark Zinzow
1701b923bc
[Datasets] [Tensor Story - 2/2] Add "numpy" batch format for batch mapping and batch consumption. (#24870)
This PR adds a NumPy "numpy" batch format for batch transformations and batch consumption that works with all block types. See #24811.
2022-06-17 16:01:02 -07:00
Jian Xiao
50c854b1ad
Fix hyperlink in rst doc (#25427)
The hyperlink was not working.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
2022-06-08 13:46:23 -07:00
Jian Xiao
6589a4f8cb
[Datasets][UX Assessment] Add a section on how to write UDFs in Datasets (#25338)
The Datasets UX assessment showed that users had difficulty writing UDFs: what the input/output types are, how to write the function, etc.

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
2022-06-02 20:00:50 -07:00
Eric Liang
51b295ad74
[docs] Improve Tune + Datasets documentation (#25389)
2022-06-01 21:52:32 -07:00
Jian Xiao
ad842ec9ab
Revamp the Transforming Datasets user guide (#25033)
2022-05-20 19:25:06 -07:00
Jian Xiao
e5838c4700
Fix range_arrow(), which is replaced by range_table() (#25036)
2022-05-20 19:24:49 -07:00
Jian Xiao
44fd7fd1d0
Revamp the Saving Datasets user guide (#24987)
2022-05-19 15:40:12 -07:00
Clark Zinzow
399334d53c
[Datasets] Overhaul "Accessing Datasets" feature guide. (#24963)
This PR overhauls the "Accessing Datasets" feature guide, adding proper coverage of each data-consuming method, including the ML framework exchange APIs (to_torch() and to_tf()).
2022-05-19 12:50:00 -07:00
Clark Zinzow
0b6505e8c6
[Datasets] Miscellaneous GA docs P0s. (#24891)
This PR knocks off a few miscellaneous GA docs P0s given in our docs tracker. Namely:

- Documents Datasets resource allocation model.
- De-emphasizes global/windowed shuffling.
- Documents lazy execution mode, and expands our execution model docs in general.
2022-05-18 16:17:48 -07:00
Jian Xiao
9fe4dba4ad
Revamp the Getting Started page for Dataset (#24860)
This is part of the Dataset GA doc fix effort to update/improve the documentation.
This PR revamps the Getting Started page.

What are the changes:
- Focus on basic/core features that are bread-and-butter for users, leave the advanced features out
- Focus on a high-level introduction, leave the detailed spec out (e.g. what the possible batch_types are for the map_batches() API)
- Use more realistic (yet still simple) data example that's familiar to people (IRIS dataset in this case)
- Use the same data example throughout to make it context-switch free
- Use runnable code rather than faked code
- Reference the code from the doc, instead of inlining it in the doc

Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
2022-05-18 13:46:23 -07:00
Clark Zinzow
4444150c29
[Datasets] Overhaul of "Creating Datasets" feature guide. (#24831)
This PR is a general overhaul of the "Creating Datasets" feature guide, providing complete coverage of all (public) dataset creation APIs and highlighting features and quirks of the individual APIs, data modalities, storage backends, etc. To keep the page from getting too long and to keep it easy to navigate, tabbed views are used heavily.
2022-05-17 16:23:42 -07:00
Max Pumperla
29d94a2211
[docs] sphinx gallery removal, migrate to ipynb (#22467)
2022-02-19 01:19:07 -08:00