This PR overhauls the "Accessing Datasets", adding proper coverage of each data consuming methods, including the ML framework exchange APIs (to_torch() and to_tf()).
This PR knocks off a few miscellaneous GA docs P0s given in our docs tracker. Namely:
- Documents Datasets resource allocation model.
- De-emphasizes global/windowed shuffling.
- Documents lazy execution mode, and expands our execution model docs in general.
This is part of the Dataset GA doc fix effort to update/improve the documentation.
This PR revamps the Getting Started page.
What are the changes:
- Focus on basic/core features that are bread-and-butter for users, leave the advanced features out
- Focus on high level introduction, leave the detailed spec out (e.g. what are possible batch_types for map_batches() API)
- Use more realistic (yet still simple) data example that's familiar to people (IRIS dataset in this case)
- Use the same data example throughout to make it context-switch free
- Use runnable code rather than faked
- Reference to the code from doc, instead of inlining them in the doc
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-136.us-west-2.compute.internal>
Co-authored-by: Eric Liang <ekhliang@gmail.com>
This PR is a general overhaul of the "Creating Datasets" feature guide, providing complete coverage of all (public) dataset creation APIs and highlighting features and quirks of the individual APIs, data modalities, storage backends, etc. In order to keep the page from getting too long and keeping it easy to navigate, tabbed views are used heavily.