mirror of
https://github.com/vale981/ray
synced 2025-03-05 10:01:43 -05:00
[Datasets] Add basic e2e Datasets example on NYC taxi dataset (#24874)
This PR adds a dedicated docs page for examples, and adds a basic e2e tabular data processing example on the NYC taxi dataset. The goal of this example is to demonstrate basic data reading, inspection, transformations, and shuffling, along with ingestion into dummy model trainers and doing dummy batch inference, for tabular (Parquet) data.
parent 399334d53c
commit 6c0a457d7a
7 changed files with 1286 additions and 2 deletions
3
.gitignore
vendored
@@ -210,3 +210,6 @@ workflow_data/
 
 # vscode java extention generated
 .factorypath
+
+# Jupyter Notebooks
+**/.ipynb_checkpoints/
@@ -15,7 +15,12 @@ parts:
 - file: data/getting-started
 - file: data/key-concepts
 - file: data/user-guide
-- file: data/examples/big_data_ingestion
+- file: data/examples/index
+  sections:
+    - file: data/examples/nyc_taxi_basic_processing
+      title: Processing the NYC taxi dataset
+    - file: data/examples/big_data_ingestion
+      title: Large-scale ML Ingest
 - file: data/package-ref
 - file: data/integrations
 
@@ -1,5 +1,22 @@
+load("//bazel:python.bzl", "py_test_run_all_notebooks")
+
 filegroup(
     name = "data_examples",
     srcs = glob(["*.ipynb"]),
     visibility = ["//doc:__subpackages__"]
-)
+)
+
+# --------------------------------------------------------------------
+# Test all doc/source/data/examples notebooks.
+# --------------------------------------------------------------------
+
+# big_data_ingestion.ipynb is not tested right now due to large resource requirements
+# and a need of a general overhaul.
+
+py_test_run_all_notebooks(
+    size = "medium",
+    include = ["*.ipynb"],
+    exclude = ["big_data_ingestion.ipynb"],
+    data = ["//doc/source/data/examples:data_examples"],
+    tags = ["exclusive", "team:ml"],
+)
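The `include`/`exclude` arguments in the hunk above suggest that notebooks are selected by glob pattern and then filtered. The macro's implementation in `//bazel:python.bzl` is not shown in this commit, so the following is only a minimal Python sketch of that selection logic under that assumption; `select_notebooks` and the file list are hypothetical names used for illustration.

```python
from fnmatch import fnmatch


def select_notebooks(files, include, exclude):
    """Keep files matching any include pattern and no exclude pattern.

    Hypothetical helper: mirrors the assumed include/exclude semantics,
    not the actual Bazel macro.
    """
    return sorted(
        f
        for f in files
        if any(fnmatch(f, pattern) for pattern in include)
        and not any(fnmatch(f, pattern) for pattern in exclude)
    )


# Illustrative directory listing for doc/source/data/examples.
files = [
    "big_data_ingestion.ipynb",
    "nyc_taxi_basic_processing.ipynb",
    "index.rst",
]

selected = select_notebooks(
    files,
    include=["*.ipynb"],
    exclude=["big_data_ingestion.ipynb"],
)
print(selected)
```

Under these assumptions, only the new NYC taxi notebook would be picked up for testing, which matches the comment that `big_data_ingestion.ipynb` is skipped for now.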
52
doc/source/data/examples/index.rst
Normal file
@@ -0,0 +1,52 @@
+.. _datasets-examples-ref:
+
+========
+Examples
+========
+
+.. tip:: Check out the Datasets :ref:`User Guide <data_user_guide>` to learn more about
+  Datasets' features in-depth.
+
+.. _datasets-recipes:
+
+Simple Data Processing Examples
+-------------------------------
+
+Ray Datasets is a data processing engine that supports multiple data
+modalities and types. Here you will find a few end-to-end examples of some basic data
+processing with Ray Datasets on tabular data, text (coming soon!), and imagery (coming
+soon!).
+
+.. panels::
+    :container: container pb-4
+    :column: col-md-4 px-2 py-2
+    :img-top-cls: pt-5 w-75 d-block mx-auto
+
+    ---
+    :img-top: /images/taxi.png
+
+    +++
+    .. link-button:: nyc_taxi_basic_processing
+        :type: ref
+        :text: Processing NYC taxi data using Ray Datasets
+        :classes: btn-link btn-block stretched-link
+
+Scaling Out Datasets Workloads
+------------------------------
+
+These examples demonstrate using Ray Datasets on large-scale data over a multi-node Ray
+cluster.
+
+.. panels::
+    :container: container pb-4
+    :column: col-md-4 px-2 py-2
+    :img-top-cls: pt-5 w-75 d-block mx-auto
+
+    ---
+    :img-top: /images/dataset-repeat-2.svg
+
+    +++
+    .. link-button:: big_data_ingestion
+        :type: ref
+        :text: Large-scale ML Ingest
+        :classes: btn-link btn-block stretched-link
1206
doc/source/data/examples/nyc_taxi_basic_processing.ipynb
Normal file
File diff suppressed because it is too large
1
doc/source/images/dataset-repeat-2.svg
Normal file
File diff suppressed because one or more lines are too long
Size: 146 KiB
BIN
doc/source/images/taxi.png
Normal file
Binary file not shown.
Size: 702 KiB