mirror of
https://github.com/vale981/ray
synced 2025-03-06 10:31:39 -05:00

Refactor Datasets API docs for easier navigation: [Ray Datasets API](https://ray--27592.org.readthedocs.build/en/27592/data/api/api.html) ### Changes 1. Create a new Datasets API base page. 2. Split existing APIs into separate pages. 3. Split `Dataset` and `DatasetPipeline` methods into separate sections. 1. Used `autosummary` to generate overview tables at the top of each of these pages. Open to other suggestions e.g. moving the summary to the top of each section instead. 2. **Note:** Every time we add a new method we need to explicitly add it here as well. 4. Add Input/Output APIs. 1. I chose to split these primarily by data format rather than type, since it's easier to navigate, and the existing [Creating Datasets](https://docs.ray.io/en/master/data/creating-datasets.html) User Guide already does the latter. 6. Add `Block` and `DataBatch` (should we add these aliases?) 7. Remove existing `package-ref`.
177 lines
No EOL
4.2 KiB
ReStructuredText
177 lines
No EOL
4.2 KiB
ReStructuredText
.. _dataset-pipeline-api:
|
|
|
|
DatasetPipeline API
|
|
===================
|
|
|
|
.. autoclass:: ray.data.dataset_pipeline.DatasetPipeline
|
|
|
|
**Basic Transformations**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.map
|
|
ray.data.DatasetPipeline.map_batches
|
|
ray.data.DatasetPipeline.flat_map
|
|
ray.data.DatasetPipeline.foreach_window
|
|
ray.data.DatasetPipeline.filter
|
|
ray.data.DatasetPipeline.add_column
|
|
ray.data.DatasetPipeline.drop_columns
|
|
|
|
**Sorting, Shuffling, Repartitioning**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.sort_each_window
|
|
ray.data.DatasetPipeline.random_shuffle_each_window
|
|
ray.data.DatasetPipeline.randomize_block_order_each_window
|
|
ray.data.DatasetPipeline.repartition_each_window
|
|
|
|
**Splitting DatasetPipelines**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.split
|
|
ray.data.DatasetPipeline.split_at_indices
|
|
|
|
**Creating DatasetPipelines**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.repeat
|
|
ray.data.DatasetPipeline.rewindow
|
|
ray.data.DatasetPipeline.from_iterable
|
|
|
|
**Consuming DatasetPipelines**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.show
|
|
ray.data.DatasetPipeline.show_windows
|
|
ray.data.DatasetPipeline.take
|
|
ray.data.DatasetPipeline.take_all
|
|
ray.data.DatasetPipeline.iter_rows
|
|
ray.data.DatasetPipeline.iter_batches
|
|
ray.data.DatasetPipeline.iter_torch_batches
|
|
ray.data.DatasetPipeline.iter_tf_batches
|
|
|
|
**I/O and Conversion**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.write_json
|
|
ray.data.DatasetPipeline.write_csv
|
|
ray.data.DatasetPipeline.write_parquet
|
|
ray.data.DatasetPipeline.write_datasource
|
|
ray.data.DatasetPipeline.to_tf
|
|
ray.data.DatasetPipeline.to_torch
|
|
|
|
**Inspecting Metadata**
|
|
|
|
.. autosummary::
|
|
:nosignatures:
|
|
|
|
ray.data.DatasetPipeline.schema
|
|
ray.data.DatasetPipeline.count
|
|
ray.data.DatasetPipeline.stats
|
|
ray.data.DatasetPipeline.sum
|
|
|
|
Basic transformations
|
|
---------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.map
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.map_batches
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.flat_map
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.foreach_window
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.filter
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.add_column
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.drop_columns
|
|
|
|
Sorting, Shuffling, Repartitioning
|
|
----------------------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.sort_each_window
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.random_shuffle_each_window
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.randomize_block_order_each_window
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.repartition_each_window
|
|
|
|
Splitting DatasetPipelines
|
|
--------------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.split
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.split_at_indices
|
|
|
|
Creating DatasetPipelines
|
|
-------------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.repeat
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.rewindow
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.from_iterable
|
|
|
|
Consuming DatasetPipelines
|
|
--------------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.show
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.show_windows
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.take
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.take_all
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_rows
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_batches
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_epochs
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_tf_batches
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_torch_batches
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.iter_datasets
|
|
|
|
|
|
I/O and Conversion
|
|
------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.write_json
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.write_csv
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.write_parquet
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.write_datasource
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.to_tf
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.to_torch
|
|
|
|
|
|
Inspecting Metadata
|
|
-------------------
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.schema
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.count
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.stats
|
|
|
|
.. automethod:: ray.data.DatasetPipeline.sum |