.. _saving_datasets:

===============
Saving Datasets
===============

Datasets can be written to local or remote storage in the desired data format.
The supported formats include Parquet, CSV, JSON, and NumPy. To control the
number of output files, call :meth:`ds.repartition() <ray.data.Dataset.repartition>`
to repartition the Dataset before writing it out.

.. tabbed:: Parquet

  .. literalinclude:: ./doc_code/saving_datasets.py
    :language: python
    :start-after: __write_parquet_begin__
    :end-before: __write_parquet_end__

.. tabbed:: CSV

  .. literalinclude:: ./doc_code/saving_datasets.py
    :language: python
    :start-after: __write_csv_begin__
    :end-before: __write_csv_end__

.. tabbed:: JSON

  .. literalinclude:: ./doc_code/saving_datasets.py
    :language: python
    :start-after: __write_json_begin__
    :end-before: __write_json_end__

.. tabbed:: NumPy

  .. literalinclude:: ./doc_code/saving_datasets.py
    :language: python
    :start-after: __write_numpy_begin__
    :end-before: __write_numpy_end__
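
Below is a minimal sketch of combining ``repartition()`` with a write call to
control the number of output files. It assumes a Ray release in which each
block is usually written to its own file, so the exact file count may differ
across Ray versions. The toy dataset and the temporary output directory are
illustrative assumptions and are not taken from ``doc_code/saving_datasets.py``.

.. code-block:: python

    import os
    import tempfile

    import ray

    # Illustrative toy dataset of 100 records (hypothetical data).
    ds = ray.data.from_items([{"id": i, "value": i * i} for i in range(100)])

    # Hypothetical local output directory; a remote storage path could be used instead.
    out_dir = tempfile.mkdtemp(prefix="saving_datasets_")

    # Repartition into 2 blocks before writing. Since blocks are usually
    # written one per file, this typically produces two Parquet files.
    ds.repartition(2).write_parquet(out_dir)

    # List the Parquet files that the write produced.
    parquet_files = sorted(f for f in os.listdir(out_dir) if f.endswith(".parquet"))
    print(f"wrote {len(parquet_files)} file(s): {parquet_files}")

    # Read the files back to check that no rows were lost.
    round_trip_count = ray.data.read_parquet(out_dir).count()
    print(f"rows read back: {round_trip_count}")

The same pattern applies to the other writers, such as ``write_csv()`` and
``write_json()``: repartitioning changes how the rows are split across files,
not which rows are written.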