.. ray/doc/source/data/saving-datasets.rst (2022-03-18 11:25:43 -07:00)

.. _saving_datasets:

===============
Saving Datasets
===============

Datasets can be written to local or remote storage using ``.write_csv()``, ``.write_json()``, and ``.write_parquet()``.

.. code-block:: python

    import ray

    # Write to csv files in /tmp/output.
    ray.data.range(10000).write_csv("/tmp/output")
    # -> /tmp/output/data0.csv, /tmp/output/data1.csv, ...

    # Use repartition to control the number of output files:
    ray.data.range(10000).repartition(1).write_csv("/tmp/output2")
    # -> /tmp/output2/data0.csv

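
The other two writers follow the same pattern. The sketch below is illustrative only: the sample records, the ``id`` and ``squared`` column names, and the temporary output directory are assumptions made for this example, not part of the original guide. It writes a small ``Dataset`` as Parquet and JSON, then reads the Parquet files back to check that no rows were lost.

.. code-block:: python

    import tempfile

    import ray

    # Hypothetical scratch directory; any local or remote path should work.
    out_dir = tempfile.mkdtemp()

    # A small tabular dataset built from Python dicts (illustrative data).
    ds = ray.data.from_items([{"id": i, "squared": i * i} for i in range(100)])

    # Write the same data in two formats, each to its own subdirectory.
    ds.write_parquet(f"{out_dir}/parquet")
    ds.write_json(f"{out_dir}/json")

    # Read the Parquet output back and count the rows as a sanity check.
    restored = ray.data.read_parquet(f"{out_dir}/parquet")
    row_count = restored.count()

As with ``.write_csv()``, the number of files written depends on how the dataset is partitioned, so ``repartition`` can be used here as well.
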

You can also convert a ``Dataset`` into a Ray-compatible distributed DataFrame, such as a Dask-on-Ray DataFrame:

.. code-block:: python

    import ray

    ds = ray.data.range(10000)

    # Convert a Ray Dataset into a Dask-on-Ray DataFrame.
    dask_df = ds.to_dask()
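
Once converted, the DataFrame can be used with ordinary Dask operations. The following is a minimal sketch, assuming that ``dask`` (with its DataFrame extras) is installed alongside Ray. The sample data and the ``id`` column name are made up for illustration, and pointing Dask at the ``ray_dask_get`` scheduler is one possible setup rather than a requirement stated in this guide.

.. code-block:: python

    import dask
    import ray
    from ray.util.dask import ray_dask_get

    # Assumption: run Dask computations on the Ray cluster via Dask-on-Ray.
    dask.config.set(scheduler=ray_dask_get)

    # Illustrative dataset with a single "id" column.
    ds = ray.data.from_items([{"id": i} for i in range(100)])

    # Convert to a Dask DataFrame and run a simple aggregation.
    dask_df = ds.to_dask()
    total = int(dask_df["id"].sum().compute())

Here ``total`` is the sum of the integers 0 through 99. The aggregation only runs when ``.compute()`` is called.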