mirror of
https://github.com/vale981/ray
synced 2025-03-07 02:51:39 -05:00
24 lines
716 B
ReStructuredText
24 lines
716 B
ReStructuredText
.. _saving_datasets:
|
|
|
|
===============
|
|
Saving Datasets
|
|
===============
|
|
|
|
Datasets can be written to local or remote storage using ``.write_csv()``, ``.write_json()``, and ``.write_parquet()``.
|
|
|
|
.. code-block:: python
|
|
|
|
# Write to csv files in /tmp/output.
|
|
ray.data.range(10000).write_csv("/tmp/output")
|
|
# -> /tmp/output/data0.csv, /tmp/output/data1.csv, ...
|
|
|
|
# Use repartition to control the number of output files:
|
|
ray.data.range(10000).repartition(1).write_csv("/tmp/output2")
|
|
# -> /tmp/output2/data0.csv
|
|
|
|
You can also convert a ``Dataset`` to Ray-compatible distributed DataFrames:
|
|
|
|
.. code-block:: python
|
|
|
|
# Convert a Ray Dataset into a Dask-on-Ray DataFrame.
|
|
dask_df = ds.to_dask()
|