.. _saving_datasets:

===============
Saving Datasets
===============

Datasets can be written to local or remote storage using ``.write_csv()``, ``.write_json()``, and ``.write_parquet()``.

.. code-block:: python

    import ray

    # Write to CSV files in /tmp/output.
    ray.data.range(10000).write_csv("/tmp/output")
    # -> /tmp/output/data0.csv, /tmp/output/data1.csv, ...

    # Use repartition to control the number of output files:
    ray.data.range(10000).repartition(1).write_csv("/tmp/output2")
    # -> /tmp/output2/data0.csv

You can also convert a ``Dataset`` to a Ray-compatible distributed DataFrame:

.. code-block:: python

    # Convert a Ray Dataset into a Dask-on-Ray DataFrame.
    dask_df = ds.to_dask()