.. ray/doc/source/pandas_on_ray.rst (2018-03-13 22:23:50 -07:00)

Pandas on Ray
=============

Pandas on Ray is an early-stage DataFrame library that wraps Pandas and
transparently distributes the data and computation. The user does not need to
know how many cores their system has, nor do they need to specify how to
distribute the data. In fact, users can continue using their existing Pandas
notebooks while experiencing a considerable speedup from Pandas on Ray, even
on a single machine. Only the import statement needs to change, as we
demonstrate below. Once you've changed your import statement, you're ready to
use Pandas on Ray just as you would use Pandas.

.. code-block:: python

    # import pandas as pd
    import ray.dataframe as pd

Currently, only part of the Pandas API is implemented; we are working toward
full functional parity with Pandas.

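
Because only the import line changes, the rest of a notebook can stay as it
is. The sketch below illustrates this with a small pipeline. The
``try``/``except ImportError`` fallback to plain Pandas is an illustrative
addition, not something Pandas on Ray requires; it lets the same script run
where Pandas on Ray is not installed. The example also assumes that
``DataFrame`` construction, column selection, and ``sum`` are among the
methods implemented so far.

.. code-block:: python

    # Illustrative fallback (an assumption, not required by Pandas on Ray):
    # use Pandas on Ray when it can be imported, otherwise plain Pandas.
    try:
        import ray.dataframe as pd
    except ImportError:
        import pandas as pd

    # Everything below is ordinary Pandas code and is unchanged by the swap.
    df = pd.DataFrame({"store": ["a", "b", "a"], "sales": [10, 20, 30]})
    total_sales = df["sales"].sum()
    print(total_sales)
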
Using Pandas on Ray on a Single Node
------------------------------------
To use the most up-to-date version of Pandas on Ray, please follow the
instructions on the `installation page`_.

Once you import the library, you should see output similar to the following:

.. code-block:: text

    >>> import ray.dataframe as pd
    Waiting for redis server at 127.0.0.1:14618 to respond...
    Waiting for redis server at 127.0.0.1:31410 to respond...
    Starting local scheduler with the following resources: {'CPU': 4, 'GPU': 0}.
    ======================================================================
    View the web UI at http://localhost:8889/notebooks/ray_ui36796.ipynb?token=ac25867d62c4ae87941bc5a0ecd5f517dbf80bd8e9b04218
    ======================================================================

If you do not see output similar to the above, please make sure that you have
built Ray using the instructions on the `installation page`_.

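
If the import fails or prints nothing like the output above, it can help to
first check whether the ``ray.dataframe`` module is installed at all. The
hypothetical helper ``pandas_on_ray_available`` below is a minimal sketch of
such a check. It uses the standard-library ``importlib.util.find_spec``, which
locates a module without executing it, so it should not start the Ray
processes shown above (it does import the parent ``ray`` package).

.. code-block:: python

    import importlib.util


    def pandas_on_ray_available():
        """Hypothetical helper: return True if ``ray.dataframe`` can be found.

        find_spec locates the module without running it; for a dotted name it
        imports the parent package ``ray`` first.
        """
        try:
            return importlib.util.find_spec("ray.dataframe") is not None
        except ImportError:
            # The parent package ``ray`` is missing or failed to import.
            return False


    print(pandas_on_ray_available())

A ``False`` result points to an installation problem rather than a problem
starting Ray.
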
Once you have executed ``import ray.dataframe as pd``, you're ready to begin
running your Pandas pipeline as you were before. Please note that the API is
not yet complete; for some methods, you may see the following error:

.. code-block:: text

    NotImplementedError: To contribute to Pandas on Ray, please visit github.com/ray-project/ray.

If you would like to request that a particular method be implemented, feel
free to `open an issue`_. Before you open an issue, please make sure that no
one else has already requested that functionality.

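
When a pipeline stops on this error, it can be useful to find every
unimplemented method it relies on before opening an issue, rather than only the
first one. The hypothetical helper ``find_unimplemented`` below sketches one
way to collect them: it calls each listed method on a DataFrame (ideally a
small sample, since every call does real work) and records the names that
raise ``NotImplementedError``. The helper and the method names passed to it
are illustrative; nothing here is part of the Pandas on Ray API.

.. code-block:: python

    def find_unimplemented(df, calls):
        """Hypothetical helper: return the method names in ``calls`` that raise
        NotImplementedError when invoked on ``df``.

        ``calls`` is a list of (method_name, args) pairs, e.g. [("mean", ())].
        Other exceptions are not caught, so genuine bugs still surface.
        """
        missing = []
        for method_name, args in calls:
            try:
                getattr(df, method_name)(*args)
            except NotImplementedError:
                missing.append(method_name)
        return missing

    # Usage, assuming ``df`` was created after ``import ray.dataframe as pd``:
    # find_unimplemented(df, [("mean", ()), ("cumsum", ()), ("head", (5,))])
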
Using Pandas on Ray on a Cluster
--------------------------------
Pandas on Ray does not yet support running on a cluster. Coming soon!

Examples
--------
You can find an example in our recent `blog post`_ or in the
`Jupyter Notebook`_ that we used to create the blog post.

.. _`installation page`: http://ray.readthedocs.io/en/latest/installation.html
.. _`open an issue`: http://github.com/ray-project/ray/issues
.. _`blog post`: http://rise.cs.berkeley.edu/blog/pandas-on-ray
.. _`Jupyter Notebook`: http://gist.github.com/devin-petersohn/f424d9fb5579a96507c709a36d487f24#file-pandas_on_ray_blog_post_0-ipynb