.. _mars-on-ray:

Using Mars on Ray
=================

.. _`issue on GitHub`: https://github.com/mars-project/mars/issues

`Mars`_ is a tensor-based unified framework for large-scale data computation which scales NumPy, pandas and scikit-learn.
Mars on Ray makes it easy to scale your programs with a Ray cluster. Currently Mars on Ray supports both Ray actors
and Ray tasks as execution backends. When Ray actors are used, tasks are scheduled by the Mars scheduler, so this mode
can reuse all Mars scheduler optimizations. When Ray tasks are used, all tasks are scheduled by Ray, which can reuse the
failover and pipelining capabilities provided by Ray futures.

.. _`Mars`: https://docs.pymars.org

Installation
-------------

You can install Mars via pip. Quote the requirement so that the shell does not treat ``>`` as an output redirection:

.. code-block:: bash

    pip install "pymars>=0.8.3"

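To confirm that the installed release satisfies the ``>=0.8.3`` requirement above, a small check along the following lines may help. This is only a sketch: the helper names ``parse_release``, ``meets_minimum`` and ``installed_mars_ok`` are hypothetical and not part of Mars or Ray, and the comparison looks only at the leading numeric components of the version string, so a pre-release such as ``0.8.3rc1`` is treated the same as ``0.8.3``.

```python
import re
from importlib import metadata

# Minimum version taken from the pip requirement above.
MIN_VERSION = (0, 8, 3)


def parse_release(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum=MIN_VERSION):
    """Check whether `version` is at least `minimum` (pre-release suffixes are ignored)."""
    release = parse_release(version)
    minimum = tuple(minimum)
    # Pad the shorter tuple with zeros so that "0.9" compares like "0.9.0".
    width = max(len(release), len(minimum))
    release += (0,) * (width - len(release))
    minimum += (0,) * (width - len(minimum))
    return release >= minimum


def installed_mars_ok():
    """Return True if the `pymars` distribution is installed and recent enough."""
    try:
        installed = metadata.version("pymars")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed)


if __name__ == "__main__":
    print("pymars >= 0.8.3 installed:", installed_mars_ok())
```

Reading the version from package metadata avoids importing Mars itself, so the check stays cheap and also works when the package is missing.
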

Getting started
----------------

It's easy to run Mars jobs on a Ray cluster.

Start a new Mars on Ray runtime locally:

.. code-block:: python

    import ray
    ray.init()

    import mars
    mars.new_ray_session()

    import mars.tensor as mt
    # Sum a 10,000,000 x 5 random tensor on the Ray-backed session
    mt.random.RandomState(0).rand(10_000_000, 5).sum().execute()

Or connect to a Mars on Ray runtime that is already initialized:

.. code-block:: python

    import mars
    mars.new_ray_session('http://<web_ip>:<ui_port>')
    # perform computation

Interact with Ray Dataset:

.. code-block:: python

    import mars.tensor as mt
    import mars.dataframe as md
    df = md.DataFrame(
        mt.random.rand(10_000_000, 4),
        columns=list('abcd'))

    # Convert the Mars DataFrame to a Ray Dataset
    import ray
    # ds = md.to_ray_dataset(df)
    ds = ray.data.from_mars(df)
    print(ds.schema(), ds.count())
    ds.filter(lambda row: row["a"] > 0.5).show(5)

    # Convert the Ray Dataset back to a Mars DataFrame
    # df2 = md.read_ray_dataset(ds)
    df2 = ds.to_mars()
    print(df2.head(5).execute())

Refer to `Mars on Ray <https://docs.pymars.org/en/latest/installation/ray.html>`_ for more information.