ray/doc/source/cluster/key-concepts.rst

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

108 lines
3.6 KiB
ReStructuredText
Raw Normal View History

.. include:: we_are_hiring.rst
.. _cluster-key-concepts:
Key Concepts
============
Cluster
-------
A Ray cluster is a set of one or more nodes that are running Ray and share the
same :ref:`head node<cluster-node-types>`.
.. _cluster-node-types:
Node types
----------
A Ray cluster consists of a :ref:`head node<cluster-head-node>` and a set of
:ref:`worker nodes<cluster-worker-node>`.
.. image:: ray-cluster.jpg
:align: center
:width: 600px
.. _cluster-head-node:
Head node
~~~~~~~~~
The head node is the first node started by the
:ref:`Ray cluster launcher<cluster-launcher>` when trying to launch a Ray
cluster. Among other things, the head node holds the :ref:`Global Control Store
(GCS)<memory>` and runs the :ref:`autoscaler<cluster-autoscaler>`. Once the head
node is started, it will be responsible for launching any additional
:ref:`worker nodes<cluster-worker-node>`. The head node itself will also execute
tasks and actors to utilize its capacity.
.. _cluster-worker-node:
Worker node
~~~~~~~~~~~
A worker node is any node in the Ray cluster that is not functioning as head node.
Therefore, worker nodes are simply responsible for executing tasks and actors.
When a worker node is launched, it will be given the address of the head node to
form a cluster.
.. _cluster-launcher:
Cluster launcher
----------------
The cluster launcher is a process responsible for bootstrapping the Ray cluster
by launching the :ref:`head node<cluster-head-node>`. For more information on how
to use the cluster launcher, refer to
:ref:`cluster launcher CLI commands documentation<cluster-commands>` and the
corresponding :ref:`documentation for the configuration file<cluster-config>`.
.. _cluster-autoscaler:
Autoscaler
----------
The autoscaler is a process that runs on the :ref:`head node<cluster-head-node>`
and is responsible for adding or removing :ref:`worker nodes<cluster-worker-node>`
to meet the needs of the Ray workload while matching the specification in the
:ref:`cluster config file<cluster-config>`. In particular, if the resource
demands of the Ray workload exceed the current capacity of the cluster, the
autoscaler will try to add nodes. Conversely, if a node is idle for long enough,
the autoscaler will remove it from the cluster. To learn more about autoscaling,
refer to the :ref:`Ray cluster deployment guide<deployment-guide-autoscaler>`.
Ray Client
----------
The Ray Client is an API that connects a Python script to a remote Ray cluster.
To learn more about the Ray Client, you can refer to the :ref:`documentation<ray-client>`.
Job submission
--------------
Ray Job submission is a mechanism to submit locally developed and tested applications
to a remote Ray cluster. It simplifies the experience of packaging, deploying,
and managing a Ray application. To learn more about Ray jobs, refer to the
:ref:`documentation<ray-job-submission-api-ref>`.
Cloud clusters
--------------
If youre using AWS, Azure, GCP or Aliyun, you can use the
:ref:`Ray cluster launcher<cluster-launcher>` to launch cloud clusters, which
greatly simplifies the cluster setup process.
Cluster managers
----------------
You can simplify the process of managing Ray clusters using a number of popular
cluster managers including :ref:`Kubernetes<ray-k8s-deploy>`,
:ref:`YARN<ray-yarn-deploy>`, :ref:`Slurm<ray-slurm-deploy>` and :ref:`LSF<ray-LSF-deploy>`.
Kubernetes (K8s) operator
-------------------------
Deployments of Ray on Kubernetes are managed by the Ray Kubernetes Operator. The
Ray Operator makes it easy to deploy clusters of Ray pods within a Kubernetes
cluster. To learn more about the K8s operator, refer to
the :ref:`documentation<ray-operator>`.