of the ``podType``. The ``image`` field determines the Ray container image used by all nodes in the Ray cluster.
The ``rayResources`` field of each ``podType`` can be used to signal the presence of custom resources to Ray.
To schedule Ray tasks and actors that use custom hardware resources, ``rayResources`` can be used in conjunction with
``nodeSelector``, as in the sketch after this list:

- Use ``nodeSelector`` to constrain workers of a ``podType`` to run on a Kubernetes Node with specialized hardware (e.g., a particular GPU accelerator).
- Signal availability of the hardware for that ``podType`` with ``rayResources: {"custom_resource": 3}``.
- Schedule a Ray task or actor to use that resource with ``@ray.remote(resources={"custom_resource": 1})``.
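
For concreteness, here is a minimal sketch of a worker ``podType`` in the chart's ``values.yaml``; the node label ``accelerator: example-gpu`` and the resource name ``custom_resource`` are illustrative assumptions, not values prescribed by the chart:

.. code-block:: yaml

   podTypes:
     rayWorkerType:
       minWorkers: 0
       maxWorkers: 3
       CPU: 1
       memory: 512Mi
       GPU: 0
       # Constrain these workers to Kubernetes Nodes carrying this (hypothetical) label.
       nodeSelector:
         accelerator: example-gpu
       # Advertise three units of the custom resource to Ray on each such worker.
       rayResources: {"custom_resource": 3}

A task declared with ``@ray.remote(resources={"custom_resource": 1})`` will then be placed only on workers of this ``podType``.
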
Refer to the documentation in `values.yaml`_ for more details.
.. note::

   If your application could benefit from additional configuration options in the Ray Helm chart
   (e.g., exposing more PodSpec fields), feel free to open a `feature request`_ on
   the Ray GitHub or a `discussion thread`_ on the Ray forums.

   For complete configurability, it is also possible to launch a Ray cluster :ref:`without the Helm chart<no-helm>`
   or to modify the Helm chart.
.. note::

   Some things to keep in mind about the scheduling of Ray worker pods and Ray tasks/actors:

   1. The Ray Autoscaler executes scaling decisions by sending pod creation requests to the Kubernetes API server.
      If your Kubernetes cluster cannot accommodate more worker pods of a given ``podType``, requested pods will enter
      a ``Pending`` state until the pod can be scheduled or a `timeout`_ expires.
   2. If a Ray task requests more resources than are available in any ``podType``, the Ray task cannot be scheduled.
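
   For example, when capacity is exhausted, listing pods might show a worker stuck in ``Pending``; the namespace and pod names below are illustrative:

   .. code-block:: shell

      $ kubectl -n ray get pods
      NAME                                    READY   STATUS    RESTARTS   AGE
      example-cluster-ray-head-type-5t2wq     1/1     Running   0          5m
      example-cluster-ray-worker-type-bk8zn   0/1     Pending   0          2m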
Running multiple Ray clusters
-----------------------------
The Ray Operator can manage multiple Ray clusters running within a single Kubernetes cluster.
Since Helm does not support sharing resources between different releases, an additional Ray cluster
must be launched in a Helm release separate from the release used to launch the Operator.
To enable launching multiple Ray clusters, the Ray Helm chart includes two flags:

- ``operatorOnly``: Start the Operator without launching a Ray cluster.
- ``clusterOnly``: Create a RayCluster custom resource without installing the Operator. (If the Operator has already been installed, a new Ray cluster will be launched.)

The following commands will install the Operator and two Ray clusters in three separate Helm releases.
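The sketch below assumes the chart directory is ``./ray``, as in a local checkout of the Ray repository; adjust the path and release names to your setup:

.. code-block:: shell

   # Install the Operator alone, in its own release.
   $ helm install ray-operator --set operatorOnly=true ./ray

   # Install two Ray clusters, each in its own release and namespace.
   $ helm -n ray install example-cluster --set clusterOnly=true ./ray --create-namespace
   $ helm -n ray2 install example-cluster2 --set clusterOnly=true ./ray --create-namespace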

Cluster-scoped vs. namespaced Operators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, the Ray Helm chart installs a ``cluster-scoped`` Operator.
This means that the operator manages all Ray clusters in your Kubernetes cluster, across all namespaces.
The namespace into which the Operator Deployment is launched is determined by the chart field ``operatorNamespace``.
If this field is unset, the operator is launched into namespace ``default``.
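
For example, to place the Operator's Deployment in a dedicated namespace (the release name and chart path below are illustrative):

.. code-block:: shell

   $ helm install ray-operator --set operatorOnly=true --set operatorNamespace=ray-operator ./ray
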
It is also possible to run a ``namespace-scoped`` Operator.
This means that the Operator is launched into the namespace of the Helm release and manages only
Ray clusters in that namespace. To run a namespaced Operator, add the flag ``--set namespacedOperator=True``
to your Helm install command.
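
For example (the release name, namespace, and chart path below are illustrative):

.. code-block:: shell

   # Launch a namespaced Operator together with a Ray cluster in the namespace "ray".
   $ helm -n ray install example-cluster --set namespacedOperator=True ./ray --create-namespace
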
.. warning::

   Do not simultaneously run namespaced and cluster-scoped Ray Operators within one Kubernetes cluster, as this will lead to unintended effects.
.. _no-helm:

Deploying without Helm
----------------------
It is possible to deploy the Ray Operator without Helm.
The necessary configuration files are available on the `Ray GitHub`_.
The following manifests should be installed in the order listed:

- The `RayCluster CRD`_.
- The Ray Operator, `namespaced`_ or `cluster-scoped`_. Note that the cluster-scoped Operator is configured to run in the namespace ``default``. Modify as needed.
- A RayCluster custom resource: `example`_.
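
A minimal sketch of the installation sequence, assuming you have saved the three linked manifests locally under the hypothetical filenames below:

.. code-block:: shell

   # Filenames are placeholders; substitute the manifests linked above.
   $ kubectl apply -f raycluster_crd.yaml
   $ kubectl apply -f operator_cluster_scoped.yaml
   $ kubectl -n default apply -f example_cluster.yaml
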
Ray cluster lifecycle
---------------------
.. _k8s-restarts:

Restart behavior
~~~~~~~~~~~~~~~~
The Ray cluster will restart under the following circumstances:

- There is an error in the cluster's autoscaling process. This will happen if the Ray head node goes down.
- There has been a change to the Ray head pod configuration. In terms of the Ray Helm chart, this means either ``image`` or one of the following fields of the head's ``podType`` has been modified: ``CPU``, ``GPU``, ``memory``, ``nodeSelector``.
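
For example, the following upgrade modifies the head's ``CPU`` field and thus triggers a restart. The values path ``podTypes.rayHeadType.CPU`` reflects the default chart layout; adjust it if your configuration names the head ``podType`` differently:

.. code-block:: shell

   $ helm -n ray upgrade example-cluster --set podTypes.rayHeadType.CPU=2 ./ray
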
Running ``kubectl -n <namespace> get raycluster`` will show all Ray clusters in the namespace with status information.
.. code-block:: shell

   $ kubectl -n ray get rayclusters
   NAME              STATUS    RESTARTS   AGE
   example-cluster   Running   0          9s
The ``STATUS`` column reports the RayCluster's ``status.phase`` field. The following values are possible:

- ``Empty/nil``: The RayCluster resource has not yet been registered by the Operator.
- ``Updating``: The Operator is launching the Ray cluster or processing an update to the cluster's configuration.
- ``Running``: The Ray cluster's autoscaling process is running in a normal state.
- ``AutoscalingExceptionRecovery``: The Ray cluster's autoscaling process has crashed. Ray processes will restart. This can happen if the Ray head node goes down.
- ``Error``: There was an unexpected error while updating the Ray cluster. (The Ray maintainers would be grateful if you filed a `bug report`_ with operator logs.)

The ``RESTARTS`` column reports the RayCluster's ``status.autoscalerRetries`` field. This tracks the number of times the cluster has restarted due to an autoscaling error.
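
To inspect these fields directly, you can query the custom resource with ``jsonpath``; the namespace, cluster name, and output below are illustrative:

.. code-block:: shell

   $ kubectl -n ray get raycluster example-cluster -o jsonpath='{.status.phase}'
   Running
   $ kubectl -n ray get raycluster example-cluster -o jsonpath='{.status.autoscalerRetries}'
   0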