(kuberay-config)=

# RayCluster Configuration

This guide covers the key aspects of Ray cluster configuration on Kubernetes.

## Introduction

Deployments of Ray on Kubernetes follow the [operator pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). The key players are

- A [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) called a `RayCluster` describing the desired state of a Ray cluster.
- A [custom controller](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-controllers), the KubeRay operator, which manages Ray pods in order to match the `RayCluster`'s spec.

To deploy a Ray cluster, one creates a `RayCluster` custom resource (CR):

```shell
kubectl apply -f raycluster.yaml
```

This guide covers the salient features of `RayCluster` CR configuration.

For reference, here is a condensed example of a `RayCluster` CR in yaml format.

```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-complete
spec:
  rayVersion: "2.0.0"
  enableInTreeAutoscaling: true
  autoscalerOptions:
    ...
  headGroupSpec:
    rayStartParams:
      block: "true"
      dashboard-host: "0.0.0.0"
      ...
    template: # Pod template
      metadata: # Pod metadata
      spec: # Pod spec
        containers:
        - name: ray-head
          image: rayproject/ray-ml:2.0.0
          resources:
            limits:
              cpu: 14
              memory: 54Gi
            requests:
              cpu: 14
              memory: 54Gi
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          ...
  workerGroupSpecs:
  - groupName: small-group
    replicas: 1
    minReplicas: 1
    maxReplicas: 5
    rayStartParams:
      ...
    template: # Pod template
      ...
  # Another workerGroup
  - groupName: medium-group
    ...
  # Yet another workerGroup, with access to special hardware perhaps.
  - groupName: gpu-group
    ...
```

The rest of this guide will discuss the `RayCluster` CR's config fields.

## The Ray version

The field `rayVersion` specifies the version of Ray used in the Ray cluster.
The `rayVersion` is used to fill default values for certain config fields.
The Ray container images specified in the RayCluster CR should carry the same Ray version as the CR's `rayVersion`. If you are using a nightly or development Ray image, it is fine to set `rayVersion` to the latest release version of Ray.

## Pod configuration: headGroupSpec and workerGroupSpecs

At a high level, a RayCluster is a collection of Kubernetes pods, similar to a Kubernetes Deployment or StatefulSet. Just as with the Kubernetes built-ins, the key pieces of configuration are

* Pod specification
* Scale information (how many pods are desired)

The key difference between a Deployment and a `RayCluster` is that a `RayCluster` is specialized for running Ray applications. A Ray cluster consists of

* One **head pod** which hosts global control processes for the Ray cluster. The head pod can also run Ray tasks and actors.
* Any number of **worker pods**, which run Ray tasks and actors. Workers come in **worker groups** of identically configured pods. For each worker group, we must specify **replicas**, the number of pods we want of that group.

The head pod's configuration is specified under `headGroupSpec`, while configuration for worker pods is specified under `workerGroupSpecs`. There may be multiple worker groups, each group with its own configuration. The `replicas` field of a `workerGroupSpec` specifies the number of worker pods of that group to keep in the cluster.
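Once the CR is applied, the KubeRay operator creates one pod for the head group and `replicas` pods for each worker group. As a quick sanity check, you can list the pods with a label selector. This sketch assumes the cluster name `raycluster-complete` from the example above and KubeRay's standard pod labels (`ray.io/cluster` and `ray.io/group`).

```shell
# List all Ray pods belonging to the example cluster.
kubectl get pods --selector=ray.io/cluster=raycluster-complete

# List only the pods of one worker group.
kubectl get pods --selector=ray.io/cluster=raycluster-complete,ray.io/group=small-group
```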
### Pod templates

The bulk of the configuration for a `headGroupSpec` or `workerGroupSpec` goes in the `template` field. The `template` is a Kubernetes Pod template which determines the configuration for the pods in the group. Here are some of the subfields of the pod `template` to pay attention to:

#### ports

Under `headGroupSpec`, the Ray head container should list the ports for the services it exposes.

```yaml
ports:
- containerPort: 6379
  name: gcs
- containerPort: 8265
  name: dashboard
- containerPort: 10001
  name: client
```

The KubeRay operator will configure a Kubernetes Service exposing these ports.
The name of the configured Kubernetes Service is the name, `metadata.name`, of the RayCluster followed by the suffix `-head-svc`. For the example CR given on this page, the name of the head service will be `raycluster-complete-head-svc`.

Kubernetes networking (`kube-dns`) then allows us to address the Ray head's services using the name `raycluster-complete-head-svc`. For example, the Ray Client server can be accessed from a pod in the same Kubernetes namespace using

```python
ray.init("ray://raycluster-complete-head-svc:10001")
```

The Ray Client server can be accessed from a pod in another namespace using

```python
ray.init("ray://raycluster-complete-head-svc.default.svc.cluster.local:10001")
```

(This assumes the Ray cluster was deployed into the default Kubernetes namespace. If the Ray cluster is deployed in a non-default namespace, use that namespace in place of `default`.)

Ray Client and other services can be exposed outside the Kubernetes cluster using port-forwarding or an ingress. See {ref}`this guide ` for more details.

#### resources

It’s important to specify container CPU and memory requests and limits for each group spec.
For GPU workloads, you may also wish to specify GPU limits. For example, set `nvidia.com/gpu:2` if using an Nvidia GPU device plugin and you wish to specify a pod with access to 2 GPUs.
See {ref}`this guide ` for more details on GPU support.

It's ideal to size each Ray pod to take up the entire Kubernetes node on which it is scheduled. In other words, it’s best to run one large Ray pod per Kubernetes node.
In general, it is more efficient to use a few large Ray pods than many small ones.
The pattern of fewer large Ray pods has the following advantages:

- more efficient use of each Ray pod's shared memory object store
- reduced communication overhead between Ray pods
- reduced redundancy of per-pod Ray control structures such as Raylets

The CPU, GPU, and memory **limits** specified in the Ray container config will be automatically advertised to Ray. These values will be used as the logical resource capacities of Ray pods in the head or worker group.
Note that CPU quantities will be rounded up to the nearest integer before being relayed to Ray.
The resource capacities advertised to Ray may be overridden in the {ref}`rayStartParams`.

On the other hand, CPU, GPU, and memory **requests** will be ignored by Ray. For this reason, it is best when possible to set resource requests equal to resource limits.

#### nodeSelector and tolerations

You can control the scheduling of worker groups' Ray pods by setting the `nodeSelector` and `tolerations` fields of the pod spec. Specifically, these fields determine on which Kubernetes nodes the pods may be scheduled.
See the [Kubernetes docs](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) for more about Pod-to-Node assignment.
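As an illustration, here is a hedged sketch of a worker group whose pods are pinned to GPU nodes. The node label `gpu-node-pool` and the taint key `example.com/gpu` are hypothetical placeholders; substitute the labels and taints actually present on your Kubernetes nodes.

```yaml
workerGroupSpecs:
- groupName: gpu-group
  replicas: 1
  minReplicas: 0
  maxReplicas: 5
  rayStartParams:
    ...
  template:
    spec:
      # Only schedule this group's pods onto nodes carrying this (hypothetical) label.
      nodeSelector:
        gpu-node-pool: "true"
      # Allow scheduling onto nodes tainted for GPU workloads (hypothetical taint key).
      tolerations:
      - key: "example.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
        ...
```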
#### image

The Ray container images specified in the `RayCluster` CR should carry the same Ray version as the CR's `spec.rayVersion`.
If you are using a nightly or development Ray image, it is fine to specify Ray's latest release version under `spec.rayVersion`.

Code dependencies for a given Ray task or actor must be installed on each Ray node that might run the task or actor.
To achieve this, it is simplest to use the same Ray image for the Ray head and all worker groups.
In any case, do make sure that all Ray images in your CR carry the same Ray version and Python version.
To distribute custom code dependencies across your cluster, you can build a custom container image, using one of the [official Ray images](https://hub.docker.com/r/rayproject/ray) as the base.
See {ref}`this guide` to learn more about the official Ray images.
For dynamic dependency management geared towards iteration and development, you can also use {ref}`Runtime Environments`.

(rayStartParams)=
## Ray Start Parameters

The `rayStartParams` field of each group spec is a string-string map of arguments to the Ray container's `ray start` entrypoint.
For the full list of arguments, refer to the documentation for {ref}`ray start`. We make special note of the following arguments:

### block

For most use-cases, this field should be set to "true" for all Ray pods.
The container's Ray entrypoint will then block forever until a Ray process exits, at which point the container will exit.
If this field is omitted, `ray start` will start Ray processes in the background and the container will subsequently sleep forever until terminated.
(Future versions of KubeRay may set block to true by default. See [KubeRay issue #368](https://github.com/ray-project/kuberay/issues/368).)

### dashboard-host

For most use-cases, this field should be set to "0.0.0.0" for the Ray head pod.
This is required to expose the Ray dashboard outside the Ray cluster. (Future versions might set this parameter by default.)

### num-cpus

This optional field tells the Ray scheduler and autoscaler how many CPUs are available to the Ray pod.
The CPU count can be autodetected from the Kubernetes resource limits specified in the group spec's pod `template`.
However, it is sometimes useful to override this autodetected value. For example, setting `num-cpus:"0"` for the Ray head pod will prevent Ray workloads with non-zero CPU requirements from being scheduled on the head.

Note that the values of all Ray start parameters, including `num-cpus`, must be supplied as **strings**.

### num-gpus

This optional field specifies the number of GPUs available to the Ray container.
In KubeRay versions since 0.3.0, the number of GPUs can be auto-detected from Ray container resource limits.
For certain advanced use-cases, you may wish to use `num-gpus` to set an {ref}`override`.

Note that the values of all Ray start parameters, including `num-gpus`, must be supplied as **strings**.

### memory

The memory available to the Ray container is detected automatically from the Kubernetes resource limits.
If you wish, you may override this autodetected value by setting the desired memory value, in bytes, under `rayStartParams.memory`.

Note that the values of all Ray start parameters, including `memory`, must be supplied as **strings**.
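Putting these parameters together, a head group's `rayStartParams` might look like the following sketch. The values are illustrative; in particular, `num-cpus: "0"` is only an assumption appropriate when you want to keep Ray workloads off the head pod.

```yaml
headGroupSpec:
  rayStartParams:
    # Block so that the container exits when a Ray process exits.
    block: "true"
    # Expose the Ray dashboard outside the Ray cluster.
    dashboard-host: "0.0.0.0"
    # Optional override: prevent Ray workloads with non-zero CPU requirements
    # from being scheduled on the head pod. All values must be strings.
    num-cpus: "0"
  template:
    ...
```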
### resources

This field can be used to specify custom resource capacities for the Ray pod.
These resource capacities will be advertised to the Ray scheduler and Ray autoscaler.

For example, the following annotation will mark a Ray pod as having 1 unit of `Custom1` capacity and 5 units of `Custom2` capacity.

```yaml
rayStartParams:
  resources: '"{\"Custom1\": 1, \"Custom2\": 5}"'
```

You can then annotate tasks and actors with annotations like `@ray.remote(resources={"Custom2": 1})`.
The Ray scheduler and autoscaler will take appropriate action to schedule such tasks.

Note the format used to express the resources string. In particular, note that the backslashes are present as actual characters in the string.
If you are specifying a `RayCluster` programmatically, you may have to [escape the backslashes](https://github.com/ray-project/ray/blob/cd9cabcadf1607bcda1512d647d382728055e688/python/ray/tests/kuberay/test_autoscaling_e2e.py#L92) to make sure they are processed as part of the string.

The field `rayStartParams.resources` should only be used for custom resources. The keys `CPU`, `GPU`, and `memory` are forbidden.
If you need to specify overrides for those resource fields, use the Ray start parameters `num-cpus`, `num-gpus`, or `memory`.

(kuberay-autoscaling-config)=
## Autoscaler configuration

```{note}
If you are deciding whether to use autoscaling for a particular Ray application, check out this {ref}`discussion`.
```

To enable the optional Ray Autoscaler support, set `enableInTreeAutoscaling:true`.
The KubeRay operator will then automatically configure an autoscaling sidecar container for the Ray head pod.
The autoscaler container collects resource metrics from the Ray cluster and automatically adjusts the `replicas` field of each `workerGroupSpec` as needed to fulfill the requirements of your Ray application.

Use the fields `minReplicas` and `maxReplicas` to constrain the number of `replicas` of an autoscaling `workerGroup`.
When deploying an autoscaling cluster, one typically sets `replicas` and `minReplicas` to the same value.
The Ray autoscaler will then take over and modify the `replicas` field as needed by the Ray application.

### Autoscaler operation

We describe how the autoscaler interacts with the `RayCluster` CR.

#### Scale up

The autoscaler scales worker pods up to accommodate the load of logical resources from your Ray application.
For example, suppose you submit a task requesting 2 GPUs:

```python
@ray.remote(num_gpus=2)
...
```

If your Ray cluster does not currently have any GPU worker pods, and if your configuration specifies a worker type with at least 2 units of GPU capacity, a GPU pod will be upscaled.

The autoscaler scales Ray worker pods up by editing the `replicas` field of the relevant `workerGroupSpec`.

#### Scale down

The autoscaler scales a worker pod down when the pod has not been using any logical resources for a {ref}`set period of time <kuberay-idle-timeout>`.
In this context, "resources" are the logical Ray resources (such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.

The autoscaler scales Ray worker pods down by adding the Ray pods' names to the `RayCluster` CR's `scaleStrategy.workersToDelete` list and decrementing the `replicas` field of the relevant `workerGroupSpec`.

#### Manually scaling

You may manually adjust a `RayCluster`'s scale by editing the `replicas` or `workersToDelete` fields.
(It is also possible to implement custom scaling logic that adjusts scale on your behalf.)
It is, however, not recommended to manually edit `replicas` or `workersToDelete` for a `RayCluster` with autoscaling enabled.
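For example, assuming the cluster name `raycluster-complete` from this page and autoscaling disabled, one way to manually resize the first worker group is a JSON patch against the CR. This is only a sketch; `kubectl edit raycluster raycluster-complete` works just as well.

```shell
# Manually set the first worker group's replica count to 3.
kubectl patch raycluster raycluster-complete --type=json \
  -p='[{"op": "replace", "path": "/spec/workerGroupSpecs/0/replicas", "value": 3}]'
```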
### autoscalerOptions

To enable Ray autoscaler support, it is enough to set `enableInTreeAutoscaling:true`.
Should you need to adjust autoscaling behavior or change the autoscaler container's configuration, you can use the `RayCluster` CR's `autoscalerOptions` field.
The `autoscalerOptions` field carries the following subfields:

#### upscalingMode

The `upscalingMode` field can be used to control the rate of Ray pod upscaling. UpscalingMode is `Conservative`, `Default`, or `Aggressive`.

- `Conservative`: Upscaling is rate-limited; the number of pending worker pods is at most the number of worker pods connected to the Ray cluster.
- `Default`: Upscaling is not rate-limited.
- `Aggressive`: An alias for Default; upscaling is not rate-limited.

You may wish to use `Conservative` upscaling if you plan to submit many short-lived tasks to your Ray cluster.
In this situation, `Default` upscaling may trigger _thrashing_ behavior:

- The autoscaler sees resource demands from the submitted short-lived tasks.
- The autoscaler immediately creates Ray pods to accommodate the demand.
- By the time the additional Ray pods are provisioned, the tasks have already run to completion.
- The additional Ray pods are unused and scale down after a period of idleness.

Note, however, that it is generally not recommended to over-parallelize with Ray.
Since running a Ray task incurs scheduling overhead, it is usually preferable to use a few long-running tasks over many short-running tasks.
Ensuring that each task has a non-trivial amount of work to do will also help prevent the autoscaler from over-provisioning Ray pods.

(kuberay-idle-timeout)=
#### idleTimeoutSeconds

`idleTimeoutSeconds` is the number of seconds to wait before scaling down a worker pod which is not using resources.
In this context, "resources" are the logical Ray resources (such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.

`idleTimeoutSeconds` defaults to 60 seconds.

#### resources

The `resources` subfield of `autoscalerOptions` sets optional resource overrides for the autoscaler sidecar container.
These overrides should be specified in the standard [container resource spec format](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
The default values are as indicated below:

```yaml
resources:
  limits:
    cpu: "500m"
    memory: "512Mi"
  requests:
    cpu: "500m"
    memory: "512Mi"
```

These defaults should be suitable for most use-cases.
However, we do recommend monitoring autoscaler container resource usage and adjusting as needed.

#### image and imagePullPolicy

The `image` subfield of `autoscalerOptions` optionally overrides the autoscaler container image.
If your `RayCluster`'s `spec.rayVersion` is at least `2.0.0`, the autoscaler will default to using **the same image** as the Ray container. (Ray autoscaler code is bundled with the rest of Ray.)
For older Ray versions, the autoscaler will default to the image `rayproject/ray:2.0.0`.

The `imagePullPolicy` subfield of `autoscalerOptions` optionally overrides the autoscaler container's image pull policy. The default is `Always`.

The `image` and `imagePullPolicy` overrides are provided primarily for the purposes of autoscaler testing and development.

#### env and envFrom

The `env` and `envFrom` fields specify autoscaler container environment variables, for debugging and development purposes.
These fields should be formatted following the [Kubernetes API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#environment-variables) for container environment variables.
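To tie the subfields together, here is a hedged sketch of an `autoscalerOptions` block using the fields described above. The environment variable shown is a hypothetical placeholder included only to illustrate the `env` format.

```yaml
spec:
  enableInTreeAutoscaling: true
  autoscalerOptions:
    upscalingMode: Conservative
    idleTimeoutSeconds: 120
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "500m"
        memory: "512Mi"
    env:
    # Hypothetical variable, shown only to illustrate the Kubernetes EnvVar format.
    - name: EXAMPLE_DEBUG_FLAG
      value: "1"
```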