For specialized use cases, it is possible to override the GPU capacity that a Ray pod advertises to Ray.
To do so, set a value for the `num-gpus` key of the head or worker group's `rayStartParams`.
For example,

.. code-block:: yaml

  rayStartParams:
    # Note that all rayStartParam values must be supplied as strings.
    num-gpus: "2"

The Ray scheduler and autoscaler will then account for 2 units of GPU capacity for each
Ray pod in the group, even if the container limits do not indicate the presence of a GPU.
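
To show where this setting sits, the following is a minimal sketch of one worker group
entry in a RayCluster custom resource. It assumes the KubeRay ``workerGroupSpecs`` layout;
the group name, replica count, and CPU and memory values are hypothetical. The container
declares no ``nvidia.com/gpu`` limit, yet Ray would still count 2 GPUs for each pod in the group.

.. code-block:: yaml

  workerGroupSpecs:
  - groupName: gpu-override-group   # hypothetical name
    replicas: 1                     # hypothetical value
    rayStartParams:
      # Overrides the GPU capacity that Ray would otherwise detect from container limits.
      num-gpus: "2"
    template:
      spec:
        containers:
        - name: ray-node
          image: rayproject/ray:nightly-gpu
          resources:
            limits:
              # No nvidia.com/gpu entry here; values below are illustrative only.
              cpu: "4"
              memory: 16Gi
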
GPU pod scheduling (advanced)
_____________________________
GPU taints and tolerations
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

  Managed Kubernetes services typically take care of GPU-related taints and tolerations
  for you. If you are using a managed Kubernetes service, you might not need to worry
  about this section.

The `Nvidia gpu plugin`_ for Kubernetes applies `taints`_ to GPU nodes; these taints prevent non-GPU pods from being scheduled on GPU nodes.
Managed Kubernetes services like GKE, EKS, and AKS automatically apply matching `tolerations`_
to pods requesting GPU resources. Tolerations are applied by means of Kubernetes's `ExtendedResourceToleration`_ `admission controller`_.
If this admission controller is not enabled for your Kubernetes cluster, you may need to manually add a GPU toleration to each of your GPU pod configurations. For example,

.. code-block:: yaml

  apiVersion: v1
  kind: Pod
  metadata:
    generateName: example-cluster-ray-worker
  spec:
    ...
    tolerations:
    - effect: NoSchedule
      key: nvidia.com/gpu
      operator: Exists
    ...
    containers:
    - name: ray-node
      image: rayproject/ray:nightly-gpu
      ...

Node selectors and node labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To ensure Ray pods are bound to Kubernetes nodes satisfying specific
conditions (such as the presence of GPU hardware), you may wish to use
the `nodeSelector` field of your `workerGroup`'s pod template `spec`.
See the `Kubernetes docs`_ for more about Pod-to-Node assignment.
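
As an illustration, the sketch below places a ``nodeSelector`` in a worker group's pod
template. It assumes the KubeRay ``workerGroupSpecs`` layout and a GKE cluster whose GPU
nodes carry the ``cloud.google.com/gke-accelerator`` label; the group name and the
accelerator value are hypothetical, so substitute the labels actually present on your nodes.

.. code-block:: yaml

  workerGroupSpecs:
  - groupName: gpu-group   # hypothetical name
    template:
      spec:
        # Pods in this group are scheduled only onto nodes carrying this label.
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-tesla-t4   # assumed label value
        containers:
        - name: ray-node
          image: rayproject/ray:nightly-gpu
          ...
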
Further reference and discussion
--------------------------------
Read about Kubernetes device plugins `here <https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/>`__,
about Kubernetes GPU plugins `here <https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus>`__,
and about Nvidia's GPU plugin for Kubernetes `here <https://github.com/NVIDIA/k8s-device-plugin>`__.