.. include:: we_are_hiring.rst

.. _ray-k8s-deploy:

The legacy Ray Kubernetes Operator
==================================

.. note::

    This documentation describes deploying Ray on Kubernetes using the legacy Ray Operator hosted in
    the Ray repo.
    Going forward, the :ref:`preferred tool for deploying Ray on Kubernetes <kuberay-index>` will be the `KubeRay operator`_.
    The legacy operator described on this page can still be used to deploy on Kubernetes, but it will
    enter maintenance mode in a future Ray release.

    To learn more about KubeRay, see the links below:

    - :ref:`Ray's guides for deploying using KubeRay <kuberay-index>`
    - `The KubeRay documentation`_
    - `The KubeRay GitHub`_
    - :ref:`A comparison of KubeRay and the legacy Ray Operator <kuberay-vs-legacy>`
Overview
--------

You can leverage your `Kubernetes`_ cluster as a substrate for the execution of distributed Ray programs.
The :ref:`Ray Autoscaler <cluster-index>` spins up and deletes Kubernetes `Pods`_ according to the resource demands of the Ray workload. Each Ray node runs in its own Kubernetes Pod.
Quick Guide
-----------

This document covers the following topics:

- :ref:`Intro to the Ray Kubernetes Operator <ray-operator>`
- :ref:`Launching Ray clusters with the Ray Helm chart <ray-helm>`
- :ref:`Monitoring Ray clusters <ray-k8s-monitor>`
- :ref:`Running Ray programs using Ray Client <ray-k8s-client>`

You can find more information at the following links:

- :ref:`Ray Operator and Helm chart configuration <k8s-advanced>`
- :ref:`GPU usage with Kubernetes <k8s-gpus>`
- :ref:`Using Ray Tune on your Kubernetes cluster <tune-kubernetes>`
- :ref:`How to manually set up a non-autoscaling Ray cluster on Kubernetes <ray-k8s-static>`
.. _ray-operator:

The Ray Kubernetes Operator
---------------------------

Deployments of Ray on Kubernetes are managed by the ``Ray Kubernetes Operator``.
The Ray Operator follows the standard Kubernetes `Operator pattern`_. The main players are:

- A `Custom Resource`_ called a ``RayCluster``, which describes the desired state of the Ray cluster.
- A `Custom Controller`_, the ``Ray Operator``, which processes ``RayCluster`` resources and manages the Ray cluster.

Under the hood, the Operator uses the :ref:`Ray Autoscaler <cluster-index>` to launch and scale your Ray cluster.

The rest of this document explains how to launch a small example Ray cluster on Kubernetes.
For configuration details and advanced usage, see :ref:`Ray on Kubernetes Configuration and Advanced Usage <k8s-advanced>`.
.. _ray-helm:

Installing the Ray Operator with Helm
-------------------------------------

Ray provides a `Helm`_ chart to simplify deployment of the Ray Operator and Ray clusters.

The `Ray Helm chart`_ is available as part of the Ray GitHub repository.
The chart will be published to a public Helm repository as part of a future Ray release.

Preparation
~~~~~~~~~~~

- Configure `kubectl`_ to access your Kubernetes cluster.
- Install `Helm 3`_.
- Download the `Ray Helm chart`_.

To run the default example in this document, make sure your Kubernetes cluster can accommodate
additional resource requests of 4 CPU and 2.5Gi memory.
Installation
~~~~~~~~~~~~

You can install a small Ray cluster with a single ``helm`` command.
The default cluster configuration consists of a Ray head pod and two worker pods,
with scaling allowed up to three workers.

.. code-block:: shell

   # Navigate to the directory containing the chart.
   $ cd ray/deploy/charts

   # Install a small Ray cluster with the default configuration
   # in a new namespace called "ray". Let's name the Helm release "example-cluster."
   $ helm -n ray install example-cluster --create-namespace ./ray
   NAME: example-cluster
   LAST DEPLOYED: Fri May 14 11:44:06 2021
   NAMESPACE: ray
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
View the installed resources as follows.

.. code-block:: shell

   # The custom resource representing the state of the Ray cluster.
   $ kubectl -n ray get rayclusters
   NAME              STATUS    RESTARTS   AGE
   example-cluster   Running   0          53s

   # The Ray head node and two Ray worker nodes.
   $ kubectl -n ray get pods
   NAME                                    READY   STATUS    RESTARTS   AGE
   example-cluster-ray-head-type-5926k     1/1     Running   0          57s
   example-cluster-ray-worker-type-8gbwx   1/1     Running   0          40s
   example-cluster-ray-worker-type-l6cvx   1/1     Running   0          40s

   # A service exposing the Ray head node.
   $ kubectl -n ray get service
   NAME                       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                       AGE
   example-cluster-ray-head   ClusterIP   10.8.11.17   <none>        10001/TCP,8265/TCP,8000/TCP   115s

   # The operator deployment.
   # By default, the deployment is launched in namespace "default".
   $ kubectl get deployment ray-operator
   NAME           READY   UP-TO-DATE   AVAILABLE   AGE
   ray-operator   1/1     1            1           3m1s

   # The single pod of the operator deployment.
   $ kubectl get pod -l cluster.ray.io/component=operator
   NAME                            READY   STATUS    RESTARTS   AGE
   ray-operator-84f5d57b7f-xkvtm   1/1     Running   0          3m35s

   # The Custom Resource Definition defining a RayCluster.
   $ kubectl get crd rayclusters.cluster.ray.io
   NAME                         CREATED AT
   rayclusters.cluster.ray.io   2021-05-14T18:44:02Z
.. _ray-k8s-monitor:

Observability
-------------

To view autoscaling logs, run a ``kubectl logs`` command on the operator pod:

.. code-block:: shell

   # The last 100 lines of logs.
   $ kubectl logs \
       $(kubectl get pod -l cluster.ray.io/component=operator -o custom-columns=:metadata.name) \
       | tail -n 100
.. _ray-k8s-dashboard:

The :ref:`Ray dashboard <ray-dashboard>` can be accessed on the Ray head node at port ``8265``.

.. code-block:: shell

   # Forward the relevant port from the service exposing the Ray head.
   $ kubectl -n ray port-forward service/example-cluster-ray-head 8265:8265

   # The dashboard can now be viewed in a browser at http://localhost:8265.
.. _ray-k8s-client:

Running Ray programs with Ray Job Submission
--------------------------------------------

:ref:`Ray Job Submission <jobs-overview>` can be used to submit Ray programs to your Ray cluster.
To do this, you must be able to access the Ray Dashboard, which runs on the Ray head node on port ``8265``.
One way to do this is to port-forward ``127.0.0.1:8265`` on your local machine to ``127.0.0.1:8265`` on the head node using the :ref:`Kubernetes port-forwarding command <ray-k8s-dashboard>`:

.. code-block:: bash

   $ kubectl -n ray port-forward service/example-cluster-ray-head 8265:8265

Then, in a new shell, you can run a job using the CLI:

.. code-block:: bash

   $ export RAY_ADDRESS="http://127.0.0.1:8265"
   $ ray job submit --runtime-env-json='{"working_dir": "./", "pip": ["requests==2.26.0"]}' -- python script.py
   2021-12-01 23:04:52,672 INFO cli.py:25 -- Creating JobSubmissionClient at address: http://127.0.0.1:8265
   2021-12-01 23:04:52,809 INFO sdk.py:144 -- Uploading package gcs://_ray_pkg_bbcc8ca7e83b4dc0.zip.
   2021-12-01 23:04:52,810 INFO packaging.py:352 -- Creating a file package for local directory './'.
   2021-12-01 23:04:52,878 INFO cli.py:105 -- Job submitted successfully: raysubmit_RXhvSyEPbxhcXtm6.
   2021-12-01 23:04:52,878 INFO cli.py:106 -- Query the status of the job using: `ray job status raysubmit_RXhvSyEPbxhcXtm6`.

For more ways to run jobs, including a Python SDK and a REST API, see :ref:`Ray Job Submission <jobs-overview>`.
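
As an example of the Python SDK route, the following is a minimal sketch that submits the same
hypothetical ``script.py`` used in the CLI example above; see the linked guide for the
authoritative interface:

.. code-block:: python

    from ray.job_submission import JobSubmissionClient

    # Point the client at the dashboard port forwarded above.
    client = JobSubmissionClient("http://127.0.0.1:8265")

    # Submit the same entrypoint and runtime environment as the CLI example.
    job_id = client.submit_job(
        entrypoint="python script.py",
        runtime_env={"working_dir": "./", "pip": ["requests==2.26.0"]},
    )

    # Query the job's status by its ID.
    print(client.get_job_status(job_id))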
Running Ray programs with Ray Client
------------------------------------

:ref:`Ray Client <ray-client>` can be used to interactively execute Ray programs on your Ray cluster. The Ray Client server runs on the Ray head node, on port ``10001``.

.. note::

    Connecting with Ray Client requires using matching minor versions of Python (for example, 3.7)
    on the server and client ends, that is, on the Ray head node and in the environment where
    ``ray.init("ray://<host>:<port>")`` is invoked. Note that the default ``rayproject/ray`` images use Python 3.7.
    The latest official Ray release builds are also available for Python 3.6 and 3.8 at the `Ray Docker Hub`_.

    Connecting with Ray Client also requires matching Ray versions. To connect from a local machine to a cluster running the examples in this document, the :ref:`latest release version <installation>` of Ray must be installed locally.
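
    A quick way to check the locally installed Ray version before connecting:

    .. code-block:: python

        import ray

        # This should match the Ray version running in the cluster's head node image.
        print(ray.__version__)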
Using Ray Client to connect from outside the Kubernetes cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One way to connect to the Ray cluster from outside your Kubernetes cluster
is to forward the Ray Client server port:

.. code-block:: shell

   $ kubectl -n ray port-forward service/example-cluster-ray-head 10001:10001

Then open a new shell and try out a `sample Ray program`_:

.. code-block:: shell

   $ python ray/doc/kubernetes/example_scripts/run_local_example.py

The program in this example uses ``ray.init("ray://127.0.0.1:10001")`` to connect to the Ray cluster.
The program waits for three Ray nodes to connect and then tests object transfer
between the nodes.
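
For reference, here is a rough sketch of what such a program does; this is an illustrative
approximation, not the exact script from the Ray repo:

.. code-block:: python

    import time

    import ray

    # Connect through the port-forwarded Ray Client server.
    ray.init("ray://127.0.0.1:10001")

    # Wait for the head node and both workers to join the cluster.
    while len(ray.nodes()) < 3:
        time.sleep(1)

    @ray.remote
    def produce() -> bytes:
        return b"x" * 1024

    # Run some tasks and fetch their results, exercising object transfer
    # between the nodes.
    results = ray.get([produce.remote() for _ in range(10)])
    assert all(len(r) == 1024 for r in results)
    print("Success!")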
Using Ray Client to connect from within the Kubernetes cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can also connect to your Ray cluster from another pod in the same Kubernetes cluster.
For example, you can submit a Ray application to run on the Kubernetes cluster as a `Kubernetes Job`_.
The Job will run a single pod running the Ray driver program to completion, then terminate
the pod but allow you to access the logs.

The following command submits a Job which executes an `example Ray program`_.

.. code-block:: shell

   $ kubectl -n ray create -f https://raw.githubusercontent.com/ray-project/ray/master/doc/kubernetes/job-example.yaml
   job.batch/ray-test-job created

The program executed by the Job uses the name of the Ray cluster's head Service to connect:
``ray.init("ray://example-cluster-ray-head:10001")``.
Like the local example above, the program waits for three Ray nodes to connect and then tests
object transfer between the nodes.
To view the output of the Job, first find the name of the pod that ran it,
then fetch its logs:

.. code-block:: shell

   $ kubectl -n ray get pods
   NAME                                    READY   STATUS    RESTARTS   AGE
   example-cluster-ray-head-type-5926k     1/1     Running   0          21m
   example-cluster-ray-worker-type-8gbwx   1/1     Running   0          21m
   example-cluster-ray-worker-type-l6cvx   1/1     Running   0          21m
   ray-test-job-dl9fv                      1/1     Running   0          3s

   # Fetch the logs. You should see repeated output for 10 iterations and then
   # "Success!"
   $ kubectl -n ray logs ray-test-job-dl9fv

   # Clean up.
   $ kubectl -n ray delete job ray-test-job
   job.batch "ray-test-job" deleted
.. tip::

    Code dependencies for a given Ray task or actor must be installed on each Ray node that might run the task or actor.
    Typically, this means that all Ray nodes need to have the same dependencies installed.
    To achieve this, you can build a custom container image, using one of the `official Ray images`_ as the base.
    Alternatively, try out :ref:`Runtime Environments <runtime-environments>`, as sketched below.
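
    For instance, here is a minimal sketch of attaching a runtime environment when connecting
    with Ray Client; the address and the pinned package are illustrative, not prescribed:

    .. code-block:: python

        import ray

        # Ship the local working directory and install a pip package on the
        # cluster for this session only, instead of baking them into the image.
        ray.init(
            "ray://127.0.0.1:10001",
            runtime_env={"working_dir": "./", "pip": ["requests==2.26.0"]},
        )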
.. _k8s-cleanup-basic:

Cleanup
-------

To remove a Ray Helm release and the associated API resources, use `kubectl delete`_ and `helm uninstall`_.
Note the order of the commands below.

.. code-block:: shell

   # First, delete the RayCluster custom resource.
   $ kubectl -n ray delete raycluster example-cluster
   raycluster.cluster.ray.io "example-cluster" deleted

   # Delete the Ray release.
   $ helm -n ray uninstall example-cluster
   release "example-cluster" uninstalled

   # Optionally, delete the namespace created for our Ray release.
   $ kubectl delete namespace ray
   namespace "ray" deleted

Note that ``helm uninstall`` `does not delete`_ the RayCluster CRD. If you wish to delete the CRD,
first make sure all Ray Helm releases have been uninstalled, then run ``kubectl delete crd rayclusters.cluster.ray.io``.

- :ref:`More details on resource cleanup <k8s-cleanup>`
Next steps
----------

:ref:`Ray Operator Advanced Configuration <k8s-advanced>`
Questions or Issues?
--------------------
.. include:: /_includes/_help.rst
.. _`Kubernetes`: https://kubernetes.io/
.. _`Kubernetes Job`: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
.. _`Kubernetes Service`: https://kubernetes.io/docs/concepts/services-networking/service/
.. _`operator pattern`: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/
.. _`Custom Resource`: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/
.. _`Custom Controller`: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-controllers
.. _`Kubernetes Custom Resource Definition`: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
.. _`annotation`: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/#attaching-metadata-to-objects
.. _`permissions`: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
.. _`minikube`: https://minikube.sigs.k8s.io/docs/start/
.. _`namespace`: https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
.. _`Deployment`: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
.. _`Ray Helm chart`: https://github.com/ray-project/ray/tree/master/deploy/charts/ray/
.. _`kubectl`: https://kubernetes.io/docs/tasks/tools/
.. _`Helm 3`: https://helm.sh/
.. _`Helm`: https://helm.sh/
.. _`kubectl delete`: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#delete
.. _`helm uninstall`: https://helm.sh/docs/helm/helm_uninstall/
.. _`does not delete`: https://helm.sh/docs/chart_best_practices/custom_resource_definitions/
.. _`Pods`: https://kubernetes.io/docs/concepts/workloads/pods/
.. _`example Ray program`: https://github.com/ray-project/ray/tree/master/doc/kubernetes/example_scripts/job_example.py
.. _`sample Ray program`: https://github.com/ray-project/ray/tree/master/doc/kubernetes/example_scripts/run_local_example.py
.. _`official Ray images`: https://hub.docker.com/r/rayproject/ray
.. _`Ray Docker Hub`: https://hub.docker.com/r/rayproject/ray
.. _`KubeRay operator`: https://github.com/ray-project/kuberay
.. _`The KubeRay GitHub`: https://github.com/ray-project/kuberay
.. _`The KubeRay documentation`: https://ray-project.github.io/kuberay/