"In this guide, we show you how to manage and interact with Ray clusters on Kubernetes.\n",
"\n",
"You can download this guide as an executable Jupyter notebook by clicking the download button on the top right of the page.\n",
"\n",
"\n",
"## Preparation\n",
"\n",
"### Install the latest Ray release\n",
"This step is needed to interact with remote Ray clusters using {ref}`Ray Job Submission <kuberay-job>` and {ref}`Ray Client <kuberay-client>`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6dcd7d93",
"metadata": {},
"outputs": [],
"source": [
"! pip install -U \"ray[default]\""
]
},
{
"cell_type": "markdown",
"id": "656a0707",
"metadata": {},
"source": [
"See {ref}`installation` for more details. "
]
},
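{
"cell_type": "markdown",
"id": "3f2a91b0",
"metadata": {},
"source": [
"As an optional sanity check, you can confirm the locally installed Ray version from the command line:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5c7e2d1",
"metadata": {},
"outputs": [],
"source": [
"# Optional: print the locally installed Ray version.\n",
"! ray --version"
]
},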
{
"cell_type": "markdown",
"id": "c0933e2f",
"metadata": {},
"source": [
"### Install kubectl\n",
"\n",
"We will use kubectl to interact with Kubernetes. Find installation instructions at the [Kubernetes documentation](https://kubernetes.io/docs/tasks/tools/#kubectl).\n",
"\n",
"### Access a Kubernetes cluster\n",
"\n",
"We will need access to a Kubernetes cluster. There are two options:\n",
"1. Configure access to a remote Kubernetes cluster\n",
"\n",
"**OR**\n",
"\n",
"2. Run the examples locally by [installing kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation). Start your [kind](https://kind.sigs.k8s.io/) cluster by running the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c764b3ad",
"metadata": {},
"outputs": [],
"source": [
"! kind create cluster"
]
},
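{
"cell_type": "markdown",
"id": "e8f1c2a7",
"metadata": {},
"source": [
"If you created a local kind cluster, you can optionally confirm that kubectl is pointing at it. (The context name `kind-kind` assumes the default cluster name; kind prefixes the cluster name with `kind-`.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3b9d6c2",
"metadata": {},
"outputs": [],
"source": [
"# Optional: verify that kubectl is configured to talk to the kind cluster.\n",
"# `kind-kind` is the default context name for a cluster created with `kind create cluster`.\n",
"! kubectl cluster-info --context kind-kind"
]
},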
{
"cell_type": "markdown",
"id": "278726e0",
"metadata": {},
"source": [
"To run the example in this guide, make sure your Kubernetes cluster (or local kind cluster) can accommodate\n",
"additional resource requests of 3 CPU and 2Gi memory.\n",
"\n",
"## Deploying the KubeRay operator\n",
"\n",
"Deploy the KubeRay operator by applying the relevant configuration files from the master branch of the [KubeRay repo](https://github.com/ray-project/kuberay)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c8a3a3d",
"metadata": {},
"outputs": [],
"source": [
"# Deploy the KubeRay operator with kustomize, pinned to the master branch of the KubeRay repo.\n",
"! kubectl create -k \"github.com/ray-project/kuberay/ray-operator/config/default?ref=master&timeout=90s\"\n",
"\n",
"# Note that we must use \"kubectl create\" in the above command. \"kubectl apply\" will not work due to https://github.com/ray-project/kuberay/issues/271"
]
},
{
"cell_type": "markdown",
"id": "522f9a97",
"metadata": {},
"source": [
"Confirm that the operator is running in the namespace `ray-system`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfec8bba",
"metadata": {},
"outputs": [],
"source": [
"! kubectl -n ray-system get pod --selector=app.kubernetes.io/component=kuberay-operator"
]
},
{
"cell_type": "markdown",
"id": "4c9dbcd8",
"metadata": {},
"source": [
"Note that the deployment command above installs the operator at _Kubernetes cluster scope_; the operator will manage resources in all Kubernetes namespaces.\n",
"**If your use-case requires running the operator at single namespace scope**, refer to [the instructions at the KubeRay docs](https://github.com/ray-project/kuberay#single-namespace-version)."
]
},
{
"cell_type": "markdown",
"id": "e1fdf3f5",
"metadata": {},
"source": [
"## Deploying a Ray Cluster"
]
},
{
"cell_type": "markdown",
"id": "dac860db",
"metadata": {},
"source": [
"Once the KubeRay operator is running, we are ready to deploy a Ray cluster. To do so, we create a RayCluster Custom Resource (CR).\n",
"\n",
"In the rest of this guide, we will deploy resources into the default namespace. To use a non-default namespace, specify the namespace in your kubectl commands, e.g. `kubectl -n <your-namespace> ...`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9bab1295",
"metadata": {},
"outputs": [],
"source": [
"# Deploy a sample Ray Cluster CR from the KubeRay repo:\n",
"! kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-cluster.autoscaler.yaml\n",
"\n",
"# This Ray cluster is named `raycluster-autoscaler` because it has optional Ray Autoscaler support enabled."
]
},
{
"cell_type": "markdown",
"id": "1b6abf52",
"metadata": {},
"source": [
"Once the RayCluster CR has been created, you can view it by running"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb2363bc",
"metadata": {},
"outputs": [],
"source": [
"! kubectl get raycluster\n",
"\n",
"# NAME                    AGE\n",
"# raycluster-autoscaler   XXs"
]
},
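{
"cell_type": "markdown",
"id": "a1d2e3f4",
"metadata": {},
"source": [
"If you want more detail on the cluster's state -- for example, while debugging -- `kubectl describe` works on the RayCluster custom resource just as it does on built-in resources:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c3d4e5",
"metadata": {},
"outputs": [],
"source": [
"# Optional: show the RayCluster CR's spec, status, and recent events.\n",
"! kubectl describe raycluster raycluster-autoscaler"
]
},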
{
"cell_type": "markdown",
"id": "d4bd4e47",
"metadata": {},
"source": [
"The KubeRay operator will detect the RayCluster object. The operator will then start your Ray cluster by creating head and worker pods. To view the Ray cluster's pods, run the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "48d938b2",
"metadata": {},
"outputs": [],
"source": [
"# View the pods in the Ray cluster named \"raycluster-autoscaler\"\n",
"! kubectl get pods --selector=ray.io/cluster=raycluster-autoscaler"
]
},
{
"cell_type": "markdown",
"id": "7d0e5f1c",
"metadata": {},
"source": [
"We see a Ray head pod with two containers -- the Ray container and the autoscaler sidecar. We also have a Ray worker pod with its single Ray container.\n",
"\n",
"Wait for the pods to reach the `Running` state. This may take a few minutes -- most of this time is spent downloading the Ray images. In a separate shell, you may wish to observe the pods' status in real-time with the following command:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "68dab885",
"metadata": {},
"outputs": [],
"source": [
"# If you're on macOS, first `brew install watch`.\n",
"# Run in a separate shell:\n",
"! watch -n 1 kubectl get pod"
]
},
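{
"cell_type": "markdown",
"id": "c4d5e6f7",
"metadata": {},
"source": [
"Alternatively, if you prefer a command that blocks until the pods are up, `kubectl wait` can poll for readiness. (The label selector below matches the example cluster's pods; adjust it if you used a different cluster name.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5e6f7a8",
"metadata": {},
"outputs": [],
"source": [
"# Optional: block until the example cluster's pods are Ready (up to 5 minutes).\n",
"! kubectl wait pod --selector=ray.io/cluster=raycluster-autoscaler --for=condition=Ready --timeout=300s"
]
},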
{
"cell_type": "markdown",
"id": "b63e1ab9",
"metadata": {},
"source": [
"## Interacting with a Ray Cluster\n",
"\n",
"Now, let's interact with the Ray cluster we've deployed.\n",
"\n",
"### Accessing the cluster with kubectl exec\n",
"\n",
"The most straightforward way to experiment with your Ray cluster is to\n",
"exec directly into the head pod. First, identify your Ray cluster's head pod:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2538c4fd",
"metadata": {},
"outputs": [],
"source": [
"! kubectl get pods --selector=ray.io/cluster=raycluster-autoscaler,ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers\n",
" \n",
"# raycluster-autoscaler-head-xxxxx"
]
},
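{
"cell_type": "markdown",
"id": "e6f7a8b9",
"metadata": {},
"source": [
"If you are running these cells in Jupyter, you can also capture the head pod's name for reuse with IPython's shell-capture syntax (an optional convenience; `head_pod` is just a local variable name):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f7a8b9c0",
"metadata": {},
"outputs": [],
"source": [
"# Optional: capture the head pod name in a Python variable via IPython shell capture.\n",
"head_pod = ! kubectl get pods --selector=ray.io/cluster=raycluster-autoscaler,ray.io/node-type=head -o custom-columns=POD:metadata.name --no-headers\n",
"print(head_pod[0] if head_pod else \"Head pod not found -- is the RayCluster running?\")"
]
},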
{
"cell_type": "markdown",
"id": "190b2163",
"metadata": {},
"source": [
"Now, we can run a Ray program on the head pod. The Ray program in the next cell asks the autoscaler to scale the cluster to a total of 3 CPUs. The head and worker in our example cluster each have a capacity of 1 CPU, so the request should trigger upscaling of an additional worker pod.\n",
"\n",
"Note that in real-life scenarios, you will want to use larger Ray pods. In fact, it is advantageous to size each Ray pod to take up an entire Kubernetes node. See the {ref}`configuration guide<kuberay-config>` for more details."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c35b2454",
"metadata": {},
"outputs": [],
"source": [
"# Substitute your output from the last cell in place of \"raycluster-autoscaler-head-xxxxx\".\n",
"# The one-liner below asks the autoscaler for a total of 3 CPUs via ray.autoscaler.sdk.request_resources.\n",
"# (The head container in the sample CR is named \"ray-head\".)\n",
"! kubectl exec raycluster-autoscaler-head-xxxxx -it -c ray-head -- python -c \"import ray; ray.init(); ray.autoscaler.sdk.request_resources(num_cpus=3)\""
]
},
{
"cell_type": "markdown",
"id": "0ab321c4",
"metadata": {},
"source": [
"### The Ray head service\n",
"\n",
"The KubeRay operator configures a [Kubernetes service](https://kubernetes.io/docs/concepts/services-networking/service/) targeting the Ray head pod. This service allows us to interact with Ray clusters without directly executing commands in the Ray container. To identify the Ray head service for our example cluster, run"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3dae5fd",
"metadata": {},
"outputs": [],
"source": [
"! kubectl get service raycluster-autoscaler-head-svc\n",
"\n",
"# NAME                             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                       AGE\n",
"# raycluster-autoscaler-head-svc   ClusterIP   xx.xx.xxx.xx   <none>        6379/TCP,8265/TCP,10001/TCP   XXs"
]
},
{
"cell_type": "markdown",
"id": "f127bd06",
"metadata": {},
"source": [
"(kuberay-job)=\n",
"### Ray Job submission\n",
"\n",
"Ray provides a [Job Submission API](https://docs.ray.io/en/master/cluster/job-submission.html#ray-job-submission) which can be used to submit Ray workloads to a remote Ray cluster. The Ray Job Submission server listens on the Ray head's Dashboard port, 8265 by default. Let's access the dashboard port via port-forwarding.\n",
"\n",
"Note: The following port-forwarding command is blocking. If you are following along from a Jupyter notebook, the command must be executed in a separate shell outside of the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8745b7e7",
"metadata": {},
"outputs": [],
"source": [
"# Run this in a separate shell; it blocks while the port-forward is active.\n",
"! kubectl port-forward service/raycluster-autoscaler-head-svc 8265:8265"
]
},
{
"cell_type": "markdown",
"id": "a92cabaa",
"metadata": {},
"source": [
"Note: We use port-forwarding in this guide as a simple way to experiment with a Ray cluster's services. For production use-cases, you would typically either\n",
"- Access the service from within the Kubernetes cluster or\n",
"- Use an ingress controller to expose the service outside the cluster.\n",
"\n",
"See the {ref}`networking notes <kuberay-networking>` for details.\n",
"\n",
"Now that we have access to the Dashboard port, we can submit jobs to the Ray cluster:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28a6bca6",
"metadata": {},
"outputs": [],
"source": [
"# The following job's logs will show the Ray cluster's total resource capacity, including 3 CPUs.\n",
"# (ray.cluster_resources() reports the cluster's total resources.)\n",
"! ray job submit --address http://localhost:8265 -- python -c \"import ray; ray.init(); print(ray.cluster_resources())\""
]
},
{
"cell_type": "markdown",
"id": "b3e0c1d2",
"metadata": {},
"source": [
"Assuming the port-forwarding process described above is still running, you may view the {ref}`ray-dashboard` by visiting `localhost:8265` in your browser.\n",
"\n",
"The dashboard port will not be used in the rest of this guide. You may stop the port-forwarding process if you wish.\n",
"\n",
"(kuberay-client)=\n",
"### Accessing the cluster using Ray Client\n",
"\n",
"[Ray Client](https://docs.ray.io/en/latest/cluster/ray-client.html) allows you to interact programmatically with a remote Ray cluster using the core Ray APIs.\n",
"To try out Ray Client, first make sure your local Ray version and Python minor version match the versions used in your Ray cluster. The Ray cluster in our example is running Ray 2.0.0 and Python 3.7, so that's what we'll need locally. If you have a different local Python version and would like to avoid changing it, you can modify the images specified in the yaml file `ray-cluster.autoscaler.yaml`. For example, use `rayproject/ray:2.0.0-py38` for Python 3.8.\n",
"\n",
"After confirming the Ray and Python versions match up, the next step is to port-forward the Ray Client server port (10001 by default).\n",
"If you are following along in a Jupyter notebook, execute the following command in a separate shell."