
(serve-in-production-kubernetes)=
Deploying on Kubernetes
This section should help you:
- understand how to install and use the KubeRay operator.
- understand how to deploy a Ray Serve application using a RayService.
- understand how to monitor and update your application.
The recommended way to deploy Ray Serve is on Kubernetes, which combines the user experience and scalable compute of Ray Serve with the operational benefits of Kubernetes. This also allows you to integrate with existing applications that may be running on Kubernetes. The recommended practice when running on Kubernetes is to use the RayService controller that's provided as part of KubeRay. The RayService controller automatically handles important production requirements such as health checking, status reporting, failure recovery, and upgrades.
A RayService custom resource (CR) encapsulates a multi-node Ray Cluster and a Serve application that runs on top of it into a single Kubernetes manifest.
Deploying, upgrading, and getting the status of the application can be done using standard `kubectl` commands.
This section walks through how to deploy, monitor, and upgrade the `FruitStand` example on Kubernetes.
:::{warning}
Although it's actively developed and maintained, KubeRay is still considered alpha, or experimental, so some APIs may be subject to change.
:::
Installing the KubeRay operator
This guide assumes that you have a running Kubernetes cluster and have `kubectl` configured to run commands on it.
See the Kubernetes documentation or the KubeRay quickstart guide if you need help getting started.
The first step is to install the `KubeRay` operator into your Kubernetes cluster.
This creates a pod that runs the `KubeRay` controller. The `KubeRay` controller manages resources based on the `RayService` CRs you create.
Install the operator using `kubectl create` and check that the controller pod is running:
$ kubectl create -k "github.com/ray-project/kuberay/ray-operator/config/default?ref=v0.3.0&timeout=90s"
$ kubectl get deployments -n ray-system
NAME READY UP-TO-DATE AVAILABLE AGE
kuberay-apiserver 1/1 1 1 13s
kuberay-operator 1/1 1 1 13s
$ kubectl get pods -n ray-system
NAME READY STATUS RESTARTS AGE
kuberay-apiserver-799bc6dd95-787w7 1/1 Running 0 42s
kuberay-operator-68c75b5d5f-m8xd7 1/1 Running 0 42s
For more details, see the KubeRay quickstart guide.
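If you want to script the readiness check instead of reading the table by eye, the following is a minimal sketch. It parses the text printed by `kubectl get deployments` and reports whether every deployment has all of its replicas ready. The helper names `get_deployments_table` and `all_deployments_ready` are hypothetical, and the sketch assumes the column layout shown above.

```python
import subprocess


def get_deployments_table(namespace: str = "ray-system") -> str:
    """Run `kubectl get deployments` and return its text output.

    Requires kubectl on PATH and a reachable cluster; not called below.
    """
    result = subprocess.run(
        ["kubectl", "get", "deployments", "-n", namespace],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def all_deployments_ready(kubectl_output: str) -> bool:
    """Return True if every row's READY column reads n/n with n > 0."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return False  # header only, or empty output: nothing is deployed yet
    ready_col = lines[0].split().index("READY")
    for row in lines[1:]:
        ready, _, desired = row.split()[ready_col].partition("/")
        if not desired or ready != desired or int(desired) == 0:
            return False
    return True


# Sample output copied from the listing above.
SAMPLE = """\
NAME READY UP-TO-DATE AVAILABLE AGE
kuberay-apiserver 1/1 1 1 13s
kuberay-operator 1/1 1 1 13s
"""

print(all_deployments_ready(SAMPLE))  # True
```

Against a live cluster, you would pass `get_deployments_table()` to `all_deployments_ready` and retry until it returns `True`.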
Deploying a Serve application
Once the KubeRay controller is running, you can manage your Ray Serve application by creating and updating a `RayService` custom resource (CR).
`RayService` custom resources consist of the following:
- a `KubeRay` `RayCluster` config defining the cluster that the Serve application runs on.
- a Ray Serve config defining the Serve application to run on the cluster.
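To make the two-part structure concrete, here is a rough sketch that builds a skeleton of such a manifest as a Python dictionary and prints it as JSON. Several names are assumptions rather than facts from this guide: the `ray.io/v1alpha1` API version is inferred from the sample file name, and `serveConfig` and `rayClusterConfig` are the field names we assume the CR uses for its two halves. The nested values are abbreviated and do not form a complete, deployable manifest, so treat the sample CR as the authoritative reference.

```python
import json


def rayservice_skeleton(name: str) -> dict:
    """Build an abbreviated RayService manifest (illustrative only)."""
    return {
        "apiVersion": "ray.io/v1alpha1",  # assumption: inferred from the sample file name
        "kind": "RayService",
        "metadata": {"name": name},
        "spec": {
            # Assumed field name: the Serve application to run on the cluster.
            "serveConfig": {
                "deployments": [
                    {"name": "MangoStand", "numReplicas": 1},
                ],
            },
            # Assumed field name: the RayCluster the application runs on.
            "rayClusterConfig": {
                "workerGroupSpecs": [
                    {"replicas": 1},
                ],
            },
        },
    }


manifest = rayservice_skeleton("rayservice-sample")
print(json.dumps(manifest, indent=2))
```

Keeping both halves in one object is what lets a single manifest describe the cluster and the application together.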
When the `RayService` is created, the `KubeRay` controller first creates a Ray cluster using the provided configuration.
Then, once the cluster is running, it deploys the Serve application to the cluster using the REST API.
The controller also creates a Kubernetes Service that can be used to route traffic to the Serve application.
Let's see this in action by deploying the `FruitStand` example.
The Serve config for the example is embedded into this example `RayService` CR.
To follow along, save this CR locally in a file named `ray_v1alpha1_rayservice.yaml`:
:::{note}
The example `RayService` uses very small resource requests because it's only for demonstration.
In production, you'll want to provide more resources to the cluster.
Learn more about how to configure KubeRay clusters here.
:::
$ curl -o ray_v1alpha1_rayservice.yaml https://raw.githubusercontent.com/ray-project/kuberay/release-0.3/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml
To deploy the example, we simply `kubectl apply` the CR.
This creates the underlying Ray cluster, consisting of a head and worker node pod (see Ray Clusters Key Concepts for more details on Ray clusters), as well as the service that can be used to query our application:
$ kubectl apply -f ray_v1alpha1_rayservice.yaml
$ kubectl get rayservices
NAME AGE
rayservice-sample 7s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
rayservice-sample-raycluster-qd2vl-worker-small-group-bxpp6 1/1 Running 0 24m
rayservice-sample-raycluster-qd2vl-head-45hj4 1/1 Running 0 24m
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 62d
# Services used internally by the KubeRay controller.
rayservice-sample-head-svc ClusterIP 10.100.34.24 <none> 6379/TCP,8265/TCP,10001/TCP,8000/TCP,52365/TCP 24m
rayservice-sample-raycluster-qd2vl-dashboard-svc ClusterIP 10.100.109.177 <none> 52365/TCP 24m
rayservice-sample-raycluster-qd2vl-head-svc ClusterIP 10.100.180.221 <none> 6379/TCP,8265/TCP,10001/TCP,8000/TCP,52365/TCP 24m
# The Serve service that we will use to send queries to the application.
rayservice-sample-serve-svc ClusterIP 10.100.39.92 <none> 8000/TCP 24m
Note that the `rayservice-sample-serve-svc` above is the one that can be used to send queries to the Serve application. We use it in the next section.
Querying the application
Once the `RayService` is running, we can query it over HTTP using the service created by the KubeRay controller.
This service can be queried directly from inside the cluster, but to access it from your laptop you'll need to configure a Kubernetes ingress or use port forwarding as below. Because `kubectl port-forward` keeps running in the foreground, run the `curl` command from a second terminal:
$ kubectl port-forward service/rayservice-sample-serve-svc 8000
$ curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
6
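The same request can be issued from Python. The sketch below mirrors the `curl` call above using only the standard library. It assumes the port forward is still active on `localhost:8000`, and the helper names `build_query` and `send_query` are hypothetical.

```python
import json
import urllib.request


def build_query(fruit: str, amount: int,
                url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps([fruit, amount]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(fruit: str, amount: int,
               url: str = "http://localhost:8000") -> str:
    """Send the request and return the response body (needs the port forward)."""
    with urllib.request.urlopen(build_query(fruit, amount, url), timeout=10) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network, so this runs anywhere.
request = build_query("MANGO", 2)
print(request.get_method(), request.full_url, request.data)
```

With the port forward running, `send_query("MANGO", 2)` should return the same body that `curl` printed.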
Getting the status of the application
As the `RayService` is running, the `KubeRay` controller continually monitors it and writes relevant status updates to the CR.
You can view the status of the application using `kubectl describe`.
This includes the status of the cluster, events such as health check failures or restarts, and the application-level statuses reported by `serve status`.
$ kubectl get rayservices
NAME AGE
rayservice-sample 7s
$ kubectl describe rayservice rayservice-sample
...
Status:
Active Service Status:
App Status:
Last Update Time: 2022-08-16T20:52:41Z
Status: RUNNING
Dashboard Status:
Health Last Update Time: 2022-08-16T20:52:41Z
Is Healthy: true
Last Update Time: 2022-08-16T20:52:41Z
Ray Cluster Name: rayservice-sample-raycluster-9ghjw
Ray Cluster Status:
Available Worker Replicas: 2
Desired Worker Replicas: 1
Endpoints:
Client: 10001
Dashboard: 8265
Dashboard - Agent: 52365
Gcs - Server: 6379
Serve: 8000
Last Update Time: 2022-08-16T20:51:14Z
Max Worker Replicas: 5
Min Worker Replicas: 1
State: ready
Serve Deployment Statuses:
Health Last Update Time: 2022-08-16T20:52:41Z
Last Update Time: 2022-08-16T20:52:41Z
Name: MangoStand
Status: HEALTHY
Health Last Update Time: 2022-08-16T20:52:41Z
Last Update Time: 2022-08-16T20:52:41Z
Name: OrangeStand
Status: HEALTHY
Health Last Update Time: 2022-08-16T20:52:41Z
Last Update Time: 2022-08-16T20:52:41Z
Name: PearStand
Status: HEALTHY
Health Last Update Time: 2022-08-16T20:52:41Z
Last Update Time: 2022-08-16T20:52:41Z
Name: FruitMarket
Status: HEALTHY
Health Last Update Time: 2022-08-16T20:52:41Z
Last Update Time: 2022-08-16T20:52:41Z
Name: DAGDriver
Status: HEALTHY
Pending Service Status:
App Status:
Dashboard Status:
Ray Cluster Status:
Service Status: Running
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForDashboard 5m44s (x2 over 5m44s) rayservice-controller Service "rayservice-sample-raycluster-9ghjw-dashboard-svc" not found
Normal WaitForServeDeploymentReady 4m37s (x17 over 5m42s) rayservice-controller Put "http://rayservice-sample-raycluster-9ghjw-dashboard-svc.default.svc.cluster.local:52365/api/serve/deployments/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Normal WaitForServeDeploymentReady 4m35s (x6 over 5m38s) rayservice-controller Put "http://rayservice-sample-raycluster-9ghjw-dashboard-svc.default.svc.cluster.local:52365/api/serve/deployments/": dial tcp 10.121.3.243:52365: i/o timeout (Client.Timeout exceeded while awaiting headers)
Normal Running 44s (x129 over 94s) rayservice-controller The Serve applicaton is now running and healthy.
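For automated checks, it can be easier to read the status as JSON (for example from `kubectl get rayservice rayservice-sample -o json`) than to parse the `describe` output. The following is a small sketch that lists the deployments that are not `HEALTHY`. It assumes the JSON field names are the camelCase forms of the labels above; in particular, `activeServiceStatus` is an assumption, and `unhealthy_deployments` is a hypothetical helper name.

```python
def unhealthy_deployments(rayservice: dict) -> list:
    """Return names of Serve deployments whose status is not HEALTHY.

    `rayservice` is the parsed JSON of a RayService object. The field
    names used here are assumptions based on the status shown above.
    """
    status = rayservice.get("status") or {}
    active = status.get("activeServiceStatus") or {}
    deployments = active.get("serveDeploymentStatuses") or []
    return [
        d.get("name", "<unnamed>")
        for d in deployments
        if d.get("status") != "HEALTHY"
    ]


# Hand-written sample in the assumed shape, not real controller output.
sample = {
    "status": {
        "activeServiceStatus": {
            "serveDeploymentStatuses": [
                {"name": "MangoStand", "status": "UPDATING"},
                {"name": "OrangeStand", "status": "HEALTHY"},
            ]
        }
    }
}

print(unhealthy_deployments(sample))  # ['MangoStand']
```

An empty list means every reported deployment is healthy. It also results when no status has been written yet, so a real check may want to treat a missing status separately.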
Updating the application
To update the `RayService`, modify the manifest and apply it using `kubectl apply`.
There are two types of updates that can occur:
- Application-level updates: when only the Serve config options are changed, the update is applied in-place on the same Ray cluster. This enables lightweight updates such as scaling a deployment up or down or modifying autoscaling parameters.
- Cluster-level updates: when the `RayCluster` config options are changed, such as updating the container image for the cluster, it may result in a cluster-level update. In this case, a new cluster is started, and the application is deployed to it. Once the new cluster is ready, the Kubernetes service is updated to point to the new cluster and the previous cluster is terminated. There should not be any downtime for the application, but note that this requires the Kubernetes cluster to be large enough to schedule both Ray clusters.
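The distinction between the two update types can be sketched as a comparison of the old and new manifests. This is only an illustration of the rule described above, not the controller's actual logic: it treats any change to the cluster config as a cluster-level update, whereas the text says such a change "may" trigger one. The field names `serveConfig` and `rayClusterConfig` and the function `classify_update` are assumptions.

```python
import copy


def classify_update(old_manifest: dict, new_manifest: dict) -> str:
    """Guess which kind of update a manifest change would trigger."""
    old_spec = old_manifest.get("spec", {})
    new_spec = new_manifest.get("spec", {})
    # A change to the cluster half is assumed to start a new Ray cluster.
    if old_spec.get("rayClusterConfig") != new_spec.get("rayClusterConfig"):
        return "cluster-level"
    # A change to only the Serve half is applied in place.
    if old_spec.get("serveConfig") != new_spec.get("serveConfig"):
        return "application-level"
    return "unchanged"


current = {
    "spec": {
        "serveConfig": {"deployments": [{"name": "MangoStand", "numReplicas": 1}]},
        "rayClusterConfig": {"workerGroupSpecs": [{"replicas": 1}]},
    }
}

scaled_deployment = copy.deepcopy(current)
scaled_deployment["spec"]["serveConfig"]["deployments"][0]["numReplicas"] = 2

more_workers = copy.deepcopy(current)
more_workers["spec"]["rayClusterConfig"]["workerGroupSpecs"][0]["replicas"] = 2

print(classify_update(current, scaled_deployment))  # application-level
print(classify_update(current, more_workers))       # cluster-level
```

A check like this can be useful before applying a change, because a cluster-level update temporarily needs room for two Ray clusters.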
Example: Serve config update
In the `FruitStand` example above, let's change the price of a mango in the Serve config to 4:
- name: MangoStand
  numReplicas: 1
  userConfig: |
    price: 4
Now to update the application we apply the modified manifest:
$ kubectl apply -f ray_v1alpha1_rayservice.yaml
$ kubectl describe rayservice rayservice-sample
...
serveDeploymentStatuses:
- healthLastUpdateTime: "2022-07-18T21:51:37Z"
lastUpdateTime: "2022-07-18T21:51:41Z"
name: MangoStand
status: UPDATING
...
If we query the application, we can see that we now get a different result reflecting the updated price:
$ curl -X POST -H 'Content-Type: application/json' localhost:8000 -d '["MANGO", 2]'
8
Updating the RayCluster config
The process of updating the RayCluster config is the same as updating the Serve config. For example, we can update the number of worker nodes to 2 in the manifest:
workerGroupSpecs:
  # the number of pods in the worker group.
  - replicas: 2
$ kubectl apply -f ray_v1alpha1_rayservice.yaml
$ kubectl describe rayservice rayservice-sample
...
pendingServiceStatus:
appStatus: {}
dashboardStatus:
healthLastUpdateTime: "2022-07-18T21:54:53Z"
lastUpdateTime: "2022-07-18T21:54:54Z"
rayClusterName: rayservice-sample-raycluster-bshfr
rayClusterStatus: {}
...
In the status, you can see that the `RayService` is preparing a pending cluster.
After the pending cluster is healthy, it becomes the active cluster and the previous cluster is terminated.
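To detect from a script that a cluster-level update is still in progress, one option is to look for a cluster name under `pendingServiceStatus`. The sketch below does this on the parsed JSON of the `RayService` object. It assumes the field layout shown in the status excerpt above, and `pending_cluster_name` is a hypothetical helper.

```python
from typing import Optional


def pending_cluster_name(rayservice: dict) -> Optional[str]:
    """Return the name of the pending Ray cluster, or None if there is none."""
    status = rayservice.get("status") or {}
    pending = status.get("pendingServiceStatus") or {}
    return pending.get("rayClusterName") or None


# Values copied from the status excerpt above.
during_update = {
    "status": {
        "pendingServiceStatus": {
            "appStatus": {},
            "rayClusterName": "rayservice-sample-raycluster-bshfr",
            "rayClusterStatus": {},
        }
    }
}

# Assumed shape once the pending cluster has been promoted.
after_update = {"status": {"pendingServiceStatus": {}}}

print(pending_cluster_name(during_update))  # rayservice-sample-raycluster-bshfr
print(pending_cluster_name(after_update))   # None
```

Polling until this returns `None` is one way to wait for the new cluster to become active.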