
# Deploying with KubeRay (experimental)

```{admonition} What is KubeRay?
[KubeRay](https://github.com/ray-project/kuberay) is a set of tools for running Ray on Kubernetes.
It has been used by some large companies to deploy Ray on their infrastructure.
Going forward, we would like to make this deployment method accessible and seamless for
all Ray users and to standardize Ray deployment on Kubernetes around KubeRay's operator.
For now, consider this integration a minimum viable product that is not yet polished
enough for general use, and prefer the [Kubernetes integration](kubernetes.rst) for running
Ray on Kubernetes. If you are brave enough to try out the KubeRay integration, this documentation
is for you! We would love your feedback as a [GitHub issue](https://github.com/ray-project/ray/issues)
with `[KubeRay]` in the title.
```

Here we describe how to deploy a Ray cluster with KubeRay. The following instructions are for
Minikube, but the deployment works the same way on a real Kubernetes cluster. You need at
least 4 CPUs to run this example. First, make sure Minikube is initialized:

```shell
minikube start
```
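Minikube's default CPU allocation may be lower than the 4 CPUs this example needs. One way to check is to read each node's allocatable CPU from `kubectl get nodes -o json`.

The following is a minimal sketch and is not part of Ray or KubeRay. The helper names are hypothetical. It only handles CPU quantities written as cores (`"4"`, `"3.5"`) or millicores (`"3900m"`).

```python
import json
import subprocess


def parse_cpu_quantity(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "4" or "3900m" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def total_allocatable_cpus(nodes: dict) -> float:
    """Sum status.allocatable.cpu over the items of `kubectl get nodes -o json`."""
    return sum(
        parse_cpu_quantity(node["status"]["allocatable"]["cpu"])
        for node in nodes.get("items", [])
    )


def check_minikube_cpus(required: float = 4) -> float:
    """Query the cluster with kubectl and warn if too few CPUs are allocatable."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    cpus = total_allocatable_cpus(json.loads(out))
    if cpus < required:
        print(f"Only {cpus} allocatable CPUs; this example needs at least {required}.")
    return cpus
```

You could call `check_minikube_cpus()` after `minikube start`. If it reports fewer than 4 CPUs, recreating the Minikube cluster with more CPUs (for example with `minikube start --cpus=4`) should help.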

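Next, deploy the KubeRay operator. The paths below assume that you run the commands from the directory containing your clone of the Ray repository: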

```shell
./ray/python/ray/autoscaler/kuberay/init-config.sh
kubectl create -k "ray/python/ray/autoscaler/kuberay/config/default"
```

You can verify that the operator has been deployed using

```shell
kubectl -n ray-system get pods
```
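The listing above shows whether the operator pod exists, but not whether it is ready. If you want to script this check, one option is to parse `kubectl -n ray-system get pods -o json` and require every pod to be `Running` with all containers ready.

This is a minimal sketch with hypothetical helper names. It relies only on the standard Kubernetes pod status fields `status.phase` and `status.containerStatuses[].ready`.

```python
import json
import subprocess


def all_pods_ready(pods: dict) -> bool:
    """Return True if every pod in `kubectl get pods -o json` output is Running
    and all of its containers report ready. An empty pod list counts as not
    ready, because the operator pod is expected to exist."""
    items = pods.get("items", [])
    if not items:
        return False
    for pod in items:
        status = pod.get("status", {})
        if status.get("phase") != "Running":
            return False
        container_statuses = status.get("containerStatuses", [])
        if not container_statuses or not all(c.get("ready") for c in container_statuses):
            return False
    return True


def operator_ready(namespace: str = "ray-system") -> bool:
    """Query the operator namespace with kubectl and check its pods."""
    out = subprocess.run(
        ["kubectl", "-n", namespace, "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return all_pods_ready(json.loads(out))
```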

Now let's deploy a new Ray cluster:

```shell
kubectl create -f ray/python/ray/autoscaler/kuberay/ray-cluster.complete.yaml
```

## Using the autoscaler

Let's now try out the autoscaler. We can run the following command to get a
Python interpreter in the head pod:

```shell
kubectl exec -it $(kubectl get pods -o custom-columns=POD:metadata.name | grep raycluster-complete-head) -c ray-head -- python
```
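The command above assumes that exactly one pod name contains `raycluster-complete-head`. If an old head pod is still terminating, the `grep` can return two names, and `kubectl exec` then fails with a confusing error.

Below is a minimal Python sketch of the same lookup that fails loudly in that case. The helper names are hypothetical. The name marker, the `ray-head` container, and the `python` command are taken from the shell command above.

```python
import subprocess
from typing import List


def find_head_pod(pod_names: List[str], marker: str = "raycluster-complete-head") -> str:
    """Return the single pod name containing `marker`, or raise if there isn't exactly one."""
    matches = [name for name in pod_names if marker in name]
    if len(matches) != 1:
        raise RuntimeError(f"Expected exactly one pod containing {marker!r}, found {matches}")
    return matches[0]


def list_pod_names() -> List[str]:
    """List pod names in the current namespace, without the header row."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-o", "custom-columns=POD:metadata.name", "--no-headers"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def exec_python_in_head() -> None:
    """Open an interactive Python interpreter in the head pod's ray-head container."""
    pod = find_head_pod(list_pod_names())
    subprocess.run(["kubectl", "exec", "-it", pod, "-c", "ray-head", "--", "python"], check=True)
```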

In the Python interpreter, run the following snippet to scale up the cluster:

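```python
import ray.autoscaler.sdk

# Connect to the Ray cluster that is already running in this pod.
ray.init("auto")
# Ask the autoscaler to scale the cluster so that at least 4 CPUs are available.
ray.autoscaler.sdk.request_resources(num_cpus=4)
```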
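`request_resources` returns immediately, and new worker pods take some time to start. To wait until the scale-up is visible from the same interpreter, you can poll the cluster's total resources.

The sketch below is a hypothetical helper, not a Ray API. It takes the resource getter as a parameter, so in the head pod you would pass `ray.cluster_resources`. The sleep and clock functions are also injectable so the loop can be exercised without a cluster.

```python
import time
from typing import Callable, Dict


def wait_for_cpus(
    get_resources: Callable[[], Dict[str, float]],
    target_cpus: float,
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll until the cluster reports at least `target_cpus` CPUs.

    Returns the CPU count that satisfied the target, or raises TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        cpus = get_resources().get("CPU", 0.0)
        if cpus >= target_cpus:
            return cpus
        if clock() >= deadline:
            raise TimeoutError(
                f"Cluster has {cpus} CPUs after {timeout_s}s; wanted {target_cpus}."
            )
        sleep(poll_interval_s)
```

For example, after the snippet above you could run `wait_for_cpus(ray.cluster_resources, 4)`.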

> **_NOTE:_** The example config `ray-cluster.complete.yaml` specifies `rayproject/ray:8c5fe4`
> as the Ray autoscaler image. This image carries the latest improvements to KubeRay autoscaling
> support and is confirmed to be compatible with Ray versions >= 1.11.0.
> Once Ray autoscaler support is stable, the recommended pattern will be to use the same
> Ray version in the autoscaler and Ray containers.
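To check the `>= 1.11.0` requirement in the note above, compare versions numerically rather than as strings. As strings, `"1.9.2"` sorts after `"1.11.0"`.

This is a minimal sketch with hypothetical helper names. It deliberately ignores pre-release and dev suffixes, so `1.11.0rc1` is treated as `1.11.0`, which may be more lenient than you want.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like "1.11.0", "1.12.0rc1" or "2.0.0.dev0"."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def compatible_with_example_autoscaler(version: str) -> bool:
    """True if the release number is at least 1.11.0."""
    return release_tuple(version) >= (1, 11, 0)
```

Inside a Ray container you could run `compatible_with_example_autoscaler(ray.__version__)`.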

## Uninstalling the KubeRay operator

You can uninstall the KubeRay operator using
```shell
kubectl delete -f "ray/python/ray/autoscaler/kuberay/kuberay-autoscaler-rbac.yaml"
kubectl delete -k "ray/python/ray/autoscaler/kuberay/config/default"
```

Note that all running Ray clusters will automatically be terminated.

## Developing the KubeRay integration (advanced)

### Developing the KubeRay operator
If you also want to change the underlying KubeRay operator, refer to the instructions
in [the KubeRay development documentation](https://github.com/ray-project/kuberay/blob/master/ray-operator/DEVELOPMENT.md). In that case, push the modified operator image to your Docker account or registry and
follow the instructions in `ray/python/ray/autoscaler/kuberay/init-config.sh`.

### Developing the Ray autoscaler code
Code for the Ray autoscaler's KubeRay integration is located in `ray/python/ray/autoscaler/_private/kuberay`.

Here is one procedure to test development autoscaler code:
1. Push your autoscaler code changes to your fork of Ray.
2. Use the following Dockerfile to build an image with your changes.
   ```dockerfile
   # Use the latest Ray master as base.
   FROM rayproject/ray:nightly
   # Invalidate the cache so that fresh code is pulled in the next step.
   ARG BUILD_DATE
   # Retrieve your development code.
   RUN git clone -b <my-dev-branch> https://github.com/<my-git-handle>/ray
   # Install symlinks to your modified Python code.
   RUN python ray/python/ray/setup-dev.py -y
   ```
3. Build the image and push it to your Docker account or registry. Assuming your Dockerfile is named `Dockerfile`:
   ```shell
   docker build --build-arg BUILD_DATE=$(date +%Y-%m-%d:%H:%M:%S) -t <registry>/<repo>:<tag> - < Dockerfile
   docker push <registry>/<repo>:<tag>
   ```
4. Set the autoscaler image in `ray-cluster.complete.yaml` to `<registry>/<repo>:<tag>`.

Refer to the [Ray development documentation](https://docs.ray.io/en/latest/development.html#building-ray-python-only) for
further details.
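If you rebuild often, you could script the step that updates the autoscaler image in `ray-cluster.complete.yaml`. The following minimal sketch is not part of Ray or KubeRay. It works on a manifest that has already been parsed into Python dicts and lists. It sets the image on every entry in any `containers` list whose `name` matches.

The container name you pass is an assumption. Check `ray-cluster.complete.yaml` for the actual name of the autoscaler container, since this document does not state it.

```python
from typing import Any


def set_container_image(node: Any, container_name: str, image: str) -> int:
    """Recursively visit a parsed Kubernetes manifest. On every `containers`
    list, set `image` for entries whose `name` equals `container_name`.
    Returns the number of containers updated."""
    updated = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "containers" and isinstance(value, list):
                for container in value:
                    if isinstance(container, dict) and container.get("name") == container_name:
                        container["image"] = image
                        updated += 1
            # Keep descending; container dicts themselves hold no `containers` key.
            updated += set_container_image(value, container_name, image)
    elif isinstance(node, list):
        for item in node:
            updated += set_container_image(item, container_name, image)
    return updated
```

For example, if the file contains a single YAML document and PyYAML is installed, you could do the following:

1. Load the file with `yaml.safe_load`.
2. Call `set_container_image(manifest, "<autoscaler-container-name>", "<registry>/<repo>:<tag>")` and check that it returns 1.
3. Write the result back with `yaml.safe_dump`.

This round trip discards YAML comments, so editing the file by hand may be preferable for a one-off change.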