customresourcedefinition.apiextensions.k8s.io/rayclusters.cluster.ray.io created
Picking a Kubernetes Namespace
-------------------------------
The rest of the Kubernetes resources we will use are `namespaced`_.
You can use an existing namespace for your Ray clusters or create a new one if you have permissions.
For this example, we will create a namespace called ``ray``.

.. code-block:: shell

    $ kubectl create namespace ray
    namespace/ray created
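
If you want to double-check that the namespace is ready before moving on (an optional verification step, not part of the original walkthrough), listing it should report an ``Active`` status:

.. code-block:: shell

    $ kubectl get namespace ray
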
Starting the Operator
----------------------
To launch the operator in our namespace, we execute the following command.

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/operator.yaml
    serviceaccount/ray-operator-serviceaccount created
    role.rbac.authorization.k8s.io/ray-operator-role created
    rolebinding.rbac.authorization.k8s.io/ray-operator-rolebinding created
    pod/ray-operator-pod created
The output shows that we've launched a Pod named ``ray-operator-pod``. This is the pod that runs the operator process.
The ServiceAccount, Role, and RoleBinding we have created grant the operator pod the `permissions`_ it needs to manage Ray clusters.
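
To verify that the operator came up cleanly (an optional check, not shown in the original output), you can inspect the pod's status and the Role it was granted:

.. code-block:: shell

    $ kubectl -n ray get pod ray-operator-pod
    $ kubectl -n ray describe role ray-operator-role
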
Launching Ray Clusters
----------------------
Finally, to launch a Ray cluster, we create a RayCluster custom resource.

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml
    raycluster.cluster.ray.io/example-cluster created
The operator detects the RayCluster resource we've created and launches an autoscaling Ray cluster.
Our RayCluster configuration specifies ``minWorkers: 2`` in the second entry of ``spec.podTypes``, so we get a head node and two workers upon launch.
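
To confirm this (an optional check; the exact pod names will include generated suffixes), listing the namespace's pods should show one head pod and two worker pods for ``example-cluster``:

.. code-block:: shell

    $ kubectl -n ray get pods
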

.. note::

    For more details about RayCluster resources, we recommend taking a look at the annotated example ``example_cluster.yaml`` applied in the last command.
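
The operator pod writes autoscaling logs for every Ray cluster in its namespace, with each line prefixed by the cluster's name. A command along the following lines (a sketch; the exact filter is our assumption) surfaces recent autoscaling activity for a second cluster named ``example-cluster2``:

.. code-block:: shell

    $ kubectl -n ray logs ray-operator-pod | grep ^example-cluster2 | tail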
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:159 -- Placement group demands: []
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:186 -- Resource demands: []
    example-cluster2:2020-12-12 13:55:36,870 INFO resource_demand_scheduler.py:187 -- Unfulfilled demands: []
    example-cluster2:2020-12-12 13:55:36,891 INFO resource_demand_scheduler.py:209 -- Node requests: {}
    example-cluster2:2020-12-12 13:55:36,903 DEBUG autoscaler.py:654 -- example-cluster2-ray-worker-tdxdr is not being updated and passes config check (can_update=True).
    example-cluster2:2020-12-12 13:55:36,923 DEBUG autoscaler.py:654 -- example-cluster2-ray-worker-tdxdr is not being updated and passes config check (can_update=True).
Updating and Retrying
---------------------
To update a Ray cluster's configuration, edit the ``yaml`` file of the corresponding RayCluster resource
and apply it again:

.. code-block:: shell

    $ kubectl -n ray apply -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml
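
If you'd like to preview the change before applying it (optional; this relies on server-side dry-run support in your Kubernetes cluster), ``kubectl diff`` shows what would be updated:

.. code-block:: shell

    $ kubectl -n ray diff -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster.yaml
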
To force a restart with the same configuration, you can add an `annotation`_ to the RayCluster resource's ``metadata.annotations`` field, e.g.

.. code-block:: yaml

    apiVersion: cluster.ray.io/v1
    kind: RayCluster
    metadata:
      name: example-cluster
      annotations:
        try: again
    spec:
      ...
Then reapply the RayCluster, as above.
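
The same annotation can also be patched in place with ``kubectl annotate`` (an assumed shortcut; whether the operator reacts to this exactly as it does to re-applying the edited yaml is not verified here):

.. code-block:: shell

    $ kubectl -n ray annotate raycluster example-cluster try=again --overwrite
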
Currently, editing and reapplying a RayCluster resource will stop and restart Ray processes running on the corresponding
Ray cluster. Similarly, deleting and relaunching the operator pod will stop and restart Ray processes on all Ray clusters in the operator's namespace.
This behavior may be modified in future releases.
Cleaning Up
-----------
We shut down a Ray cluster by deleting the associated RayCluster resource.
Either of the next two commands will delete our second cluster ``example-cluster2``.

.. code-block:: shell

    $ kubectl -n ray delete raycluster example-cluster2
    # OR
    $ kubectl -n ray delete -f ray/python/ray/autoscaler/kubernetes/operator_configs/example_cluster2.yaml
The pods associated with ``example-cluster2`` go into ``Terminating`` status. In a few moments, we check that these pods are gone: