The Ray Operator is a Kubernetes operator that automates the provisioning, management, autoscaling, and operation of Ray clusters deployed on Kubernetes.
Some of the main features of the Ray Operator are:
- user management via CRD
- heterogeneous pods in one Ray cluster, each with its own affinity, tolerations, and other pre-defined settings
- monitoring via Prometheus
- high availability (HA) for the Ray Operator: if the leader crashes, a leader election selects a new one
## File structure:
> ```
> ray/deploy/ray-operator
> ├── api/v1alpha1 // Package v1alpha1 contains API Schema definitions for the ray v1alpha1 API group
> │ ├── groupversion_info.go // contains common metadata about the group-version
> │ ├── raycluster_types.go // RayCluster field definitions; users should focus on this file
> │ └── zz_generated.deepcopy.go // contains the autogenerated implementation of the aforementioned runtime.Object interface, which marks all of our root types as representing Kinds.
> │ ├── default // contains a Kustomize base for launching the controller in a standard configuration.
> │ │ ├── kustomization.yaml
> │ │ ├── manager_auth_proxy_patch.yaml // injects a sidecar container that acts as an HTTP proxy for the controller manager; it performs RBAC authorization against the Kubernetes API using SubjectAccessReviews.
> ```

The following three sample RayCluster custom resources (CRs) introduce the Ray Operator.

Sample | Description
------------- | -------------
[RayCluster.mini.yaml](config/samples/ray_v1_raycluster.mini.yaml) | 2 pods: 1 head and 1 worker. Contains the minimum information needed to start a Ray cluster; intended for local testing.
[RayCluster.heterogeneous.yaml](config/samples/ray_v1_raycluster.heterogeneous.yaml) | 3 pods: 1 head and 2 workers with different specifications. Sets different resource quotas (such as CPU and memory) than the mini version; intended for local testing.
[RayCluster.complete.yaml](config/samples/ray_v1_raycluster.complete.yaml) | A complete CR for customized requirements that shows how to set custom properties. Sets more properties than the heterogeneous version; intended for production.
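
As a rough illustration of how the pod counts in the samples add up, the Go sketch below models a cluster as a list of pod groups. `PodGroup`, `Cluster`, the cluster and group names, and the CPU and memory values are all hypothetical and are not the operator's API; the rule that a cluster has exactly one head pod is an assumption drawn from the samples above. The actual field definitions are in `api/v1alpha1/raycluster_types.go`.

```go
package main

import "fmt"

// PodGroup is a hypothetical, simplified stand-in for one group of
// identically configured pods. It is NOT the operator's real type.
type PodGroup struct {
	Name     string // illustrative group name
	Role     string // "head" or "worker"
	Replicas int    // number of pods in this group
	CPU      string // illustrative CPU request
	Memory   string // illustrative memory request
}

// Cluster is a hypothetical container for the pod groups of one Ray cluster.
type Cluster struct {
	Name   string
	Groups []PodGroup
}

// TotalPods sums the replicas across all groups.
func (c Cluster) TotalPods() int {
	total := 0
	for _, g := range c.Groups {
		total += g.Replicas
	}
	return total
}

// Validate enforces this sketch's assumption: exactly one head pod
// and no negative replica counts.
func (c Cluster) Validate() error {
	heads := 0
	for _, g := range c.Groups {
		if g.Replicas < 0 {
			return fmt.Errorf("group %q: negative replicas %d", g.Name, g.Replicas)
		}
		if g.Role == "head" {
			heads += g.Replicas
		}
	}
	if heads != 1 {
		return fmt.Errorf("cluster %q: want exactly 1 head pod, got %d", c.Name, heads)
	}
	return nil
}

func main() {
	// Mirrors the mini sample: 1 head and 1 worker.
	mini := Cluster{Name: "raycluster-mini", Groups: []PodGroup{
		{Name: "head", Role: "head", Replicas: 1, CPU: "1", Memory: "1Gi"},
		{Name: "worker", Role: "worker", Replicas: 1, CPU: "1", Memory: "1Gi"},
	}}
	// Mirrors the heterogeneous sample: 2 workers with different resources.
	hetero := Cluster{Name: "raycluster-heterogeneous", Groups: []PodGroup{
		{Name: "head", Role: "head", Replicas: 1, CPU: "1", Memory: "1Gi"},
		{Name: "small-worker", Role: "worker", Replicas: 1, CPU: "1", Memory: "1Gi"},
		{Name: "large-worker", Role: "worker", Replicas: 1, CPU: "2", Memory: "4Gi"},
	}}
	for _, c := range []Cluster{mini, hetero} {
		if err := c.Validate(); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Printf("%s: %d pods\n", c.Name, c.TotalPods())
	}
}
```

Running the sketch reports 2 pods for the mini cluster and 3 pods for the heterogeneous one, matching the table. In the actual samples, these settings are expressed as fields of the RayCluster CR in YAML, not as Go values.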
## RayCluster CRD
Refer to [raycluster_types.go](api/v1alpha1/raycluster_types.go) for code details.
For more details on the CRD itself, see the [CRD](config/crd/bases/ray.io_rayclusters.yaml) file.
* A more detailed RBAC file will be added to restrict the namespace used in production; the controller will run in that namespace so that its permissions are limited to it.