Ray Kubernetes Operator
The Ray Operator is a Kubernetes operator to automate provisioning, management, autoscaling and operations of Ray clusters deployed to Kubernetes.
Some of the main features of Ray-Operator are:
- user management via CRD
- heterogeneous pods in one Ray cluster with specific affinity, toleration and other pre-defined settings
- monitoring via Prometheus
- high availability for the Ray Kubernetes Operator: a new leader is elected if the current leader crashes
File structure:
ray/deploy/ray-operator
├── api/v1alpha1  // Package v1alpha1 contains API Schema definitions for the ray v1alpha1 API group
│   ├── groupversion_info.go  // contains common metadata about the group-version
│   ├── raycluster_types.go  // RayCluster field definitions, user should focus
│   └── zz_generated.deepcopy.go  // contains the autogenerated implementation of the aforementioned runtime.Object interface, which marks all of our root types as representing Kinds.
│
├── config  // contains Kustomize YAML definitions required to launch our controller on a cluster, hold our CustomResourceDefinitions, RBAC configuration, and WebhookConfigurations.
│   ├── certmanager
│   │   ├── certificate.yaml  // The following manifests contain a self-signed issuer CR and a certificate CR.
│   │   ├── kustomization.yaml
│   │   └── kustomizeconfig.yaml
│   │
│   ├── crd
│   │   ├── bases
│   │   │   └── ray.io_rayclusters.yaml  // RayCluster CRD yaml file
│   │   ├── patches
│   │   │   ├── cainjection_in_rayclusters.yaml  // adds a directive for certmanager to inject CA into the CRD
│   │   │   └── webhook_in_rayclusters.yaml  // enables conversion webhook for CRD
│   │   ├── kustomization.yaml
│   │   └── kustomizeconfig.yaml
│   │
│   ├── default  // contains a Kustomize base for launching the controller in a standard configuration.
│   │   ├── kustomization.yaml
│   │   ├── manager_auth_proxy_patch.yaml  // inject a sidecar container which is a HTTP proxy for the controller manager, it performs RBAC authorization against the Kubernetes API using SubjectAccessReviews.
│   │   ├── manager_webhook_patch.yaml  // webhook yaml file
│   │   └── webhookcainjection_patch.yaml  // add annotation to admission webhook config
│   │
│   ├── manager  // launch your controllers as pods in the cluster.
│   │   ├── kustomization.yaml
│   │   └── manager.yaml  // manager yaml to create controller deployment, user should focus
│   │
│   ├── prometheus
│   │   ├── kustomization.yaml
│   │   └── monitor.yaml  // Prometheus Monitor Service, user should focus
│   │
│   ├── rbac  // permissions required to run your controllers under their own service account.
│   │   ├── auth_proxy_role.yaml
│   │   ├── auth_proxy_role_binding.yaml
│   │   ├── auth_proxy_service.yaml
│   │   ├── kustomization.yaml
│   │   ├── leader_election_role.yaml  // permissions to do leader election.
│   │   ├── leader_election_role_binding.yaml
│   │   └── role_binding.yaml
│   │
│   ├── samples  // sample RayCluster yaml, user should focus
│   │   ├── ray_v1_raycluster.complete.yaml
│   │   ├── ray_v1_raycluster.heterogeneous.yaml
│   │   └── ray_v1_raycluster.mini.yaml
│   │
│   └── webhook
│       ├── kustomization.yaml
│       ├── kustomizeconfig.yaml
│       ├── manifests.yaml
│       └── service.yaml  // webhook-service
│
├── controller
│   ├── common
│   │   ├── constant.go
│   │   ├── meta.go
│   │   ├── pod.go
│   │   └── service.go
│   └── raycluster_controller.go
│
├── main.go
└── Makefile
RayCluster sample CR
Three sample RayCluster CRs are provided to introduce the Ray Operator.
Sample | Description |
---|---|
RayCluster.mini.yaml | 2 pods: 1 head and 1 worker. The minimum configuration needed to start a Ray cluster; intended for local testing. |
RayCluster.heterogeneous.yaml | 3 pods: 1 head and 2 workers with different specifications. Uses different resource quotas (such as CPU and memory) than the mini version; intended for local testing. |
RayCluster.complete.yaml | A complete CR for customized requirements that shows how to set custom properties. Sets more properties than the heterogeneous version; intended for production. |
RayCluster CRD
Refer to raycluster_types.go for the code details.
For more detail on the CRD itself, refer to the CRD file (config/crd/bases/ray.io_rayclusters.yaml in the file structure above).
Software requirement
Note that some of the software below depends on specific versions of the others.
software | version | memo |
---|---|---|
kustomize | v3.1.0+ | download |
kubectl | v1.11.3+ | download |
Kubernetes Cluster | Access to a Kubernetes v1.11.3+ cluster | Minikube for local test |
go | v1.13+ | download |
docker | 17.03+ | download |
You also need a kubeconfig in ~/.kube/config so that you can access the Kubernetes cluster.
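As a quick sanity check before starting, the requirements above can be probed with a small shell sketch. It only tests that the tools are on PATH and that the kubeconfig file exists; it does not compare versions, and the function names `missing_tools` and `check_prerequisites` are hypothetical helpers, not part of the repository.

```shell
# Print the name of every tool in the argument list that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command can be resolved.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools from the requirements table and the kubeconfig location.
# Versions are NOT checked here; compare them against the table by hand.
check_prerequisites() {
  missing=$(missing_tools kustomize kubectl go docker)
  if [ -n "$missing" ]; then
    printf 'missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
  if [ ! -f "$HOME/.kube/config" ]; then
    printf 'no kubeconfig found at %s\n' "$HOME/.kube/config" >&2
    return 1
  fi
  printf 'all prerequisites found\n'
}
```

Calling `check_prerequisites` after sourcing the snippet reports what is missing on standard error and returns a non-zero status, so it can gate a larger setup script.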
Get started
The following steps walk through submitting a RayCluster:
Install CRDs into a cluster
kustomize build config/crd | kubectl apply -f -
Build manager docker image
See the Makefile for more commands and information.
make docker-build
Push manager docker image to some docker repo
See the Makefile for more commands and information.
make docker-push
Deploy the controller to the Kubernetes cluster configured in ~/.kube/config
- In this version the controller runs in the ray-operator-system namespace, which may not be acceptable in production.
- A more detailed RBAC file will be added to control the namespace used in production; the controller will then run in that namespace so that its permissions can be restricted.
- A more detailed guide for running the controller in a controlled way will also be provided.
kustomize build config/default | kubectl apply -f -
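Once the CRDs and the controller have been applied, it can help to confirm that both actually landed in the cluster. The sketch below is one way to do that with plain `kubectl get` calls. The namespace comes from this README; the CRD name `rayclusters.ray.io` is an assumption inferred from the file name ray.io_rayclusters.yaml, and `verify_install` is a hypothetical helper.

```shell
# Allow the kubectl binary to be overridden, e.g. KUBECTL=echo to preview the calls.
KUBECTL="${KUBECTL:-kubectl}"

# Namespace named in this README.
NAMESPACE="ray-operator-system"
# Assumption: CRD name inferred from config/crd/bases/ray.io_rayclusters.yaml.
CRD_NAME="rayclusters.ray.io"

# List the CRD, then the controller deployment and pods; stop at the first failure.
verify_install() {
  "$KUBECTL" get crd "$CRD_NAME" &&
  "$KUBECTL" get deployments -n "$NAMESPACE" &&
  "$KUBECTL" get pods -n "$NAMESPACE"
}
```

If the first call fails, the CRD step was probably skipped or the CRD is registered under a different name; if the pod list is empty, check that the manager image was pushed to a registry the cluster can reach.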
Submit RayCluster to Kubernetes
kubectl create -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system
Apply RayCluster to Kubernetes
kubectl apply -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system
Delete RayCluster from Kubernetes
kubectl delete -f config/samples/ray_v1_raycluster.mini.yaml -n ray-operator-system
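The commands from the steps above can be collected into one script so that their order is explicit. This is only a sketch: the commands are copied from this README, while the `run` helper, the `DRY_RUN` switch, and the function names are hypothetical additions that let the sequence be previewed without touching a cluster.

```shell
# Execute a command line, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    sh -c "$1"
  fi
}

# Sample CR and namespace as used in the steps above.
SAMPLE="config/samples/ray_v1_raycluster.mini.yaml"
NAMESPACE="ray-operator-system"

# Install the CRDs, build and push the manager image, then deploy the controller.
# Each step runs only if the previous one succeeded.
install_operator() {
  run "kustomize build config/crd | kubectl apply -f -" &&
  run "make docker-build" &&
  run "make docker-push" &&
  run "kustomize build config/default | kubectl apply -f -"
}

# Create or update the sample RayCluster.
submit_cluster() {
  run "kubectl apply -f $SAMPLE -n $NAMESPACE"
}

# Remove the sample RayCluster again.
delete_cluster() {
  run "kubectl delete -f $SAMPLE -n $NAMESPACE"
}
```

Setting `DRY_RUN=1` before calling `install_operator` prints the four commands in order without running them. `submit_cluster` uses `kubectl apply` rather than `kubectl create` so that it can be run again after the sample file is edited.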
Build with bazel
bazel run //:gazelle
bazel build //:ray-operator