ray/doc/source/cluster/cluster_under_construction/ray-clusters-on-kubernetes/user-guides/config.md

(kuberay-config)=

# RayCluster Configuration

This guide covers the key aspects of Ray cluster configuration on Kubernetes.

## Introduction

Deployments of Ray on Kubernetes follow the [operator pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). The key players are
- A [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
    called a `RayCluster` describing the desired state of a Ray cluster.
- A [custom controller](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-controllers),
    the KubeRay operator, which manages Ray pods in order to match the `RayCluster`'s spec.

To deploy a Ray cluster, one creates a `RayCluster` custom resource (CR):
```shell
kubectl apply -f raycluster.yaml
```

This guide covers the salient features of `RayCluster` CR configuration.

For reference, here is a condensed example of a `RayCluster` CR in yaml format.
```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: raycluster-complete
spec:
  rayVersion: "2.0.0"
  enableInTreeAutoscaling: true
  autoscalerOptions:
     ...
  headGroupSpec:
    serviceType: ClusterIP # Options are ClusterIP, NodePort, and LoadBalancer
    enableIngress: false # Optional
    rayStartParams:
      block: true
      dashboard-host: "0.0.0.0"
      ...
    template: # Pod template
        metadata: # Pod metadata
        spec: # Pod spec
            containers:
            - name: ray-head
              image: rayproject/ray-ml:2.0.0
              resources:
                limits:
                  cpu: 14
                  memory: 54Gi
                requests:
                  cpu: 14
                  memory: 54Gi
              # Keep this preStop hook in each Ray container config.
              lifecycle:
                preStop:
                  exec:
                    command: ["/bin/sh","-c","ray stop"]
              ports: # Optional service port overrides
              - containerPort: 6379
                name: gcs
              - containerPort: 8265
                name: dashboard
              - containerPort: 10001
                name: client
              - containerPort: 8000
                name: serve
                ...
  workerGroupSpecs:
  - groupName: small-group
    replicas: 1
    minReplicas: 1
    maxReplicas: 5
    rayStartParams:
        ...
    template: # Pod template
      spec:
        # Keep this initContainer in each workerGroup template.
        initContainers:
        - name: init-myservice
          image: busybox:1.28
          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
        ...
  # Another workerGroup
  - groupName: medium-group
    ...
  # Yet another workerGroup, with access to special hardware perhaps.
  - groupName: gpu-group
    ...
```

The rest of this guide will discuss the `RayCluster` CR's config fields.

(kuberay-config-ray-version)=
## The Ray Version
The field `rayVersion` specifies the version of Ray used in the Ray cluster.
The `rayVersion` is used to fill default values for certain config fields.
The Ray container images specified in the RayCluster CR should carry
the same Ray version as the CR's `rayVersion`. If you are using a nightly or development
Ray image, it is fine to set `rayVersion` to the latest release version of Ray.

## Pod configuration: headGroupSpec and workerGroupSpecs

At a high level, a RayCluster is a collection of Kubernetes pods, similar to a Kubernetes Deployment or StatefulSet.
Just as with the Kubernetes built-ins, the key pieces of configuration are
* Pod specification
* Scale information (how many pods are desired)

The key difference between a Deployment and a `RayCluster` is that a `RayCluster` is
specialized for running Ray applications. A Ray cluster consists of

* One **head pod** which hosts global control processes for the Ray cluster.
  The head pod can also run Ray tasks and actors.
* Any number of **worker pods**, which run Ray tasks and actors.
  Workers come in **worker groups** of identically configured pods.
  For each worker group, we must specify **replicas**, the number of
  pods we want of that group.

The head pod’s configuration is
specified under `headGroupSpec`, while configuration for worker pods is
specified under `workerGroupSpecs`. There may be multiple worker groups,
each group with its own configuration. The `replicas` field
of a `workerGroupSpec` specifies the number of worker pods of that group to
keep in the cluster.

### Pod templates
The bulk of the configuration for a `headGroupSpec` or
`workerGroupSpec` goes in the `template` field. The `template` is a Kubernetes Pod
template which determines the configuration for the pods in the group.
Here are some of the subfields of the pod `template` to pay attention to:


#### resources
It’s important to specify container CPU and memory requests and limits for
each group spec. For GPU workloads, you may also wish to specify GPU
limits. For example, set `nvidia.com/gpu:2` if using an Nvidia GPU device plugin
and you wish to specify a pod with access to 2 GPUs.
See {ref}`this guide <kuberay-gpu>` for more details on GPU support.

It's ideal to size each Ray pod to take up the
entire Kubernetes node on which it is scheduled. In other words, it’s
best to run one large Ray pod per Kubernetes node.
In general, it is more efficient to use a few large Ray pods than many small ones.
The pattern of fewer large Ray pods has the following advantages:
- more efficient use of each Ray pod's shared memory object store
- reduced communication overhead between Ray pods
- reduced redundancy of per-pod Ray control structures such as Raylets

The CPU, GPU, and memory **limits** specified in the Ray container config
will be automatically advertised to Ray. These values will be used as
the logical resource capacities of Ray pods in the head or worker group.
Note that CPU quantities will be rounded up to the nearest integer
before being relayed to Ray.
The resource capacities advertised to Ray may be overridden in the {ref}`rayStartParams`.

On the other hand CPU, GPU, and memory **requests** will be ignored by Ray.
For this reason, it is best when possible to set resource requests equal to resource limits.

#### nodeSelector and tolerations
You can control the scheduling of worker groups' Ray pods by setting the `nodeSelector` and
`tolerations` fields of the pod spec. Specifically, these fields determine on which Kubernetes
nodes the pods may be scheduled.
See the [Kubernetes docs](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
for more about Pod-to-Node assignment.

#### image
The Ray container images specified in the `RayCluster` CR should carry
the same Ray version as the CR's `spec.rayVersion`.
If you are using a nightly or development Ray image, it is fine to specify Ray's
latest release version under `spec.rayVersion`.

Code dependencies for a given Ray task or actor must be installed on each Ray node that
might run the task or actor.
To achieve this, it is simplest to use the same Ray image for the Ray head and all worker groups.
In any case, do make sure that all Ray images in your CR carry the same Ray version and
Python version.
To distribute custom code dependencies across your cluster, you can build a custom container image,
using one of the [official Ray images](https://hub.docker.com/r/rayproject/ray>) as the base.
See {ref}`this guide<docker-images>` to learn more about the official Ray images.
For dynamic dependency management geared towards iteration and developement,
you can also use {ref}`Runtime Environments<runtime-environments>`.

(rayStartParams)=
## Ray Start Parameters
The ``rayStartParams`` field of each group spec is a string-string map of arguments to the Ray
container’s `ray start` entrypoint. For the full list of arguments, refer to
the documentation for {ref}`ray start<ray-start-doc>`. We make special note of the following arguments:

### block
For most use-cases, this field should be set to "true" for all Ray pod. The container's Ray
entrypoint will then block forever until a Ray process exits, at which point the container
will exit. If this field is omitted, `ray start` will start Ray processes in the background and the container
will subsequently sleep forever until terminated. (Future versions of KubeRay may set
block to true by default. See [KubeRay issue #368](https://github.com/ray-project/kuberay/issues/368).)

### dashboard-host
For most use-cases, this field should be set to "0.0.0.0" for the Ray head pod.
This is required to expose the Ray dashboard outside the Ray cluster. (Future versions might set
this parameter by default.)

### num-cpus
This optional field tells the Ray scheduler and autoscaler how many CPUs are
available to the Ray pod. The CPU count can be autodetected from the
Kubernetes resource limits specified in the group spec’s pod
`template`. However, it is sometimes useful to override this autodetected
value. For example, setting `num-cpus:"0"` for the Ray head pod will prevent Ray
workloads with non-zero CPU requirements from being scheduled on the head.
Note that the values of all Ray start parameters, including `num-cpus`,
must be supplied as **strings**.

### num-gpus
This optional field specifies the number of GPUs available to the Ray container.
In KubeRay versions since 0.3.0, the number of GPUs can be auto-detected from Ray container resource limits.
For certain advanced use-cases, you may wish to use `num-gpus` to set an {ref}`override<kuberay-gpu-override>`.
Note that the values of all Ray start parameters, including `num-gpus`,
must be supplied as **strings**.

### memory
The memory available to the Ray is detected automatically from the Kubernetes resource
limits. If you wish, you may override this autodetected value by setting the desired memory value,
in bytes, under `rayStartParams.memory`.
Note that the values of all Ray start parameters, including `memory`,
must be supplied as **strings**.

### resources
This field can be used to specify custom resource capacities for the Ray pod.
These resource capacities will be advertised to the Ray scheduler and Ray autoscaler.
For example, the following annotation will mark a Ray pod as having 1 unit of `Custom1` capacity
and 5 units of `Custom2` capacity.
```yaml
rayStartParams:
    resources: '"{\"Custom1\": 1, \"Custom2\": 5}"'
```
You can then annotate tasks and actors with annotations like `@ray.remote(resources={"Custom2": 1})`.
The Ray scheduler and autoscaler will take appropriate action to schedule such tasks.

Note the format used to express the resources string. In particular, note
that the backslashes are present as actual characters in the string.
If you are specifying a `RayCluster` programmatically, you may have to
[escape the backslashes](https://github.com/ray-project/ray/blob/cd9cabcadf1607bcda1512d647d382728055e688/python/ray/tests/kuberay/test_autoscaling_e2e.py#L92) to make sure they are processed as part of the string.

The field `rayStartParams.resources` should only be used for custom resources. The keys
`CPU`, `GPU`, and `memory` are forbidden. If you need to specify overrides for those resource
fields, use the Ray start parameters `num-cpus`, `num-gpus`, or `memory`.

(kuberay-networking)=
## Services and Networking
### The Ray head service.
The KubeRay operator automatically configures a Kubernetes Service exposing the default ports
for several services of the Ray head pod, including
- Ray Client (default port 10001)
- Ray Dashboard (default port 8265)
- Ray GCS server (default port 6379)
- Ray Serve (default port 8000)

The name of the configured Kubernetes Service is the name, `metadata.name`, of the RayCluster
followed by the suffix <nobr>`head-svc`</nobr>. For the example CR given on this page, the name of
the head service will be
<nobr>`raycluster-example-head-svc`</nobr>. Kubernetes networking (`kube-dns`) then allows us to address
the Ray head's services using the name <nobr>`raycluster-example-head-svc`</nobr>.
For example, the Ray Client server can be accessed from a pod
in the same Kubernetes namespace using
```python
ray.init("ray://raycluster-example-head-svc:10001")
```
The Ray Client server can be accessed from a pod in another namespace using
```python
ray.init("ray://raycluster-example-head-svc.default.svc.cluster.local:10001")
```
(This assumes the Ray cluster was deployed into the default Kuberentes namespace.
If the Ray cluster is deployed in a non-default namespace, use that namespace in
place of `default`.)

### ServiceType, Ingresses
Ray Client and other services can be exposed outside the Kubernetes cluster
using port-forwarding or an ingress.
The simplest way to access the Ray head's services is to use port-forwarding.

Other means of exposing the head's services outside the cluster may require using
a service of type LoadBalancer or NodePort. Set `headGroupSpec.serviceType`
to the appropriate type for your application.

You may wish to set up an ingress to expose the Ray head's services outside the cluster.
If you set the optional boolean field `headGroupSpec.enableIngress` to `true`,
the KubeRay operator will create an ingress for your Ray cluster. See the [KubeRay documentation][IngressDoc]
for details. However, it is up to you to set up an ingress controller.
Moreover, the ingress created by the KubeRay operator [might not be compatible][IngressIssue] with your network setup.
It is valid to omit the `headGroupSpec.enableIngress` field and configure an ingress object yourself.


### Specifying non-default ports.
If you wish to override the ports exposed by the Ray head service, you may do so by specifying
the Ray head container's `ports` list, under `headGroupSpec`.
Here is an example of a list of non-default ports for the Ray head service.
```yaml
ports:
- containerPort: 6380
  name: gcs
- containerPort: 8266
  name: dashboard
- containerPort: 10002
  name: client
```
If the head container's `ports` list is specified, the Ray head service will expose precisely
the ports in the list. In the above example, the head service will expose just three ports;
in particular there will be no port exposed for Ray Serve.

For the Ray head to actually use the non-default ports specified in the ports list,
you must also specify the relevant `rayStartParams`. For the above example,
```yaml
rayStartParams:
  port: "6380"
  dashboard-port: "8266"
  ray-client-server-port: "10002"
  ...
```


(kuberay-autoscaling-config)=
## Autoscaler Configuration
```{note}
If you are deciding whether to use autoscaling for a particular Ray application,
check out this {ref}`discussion<autoscaler-pro-con>`. Note that autoscaling is
supported only with Ray versions at least as new as Ray 1.11.0.
```
To enable the optional Ray Autoscaler support, set `enableInTreeAutoscaling:true`.
The KubeRay operator will then automatically configure an autoscaling sidecar container
for the Ray head pod. The autoscaler container collects resource metrics from the Ray cluster
and automatically adjusts the `replicas` field of each `workerGroupSpec` as needed to fulfill
the requirements of your Ray application.

Use the fields `minReplicas` and `maxReplicas` to constrain the number of `replicas` of an autoscaling
`workerGroup`. When deploying an autoscaling cluster, one typically sets `replicas` and `minReplicas`
to the same value.
The Ray autoscaler will then take over and modify the `replicas` field as needed by
the Ray application.

### Autoscaler operation
We describe how the autoscaler interacts with the `RayCluster` CR.

#### Scale up
The autoscaler scales worker pods up to accomodate the load of logical resources
from your Ray application. For example, suppose you submit a task requesting 2 GPUs:
```python
@ray.remote(num_gpus=2)
...
```
If your Ray cluster does not currently have any GPU worker pods, and if your configuration
specifies a worker type with at least 2 units of GPU capacity, a GPU pod will be
upscaled.

The autoscaler scales Ray worker pods up by editing the `replicas` field of the relevant `workerGroupSpec`.

#### Scale down
The autoscaler scales a worker pod down when the pod has not been using any logical resources
for a {ref}`set period of time<kuberay-idle-timeout>`. In this context, "resources" are the logical Ray resources
(such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.

To scale down Ray pods of a given `workerGroup`, the autoscaler
adds the Ray pods' names to the relevant `workerGroupSpec`'s
`scaleStrategy.workersToDelete` list and decrements the `replicas` field.

#### Manually scaling
You may manually adjust a `RayCluster`'s scale by editing the `replicas` or `workersToDelete` fields.
(It is also possible to implement custom scaling logic that adjusts scale on your behalf.)
It is however, not recommended to manually edit `replicas` or `workersToDelete` for a `RayCluster` with
autoscaling enabled.

### autoscalerOptions
To enable Ray autoscaler support, it is enough to set `enableInTreeAutoscaling:true`.
Should you need to adjust autoscaling behavior or change the autoscaler container's configuration,
you can use the `RayCluster` CR's `autoscalerOptions` field. The `autoscalerOptions` field
carries the following subfields:

#### upscalingMode
The `upscalingMode` field can be used to control the rate of Ray pod upscaling.

UpscalingMode is `Conservative`, `Default`, or `Aggressive`.
- `Conservative`: Upscaling is rate-limited; the number of pending worker pods is at most the number
  of worker pods connected to the Ray cluster.
- `Default`: Upscaling is not rate-limited.
- `Aggressive`: An alias for Default; upscaling is not rate-limited.

You may wish to use `Conservative` upscaling if you plan to submit many short-lived tasks
to your Ray cluster. In this situation, `Default` upscaling may trigger the _thrashing_ behavior:
- The autoscaler sees resource demands from the submitted short-lived tasks.
- The autoscaler immediately creates Ray pods to accomodate the demand.
- By the time the additional Ray pods are provisioned, the tasks have already run to completion.
- The additional Ray pods are unused and scale down after a period of idleness.

Note, however, that it is generally not recommended to over-parallelize with Ray.
Since running a Ray task incurs scheduling overhead, it is usually preferable to use
a few long-running tasks over many short-running tasks. Ensuring that each task has
a non-trivial amount of work to do will also help prevent the autoscaler from over-provisioning
Ray pods.

(kuberay-idle-timeout)=
#### idleTimeoutSeconds
`idleTimeoutSeconds` is the number of seconds to wait before scaling down a worker pod
which is not using resources. In this context, "resources" are the logical Ray resources
(such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.

`idleTimeoutSeconds` defaults to 60 seconds.

#### resources
The `resources` subfield of `autoscalerOptions` sets optional resource overrides
for the autoscaler sidecar container. These overrides
should be specified in the standard [container resource
spec format](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
The default values are as indicated below:
```
resources:
  limits:
    cpu: "500m"
    memory: "512Mi"
  requests:
    cpu: "500m"
    memory: "512Mi"
```
These defaults should be suitable for most use-cases.
However, we do recommend monitoring autoscaler container resource usage and adjusting as needed.

#### image and imagePullPolicy
The `image` subfield of `autoscalerOptions` optionally overrides the autoscaler container image.
If your `RayCluster`'s `spec.RayVersion` is at least `2.0.0`, the autoscaler will default to using
**the same image** as the Ray container. (Ray autoscaler code is bundled with the rest of Ray.)
For older Ray versions, the autoscaler will default to the image `rayproject/ray:2.0.0`.

The `imagePullPolicy` subfield of `autoscalerOptions` optionally overrides the autoscaler container's
image pull policy. The default is `Always`.

The `image` and `imagePullPolicy` overrides are provided primarily for the purposes of autoscaler testing and
development.

#### env and envFrom

The `env` and `envFrom` fields specify autoscaler container
environment variables, for debugging and development purposes.
These fields should be formatted following the
[Kuberentes API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#environment-variables)
for container environment variables.

(kuberay-config-miscellaneous)=
## Pod and container lifecyle: preStop hooks and initContainers
There are two pieces of pod configuration that should always be included
in the RayCluster CR. Future versions of KubeRay may configure these elements automatically.

## initContainer
It is required for the configuration of each `workerGroupSpec`'s pod template to include
the following block:
```yaml
initContainers:
- name: init-myservice
  image: busybox:1.28
  command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
```
This instructs the worker pod to wait for creation of the Ray head service. The worker's `ray start`
command will use this service to connect to the Ray head.

(It is not required to include this init container in the Ray head pod's configuration.)

## preStopHook
It is recommended for every Ray container's configuration
to include the following blocking block:
```yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","ray stop"]
```
To ensure graceful termination, `ray stop` is executed prior to the Ray pod's termination.

[IngressDoc]: https://ray-project.github.io/kuberay/guidance/ingress/
[IngressIssue]: https://github.com/ray-project/kuberay/issues/441
-												[KubeRay] Documentation structure and skeleton (#26589)

Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.
											
										
										
											2022-07-19 13:28:04 -07:00
+								(kuberay-config)=
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								# RayCluster Configuration
-												[KubeRay] Documentation structure and skeleton (#26589)

Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.
											
										
										
											2022-07-19 13:28:04 -07:00
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								This guide covers the key aspects of Ray cluster configuration on Kubernetes.
-												[KubeRay] Documentation structure and skeleton (#26589)

Adds outline and structure for new KubeRay-based Ray-on-Kubernetes docs.
											
										
										
											2022-07-19 13:28:04 -07:00
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								## Introduction
 								Deployments of Ray on Kubernetes follow the [operator pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). The key players are
 								- A [custom resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
 								    called a `RayCluster` describing the desired state of a Ray cluster.
 								- A [custom controller](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-controllers),
 								    the KubeRay operator, which manages Ray pods in order to match the `RayCluster`'s spec.
 								To deploy a Ray cluster, one creates a `RayCluster` custom resource (CR):
 								```shell
 								kubectl apply -f raycluster.yaml
 								```
 								This guide covers the salient features of `RayCluster` CR configuration.
 								For reference, here is a condensed example of a `RayCluster` CR in yaml format.
 								```yaml
 								apiVersion: ray.io/v1alpha1
 								kind: RayCluster
 								metadata:
 								  name: raycluster-complete
 								spec:
 								  rayVersion: "2.0.0"
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								  enableInTreeAutoscaling: true
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								  autoscalerOptions:
 								     ...
 								  headGroupSpec:
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								    serviceType: ClusterIP # Options are ClusterIP, NodePort, and LoadBalancer
 								    enableIngress: false # Optional
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								    rayStartParams:
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								      block: true
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								      dashboard-host: "0.0.0.0"
 								      ...
 								    template: # Pod template
 								        metadata: # Pod metadata
 								        spec: # Pod spec
 								            containers:
 								            - name: ray-head
 								              image: rayproject/ray-ml:2.0.0
 								              resources:
 								                limits:
 								                  cpu: 14
 								                  memory: 54Gi
 								                requests:
 								                  cpu: 14
 								                  memory: 54Gi
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								              # Keep this preStop hook in each Ray container config.
 								              lifecycle:
 								                preStop:
 								                  exec:
 								                    command: ["/bin/sh","-c","ray stop"]
 								              ports: # Optional service port overrides
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								              - containerPort: 6379
 								                name: gcs
 								              - containerPort: 8265
 								                name: dashboard
 								              - containerPort: 10001
 								                name: client
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								              - containerPort: 8000
 								                name: serve
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								                ...
 								  workerGroupSpecs:
 								  - groupName: small-group
 								    replicas: 1
 								    minReplicas: 1
 								    maxReplicas: 5
 								    rayStartParams:
 								        ...
 								    template: # Pod template
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								      spec:
 								        # Keep this initContainer in each workerGroup template.
 								        initContainers:
 								        - name: init-myservice
 								          image: busybox:1.28
 								          command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								        ...
 								  # Another workerGroup
 								  - groupName: medium-group
 								    ...
 								  # Yet another workerGroup, with access to special hardware perhaps.
 								  - groupName: gpu-group
 								    ...
 								```
 								The rest of this guide will discuss the `RayCluster` CR's config fields.
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								(kuberay-config-ray-version)=
 								## The Ray Version
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								The field `rayVersion` specifies the version of Ray used in the Ray cluster.
 								The `rayVersion` is used to fill default values for certain config fields.
 								The Ray container images specified in the RayCluster CR should carry
 								the same Ray version as the CR's `rayVersion`. If you are using a nightly or development
 								Ray image, it is fine to set `rayVersion` to the latest release version of Ray.
 								## Pod configuration: headGroupSpec and workerGroupSpecs
 								At a high level, a RayCluster is a collection of Kubernetes pods, similar to a Kubernetes Deployment or StatefulSet.
 								Just as with the Kubernetes built-ins, the key pieces of configuration are
 								* Pod specification
 								* Scale information (how many pods are desired)
 								The key difference between a Deployment and a `RayCluster` is that a `RayCluster` is
 								specialized for running Ray applications. A Ray cluster consists of
 								* One **head pod** which hosts global control processes for the Ray cluster.
 								  The head pod can also run Ray tasks and actors.
 								* Any number of **worker pods**, which run Ray tasks and actors.
 								  Workers come in **worker groups** of identically configured pods.
 								  For each worker group, we must specify **replicas**, the number of
 								  pods we want of that group.
 								The head pod’s configuration is
 								specified under `headGroupSpec`, while configuration for worker pods is
 								specified under `workerGroupSpecs`. There may be multiple worker groups,
 								each group with its own configuration. The `replicas` field
 								of a `workerGroupSpec` specifies the number of worker pods of that group to
 								keep in the cluster.
 								### Pod templates
 								The bulk of the configuration for a `headGroupSpec` or
 								`workerGroupSpec` goes in the `template` field. The `template` is a Kubernetes Pod
 								template which determines the configuration for the pods in the group.
 								Here are some of the subfields of the pod `template` to pay attention to:
 								#### resources
 								It’s important to specify container CPU and memory requests and limits for
 								each group spec. For GPU workloads, you may also wish to specify GPU
 								limits. For example, set `nvidia.com/gpu:2` if using an Nvidia GPU device plugin
 								and you wish to specify a pod with access to 2 GPUs.
 								See {ref}`this guide <kuberay-gpu>` for more details on GPU support.
 								It's ideal to size each Ray pod to take up the
 								entire Kubernetes node on which it is scheduled. In other words, it’s
 								best to run one large Ray pod per Kubernetes node.
 								In general, it is more efficient to use a few large Ray pods than many small ones.
 								The pattern of fewer large Ray pods has the following advantages:
 								- more efficient use of each Ray pod's shared memory object store
 								- reduced communication overhead between Ray pods
 								- reduced redundancy of per-pod Ray control structures such as Raylets
 								The CPU, GPU, and memory **limits** specified in the Ray container config
 								will be automatically advertised to Ray. These values will be used as
 								the logical resource capacities of Ray pods in the head or worker group.
 								Note that CPU quantities will be rounded up to the nearest integer
 								before being relayed to Ray.
 								The resource capacities advertised to Ray may be overridden in the {ref}`rayStartParams`.
 								On the other hand CPU, GPU, and memory **requests** will be ignored by Ray.
 								For this reason, it is best when possible to set resource requests equal to resource limits.
 								#### nodeSelector and tolerations
 								You can control the scheduling of worker groups' Ray pods by setting the `nodeSelector` and
 								`tolerations` fields of the pod spec. Specifically, these fields determine on which Kubernetes
 								nodes the pods may be scheduled.
 								See the [Kubernetes docs](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/)
 								for more about Pod-to-Node assignment.
 								#### image
 								The Ray container images specified in the `RayCluster` CR should carry
 								the same Ray version as the CR's `spec.rayVersion`.
 								If you are using a nightly or development Ray image, it is fine to specify Ray's
 								latest release version under `spec.rayVersion`.
 								Code dependencies for a given Ray task or actor must be installed on each Ray node that
 								might run the task or actor.
 								To achieve this, it is simplest to use the same Ray image for the Ray head and all worker groups.
 								In any case, do make sure that all Ray images in your CR carry the same Ray version and
 								Python version.
 								To distribute custom code dependencies across your cluster, you can build a custom container image,
 								using one of the [official Ray images](https://hub.docker.com/r/rayproject/ray>) as the base.
 								See {ref}`this guide<docker-images>` to learn more about the official Ray images.
 								For dynamic dependency management geared towards iteration and developement,
 								you can also use {ref}`Runtime Environments<runtime-environments>`.
 								(rayStartParams)=
 								## Ray Start Parameters
 								The ``rayStartParams`` field of each group spec is a string-string map of arguments to the Ray
 								container’s `ray start` entrypoint. For the full list of arguments, refer to
 								the documentation for {ref}`ray start<ray-start-doc>`. We make special note of the following arguments:
 								### block
 								For most use-cases, this field should be set to "true" for all Ray pod. The container's Ray
 								entrypoint will then block forever until a Ray process exits, at which point the container
 								will exit. If this field is omitted, `ray start` will start Ray processes in the background and the container
 								will subsequently sleep forever until terminated. (Future versions of KubeRay may set
 								block to true by default. See [KubeRay issue #368](https://github.com/ray-project/kuberay/issues/368).)
 								### dashboard-host
 								For most use-cases, this field should be set to "0.0.0.0" for the Ray head pod.
 								This is required to expose the Ray dashboard outside the Ray cluster. (Future versions might set
 								this parameter by default.)
 								### num-cpus
 								This optional field tells the Ray scheduler and autoscaler how many CPUs are
 								available to the Ray pod. The CPU count can be autodetected from the
 								Kubernetes resource limits specified in the group spec’s pod
 								`template`. However, it is sometimes useful to override this autodetected
 								value. For example, setting `num-cpus:"0"` for the Ray head pod will prevent Ray
 								workloads with non-zero CPU requirements from being scheduled on the head.
 								Note that the values of all Ray start parameters, including `num-cpus`,
 								must be supplied as **strings**.
 								### num-gpus
 								This optional field specifies the number of GPUs available to the Ray container.
 								In KubeRay versions since 0.3.0, the number of GPUs can be auto-detected from Ray container resource limits.
 								For certain advanced use-cases, you may wish to use `num-gpus` to set an {ref}`override<kuberay-gpu-override>`.
 								Note that the values of all Ray start parameters, including `num-gpus`,
 								must be supplied as **strings**.
 								### memory
 								The memory available to the Ray is detected automatically from the Kubernetes resource
 								limits. If you wish, you may override this autodetected value by setting the desired memory value,
 								in bytes, under `rayStartParams.memory`.
 								Note that the values of all Ray start parameters, including `memory`,
 								must be supplied as **strings**.
 								### resources
 								This field can be used to specify custom resource capacities for the Ray pod.
 								These resource capacities will be advertised to the Ray scheduler and Ray autoscaler.
 								For example, the following annotation will mark a Ray pod as having 1 unit of `Custom1` capacity
 								and 5 units of `Custom2` capacity.
 								```yaml
 								rayStartParams:
 								    resources: '"{\"Custom1\": 1, \"Custom2\": 5}"'
 								```
 								You can then annotate tasks and actors with annotations like `@ray.remote(resources={"Custom2": 1})`.
 								The Ray scheduler and autoscaler will take appropriate action to schedule such tasks.
 								Note the format used to express the resources string. In particular, note
 								that the backslashes are present as actual characters in the string.
 								If you are specifying a `RayCluster` programmatically, you may have to
 								[escape the backslashes](https://github.com/ray-project/ray/blob/cd9cabcadf1607bcda1512d647d382728055e688/python/ray/tests/kuberay/test_autoscaling_e2e.py#L92) to make sure they are processed as part of the string.
 								The field `rayStartParams.resources` should only be used for custom resources. The keys
 								`CPU`, `GPU`, and `memory` are forbidden. If you need to specify overrides for those resource
 								fields, use the Ray start parameters `num-cpus`, `num-gpus`, or `memory`.
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								(kuberay-networking)=
 								## Services and Networking
 								### The Ray head service.
 								The KubeRay operator automatically configures a Kubernetes Service exposing the default ports
 								for several services of the Ray head pod, including
 								- Ray Client (default port 10001)
 								- Ray Dashboard (default port 8265)
 								- Ray GCS server (default port 6379)
 								- Ray Serve (default port 8000)
 								The name of the configured Kubernetes Service is the name, `metadata.name`, of the RayCluster
 								followed by the suffix <nobr>`head-svc`</nobr>. For the example CR given on this page, the name of
 								the head service will be
 								<nobr>`raycluster-example-head-svc`</nobr>. Kubernetes networking (`kube-dns`) then allows us to address
 								the Ray head's services using the name <nobr>`raycluster-example-head-svc`</nobr>.
 								For example, the Ray Client server can be accessed from a pod
 								in the same Kubernetes namespace using
 								```python
 								ray.init("ray://raycluster-example-head-svc:10001")
 								```
 								The Ray Client server can be accessed from a pod in another namespace using
 								```python
 								ray.init("ray://raycluster-example-head-svc.default.svc.cluster.local:10001")
 								```
 								(This assumes the Ray cluster was deployed into the default Kuberentes namespace.
 								If the Ray cluster is deployed in a non-default namespace, use that namespace in
 								place of `default`.)
 								### ServiceType, Ingresses
 								Ray Client and other services can be exposed outside the Kubernetes cluster
 								using port-forwarding or an ingress.
 								The simplest way to access the Ray head's services is to use port-forwarding.
 								Other means of exposing the head's services outside the cluster may require using
 								a service of type LoadBalancer or NodePort. Set `headGroupSpec.serviceType`
 								to the appropriate type for your application.
 								You may wish to set up an ingress to expose the Ray head's services outside the cluster.
 								If you set the optional boolean field `headGroupSpec.enableIngress` to `true`,
 								the KubeRay operator will create an ingress for your Ray cluster. See the [KubeRay documentation][IngressDoc]
 								for details. However, it is up to you to set up an ingress controller.
 								Moreover, the ingress created by the KubeRay operator [might not be compatible][IngressIssue] with your network setup.
 								It is valid to omit the `headGroupSpec.enableIngress` field and configure an ingress object yourself.
 								### Specifying non-default ports.
 								If you wish to override the ports exposed by the Ray head service, you may do so by specifying
 								the Ray head container's `ports` list, under `headGroupSpec`.
 								Here is an example of a list of non-default ports for the Ray head service.
 								```yaml
 								ports:
 								- containerPort: 6380
 								  name: gcs
 								- containerPort: 8266
 								  name: dashboard
 								- containerPort: 10002
 								  name: client
 								```
 								If the head container's `ports` list is specified, the Ray head service will expose precisely
 								the ports in the list. In the above example, the head service will expose just three ports;
 								in particular there will be no port exposed for Ray Serve.
 								For the Ray head to actually use the non-default ports specified in the ports list,
 								you must also specify the relevant `rayStartParams`. For the above example,
 								```yaml
 								rayStartParams:
 								  port: "6380"
 								  dashboard-port: "8266"
 								  ray-client-server-port: "10002"
 								  ...
 								```
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
 								(kuberay-autoscaling-config)=
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								## Autoscaler Configuration
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								```{note}
 								If you are deciding whether to use autoscaling for a particular Ray application,
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								check out this {ref}`discussion<autoscaler-pro-con>`. Note that autoscaling is
 								supported only with Ray versions at least as new as Ray 1.11.0.
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
+								```
 								To enable the optional Ray Autoscaler support, set `enableInTreeAutoscaling:true`.
 								The KubeRay operator will then automatically configure an autoscaling sidecar container
 								for the Ray head pod. The autoscaler container collects resource metrics from the Ray cluster
 								and automatically adjusts the `replicas` field of each `workerGroupSpec` as needed to fulfill
 								the requirements of your Ray application.
 								Use the fields `minReplicas` and `maxReplicas` to constrain the number of `replicas` of an autoscaling
 								`workerGroup`. When deploying an autoscaling cluster, one typically sets `replicas` and `minReplicas`
 								to the same value.
 								The Ray autoscaler will then take over and modify the `replicas` field as needed by
 								the Ray application.
 								### Autoscaler operation
 								We describe how the autoscaler interacts with the `RayCluster` CR.
 								#### Scale up
 								The autoscaler scales worker pods up to accomodate the load of logical resources
 								from your Ray application. For example, suppose you submit a task requesting 2 GPUs:
 								```python
 								@ray.remote(num_gpus=2)
 								...
 								```
 								If your Ray cluster does not currently have any GPU worker pods, and if your configuration
 								specifies a worker type with at least 2 units of GPU capacity, a GPU pod will be
 								upscaled.
 								The autoscaler scales Ray worker pods up by editing the `replicas` field of the relevant `workerGroupSpec`.
 								#### Scale down
 								The autoscaler scales a worker pod down when the pod has not been using any logical resources
 								for a {ref}`set period of time<kuberay-idle-timeout>`. In this context, "resources" are the logical Ray resources
 								(such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
 								Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
+								To scale down Ray pods of a given `workerGroup`, the autoscaler
 								adds the Ray pods' names to the relevant `workerGroupSpec`'s
 								`scaleStrategy.workersToDelete` list and decrements the `replicas` field.
-												[docs] KubeRay config guide and autoscaling discussion (#27504)

This PR adds a guide on RayCluster configuration and a page of discussion about autoscaling.

    Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>

											
										
										
											2022-08-05 13:11:28 -07:00
 								#### Manually scaling
 								You may manually adjust a `RayCluster`'s scale by editing the `replicas` or `workersToDelete` fields.
 								(It is also possible to implement custom scaling logic that adjusts scale on your behalf.)
 								It is however, not recommended to manually edit `replicas` or `workersToDelete` for a `RayCluster` with
 								autoscaling enabled.
 								### autoscalerOptions
 								To enable Ray autoscaler support, it is enough to set `enableInTreeAutoscaling:true`.
 								Should you need to adjust autoscaling behavior or change the autoscaler container's configuration,
 								you can use the `RayCluster` CR's `autoscalerOptions` field. The `autoscalerOptions` field
 								carries the following subfields:
 								#### upscalingMode
 								The `upscalingMode` field can be used to control the rate of Ray pod upscaling.
 								UpscalingMode is `Conservative`, `Default`, or `Aggressive`.
 								- `Conservative`: Upscaling is rate-limited; the number of pending worker pods is at most the number
 								  of worker pods connected to the Ray cluster.
 								- `Default`: Upscaling is not rate-limited.
 								- `Aggressive`: An alias for Default; upscaling is not rate-limited.
 								You may wish to use `Conservative` upscaling if you plan to submit many short-lived tasks
 								to your Ray cluster. In this situation, `Default` upscaling may trigger the _thrashing_ behavior:
 								- The autoscaler sees resource demands from the submitted short-lived tasks.
 								- The autoscaler immediately creates Ray pods to accomodate the demand.
 								- By the time the additional Ray pods are provisioned, the tasks have already run to completion.
 								- The additional Ray pods are unused and scale down after a period of idleness.
 								Note, however, that it is generally not recommended to over-parallelize with Ray.
 								Since running a Ray task incurs scheduling overhead, it is usually preferable to use
 								a few long-running tasks over many short-running tasks. Ensuring that each task has
 								a non-trivial amount of work to do will also help prevent the autoscaler from over-provisioning
 								Ray pods.
 								(kuberay-idle-timeout)=
 								#### idleTimeoutSeconds
 								`idleTimeoutSeconds` is the number of seconds to wait before scaling down a worker pod
 								which is not using resources. In this context, "resources" are the logical Ray resources
 								(such as CPU, GPU, memory, and custom resources) specified in Ray task and actor annotations.
 								Usage of the Ray Object Store also marks a Ray worker pod as active and prevents downscaling.
 								`idleTimeoutSeconds` defaults to 60 seconds.
 								#### resources
 								The `resources` subfield of `autoscalerOptions` sets optional resource overrides
 								for the autoscaler sidecar container. These overrides
 								should be specified in the standard [container resource
 								spec format](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources).
 								The default values are as indicated below:
 								```
 								resources:
 								  limits:
 								    cpu: "500m"
 								    memory: "512Mi"
 								  requests:
 								    cpu: "500m"
 								    memory: "512Mi"
 								```
 								These defaults should be suitable for most use-cases.
 								However, we do recommend monitoring autoscaler container resource usage and adjusting as needed.
 								#### image and imagePullPolicy
 								The `image` subfield of `autoscalerOptions` optionally overrides the autoscaler container image.
 								If your `RayCluster`'s `spec.RayVersion` is at least `2.0.0`, the autoscaler will default to using
 								**the same image** as the Ray container. (Ray autoscaler code is bundled with the rest of Ray.)
 								For older Ray versions, the autoscaler will default to the image `rayproject/ray:2.0.0`.
 								The `imagePullPolicy` subfield of `autoscalerOptions` optionally overrides the autoscaler container's
 								image pull policy. The default is `Always`.
 								The `image` and `imagePullPolicy` overrides are provided primarily for the purposes of autoscaler testing and
 								development.
 								#### env and envFrom
 								The `env` and `envFrom` fields specify autoscaler container
 								environment variables, for debugging and development purposes.
 								These fields should be formatted following the
 								[Kuberentes API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#environment-variables)
 								for container environment variables.
-												[kubernetes][docs] Logging guide, networking info, migration guide, fixes. (#27607)

This PR

Adds notes and example on logging for Ray/K8s.
Implements an API Reference paging pointing to the configuration guide and the RayCluster CR definition.
Takes managed K8s services out of the tabbed structure, to make that page look less sad.
Adds a comparison of the KubeRay operator and legacy K8s operator
Adds an architecture diagram for the autoscaling sections
Fixes some other minor items
Adds some info about networking to the configuration guide, removes the previously planned networking page

Signed-off-by: Dmitri Gekhtman <dmitri.m.gekhtman@gmail.com>
											
										
										
											2022-08-09 00:38:05 -07:00
 								(kuberay-config-miscellaneous)=
 								## Pod and container lifecyle: preStop hooks and initContainers
 								There are two pieces of pod configuration that should always be included
 								in the RayCluster CR. Future versions of KubeRay may configure these elements automatically.
 								## initContainer
 								It is required for the configuration of each `workerGroupSpec`'s pod template to include
 								the following block:
 								```yaml
 								initContainers:
 								- name: init-myservice
 								  image: busybox:1.28
 								  command: ['sh', '-c', "until nslookup $RAY_IP.$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace).svc.cluster.local; do echo waiting for myservice; sleep 2; done"]
 								```
 								This instructs the worker pod to wait for creation of the Ray head service. The worker's `ray start`
 								command will use this service to connect to the Ray head.
 								(It is not required to include this init container in the Ray head pod's configuration.)
 								## preStopHook
 								It is recommended for every Ray container's configuration
 								to include the following blocking block:
 								```yaml
 								lifecycle:
 								  preStop:
 								    exec:
 								      command: ["/bin/sh","-c","ray stop"]
 								```
 								To ensure graceful termination, `ray stop` is executed prior to the Ray pod's termination.
 								[IngressDoc]: https://ray-project.github.io/kuberay/guidance/ingress/
 								[IngressIssue]: https://github.com/ray-project/kuberay/issues/441