While Ray Serve makes it easy to scale out on a multi-node Ray cluster, in some scenarios a single node may suit your needs.
There are two ways you can run Ray Serve on a single node, shown below.
In general, **Option 2 is recommended for most users** because it lets you take full advantage of Serve's ability to dynamically update running backends.
2. First running ``ray start --head`` on the machine, then connecting to the running local Ray cluster using ``ray.init(address="auto")`` in your Serve script(s). You can run multiple scripts to update your backends over time.
Here, we'll be using the `Kubernetes default config`_ with a few small modifications.
First, we need to make sure that the head node of the cluster, where Ray Serve will run its HTTP server, is exposed as a Kubernetes `Service`_.
There is already a default head node service defined in the ``services`` field of the config, so we just need to make sure that it exposes the right port: 8000, which Ray Serve binds to by default.

.. code-block:: yaml

  # Service that maps to the head node of the Ray cluster.
  - apiVersion: v1
    kind: Service
    metadata:
        name: ray-head
    spec:
        # Must match the label in the head pod spec below.
        selector:
            component: ray-head
        ports:
            - protocol: TCP
              # Port that this service will listen on.
              port: 8000
              # Port that requests will be sent to in pods backing the service.
              targetPort: 8000

Then, we also need to make sure that the head node pod spec matches the selector defined here and exposes the same port:

.. code-block:: yaml

  head_node:
    apiVersion: v1
    kind: Pod
    metadata:
      # Automatically generates a name for the pod with this prefix.
      generateName: ray-head-

      # Matches the selector in the service definition above.
      labels:
          component: ray-head

    spec:
      # ...
      containers:
      - name: ray-node
        # ...
        ports:
            - containerPort: 8000 # Ray Serve default port.
      # ...

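
Before starting the cluster, the two requirements above (matching labels and a matching port) can be checked offline. The following is a minimal sketch that uses plain Python dictionaries mirroring the two YAML fragments; the helper ``service_targets_pod`` is a hypothetical name for illustration and is not part of Ray or Kubernetes.

.. code-block:: python

  def service_targets_pod(service, pod):
      """Return True if the service selector matches the pod labels and
      every targetPort is declared as a containerPort in the pod."""
      selector = service["spec"]["selector"]
      labels = pod["metadata"].get("labels", {})
      if any(labels.get(key) != value for key, value in selector.items()):
          return False
      declared_ports = {
          port["containerPort"]
          for container in pod["spec"]["containers"]
          for port in container.get("ports", [])
      }
      return all(
          port["targetPort"] in declared_ports
          for port in service["spec"]["ports"]
      )


  # Dictionaries mirroring the service and head pod definitions above.
  service = {
      "metadata": {"name": "ray-head"},
      "spec": {
          "selector": {"component": "ray-head"},
          "ports": [{"protocol": "TCP", "port": 8000, "targetPort": 8000}],
      },
  }
  pod = {
      "metadata": {
          "generateName": "ray-head-",
          "labels": {"component": "ray-head"},
      },
      "spec": {
          "containers": [
              {"name": "ray-node", "ports": [{"containerPort": 8000}]},
          ],
      },
  }

  print(service_targets_pod(service, pod))

The check only mirrors the convention described in this section; it does not contact a cluster or validate the rest of the config.
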
The rest of the config remains unchanged for this example, though you may want to change the container image or the number of worker pods started by default when running your own deployment.
Now, we just need to start the cluster:

.. code-block:: shell

  # Start the cluster.
  $ ray up ray/python/ray/autoscaler/kubernetes/example-full.yaml

  # Check the status of the service pointing to the head node. If configured
  # properly, you should see the 'Endpoints' field populated with an IP
  # address like below. If not, make sure the head node pod started
  # successfully and the selector/labels match.
  $ kubectl -n ray describe service ray-head
  Name:              ray-head
  Namespace:         ray
  Labels:            <none>
  Annotations:       <none>
  Selector:          component=ray-head
  Type:              ClusterIP
  IP:                10.100.188.203
  Port:              <unset>  8000/TCP
  TargetPort:        8000/TCP
  Endpoints:         192.168.73.98:8000
  Session Affinity:  None
  Events:            <none>

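
If you want to automate the 'Endpoints' check, for example in a deployment script, the output of ``kubectl describe`` can be parsed as text. The sketch below is one way to do that with the standard library; the function name ``endpoints_from_describe`` is hypothetical, and it assumes an empty field is printed as ``<none>``, as the other empty fields in the output above are.

.. code-block:: python

  def endpoints_from_describe(describe_output):
      """Return the addresses listed in the 'Endpoints' field, or an
      empty list if the field is missing or unpopulated."""
      for line in describe_output.splitlines():
          # Split on the first colon only, since addresses contain colons.
          key, sep, value = line.partition(":")
          if sep and key.strip() == "Endpoints":
              value = value.strip()
              if value in ("", "<none>"):
                  return []
              return [address.strip() for address in value.split(",")]
      return []


  # Abbreviated copy of the sample output shown above.
  sample_output = """\
  Name:              ray-head
  Selector:          component=ray-head
  TargetPort:        8000/TCP
  Endpoints:         192.168.73.98:8000
  Events:            <none>
  """

  print(endpoints_from_describe(sample_output))

An empty result suggests the head node pod did not start or that the selector and labels do not match.
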
With the cluster now running, we can run a simple script to start Ray Serve and deploy a "hello world" backend:
Here you can see the Serve controller actor, an HTTP proxy actor, and all of the replicas for each Serve backend in the deployment.
To learn about the function of the controller and proxy actors, see the `Serve Architecture page <architecture.html>`__.
In the example pictured above, we have a single-node cluster with a backend class called ``Counter`` with ``num_replicas=2`` in its :class:`~ray.serve.BackendConfig`.
Querying a Serve endpoint with the above backend will produce a log line like the following:

.. code-block:: bash

  (pid=42161) 2021-02-26 11:05:21,709 INFO snippet_logger.py:13 -- Some info! component=serve backend=my_backend replica=my_backend#jZlnUI

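
Each such line ends with ``component``, ``backend``, and ``replica`` fields, which makes it straightforward to pick out a given backend or replica with ordinary text tools. As a minimal sketch, assuming the ``key=value`` layout of the sample line above, the fields can be extracted with a regular expression; ``parse_serve_fields`` is a hypothetical helper, not a Ray API.

.. code-block:: python

  import re

  # Matches only the three fields that appear at the end of the sample line.
  FIELD_PATTERN = re.compile(r"\b(component|backend|replica)=(\S+)")


  def parse_serve_fields(line):
      """Return the Serve key=value fields found in a log line as a dict."""
      return dict(FIELD_PATTERN.findall(line))


  sample_line = (
      "(pid=42161) 2021-02-26 11:05:21,709 INFO snippet_logger.py:13 -- "
      "Some info! component=serve backend=my_backend "
      "replica=my_backend#jZlnUI"
  )

  fields = parse_serve_fields(sample_line)
  print(fields["backend"], fields["replica"])
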
To write your own custom logger using Python's ``logging`` package, use the following method:

.. autofunction:: ray.serve.get_replica_context

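
As an illustration of how a custom logger might attach replica information to each message, here is a minimal sketch built on ``logging.LoggerAdapter``. It does not import Ray: ``FakeReplicaContext`` is a hypothetical stand-in for whatever ``ray.serve.get_replica_context()`` returns, and its attribute names ``backend_tag`` and ``replica_tag`` are assumptions that you should check against the API reference above. Output is captured in memory here only so that the result is easy to inspect.

.. code-block:: python

  import io
  import logging
  from dataclasses import dataclass


  @dataclass
  class FakeReplicaContext:
      """Hypothetical stand-in for the replica context object; the
      attribute names are assumptions, not confirmed Ray API."""
      backend_tag: str
      replica_tag: str


  class ReplicaContextAdapter(logging.LoggerAdapter):
      """Appends backend and replica fields to every log message."""

      def process(self, msg, kwargs):
          context = self.extra["context"]
          suffix = (
              f"component=serve backend={context.backend_tag} "
              f"replica={context.replica_tag}"
          )
          return f"{msg} {suffix}", kwargs


  # Send records to an in-memory stream with a simple format.
  stream = io.StringIO()
  handler = logging.StreamHandler(stream)
  handler.setFormatter(logging.Formatter("%(levelname)s -- %(message)s"))

  base_logger = logging.getLogger("custom_serve_logger")
  base_logger.addHandler(handler)
  base_logger.setLevel(logging.INFO)
  base_logger.propagate = False

  # Inside a real replica, the context would come from Ray Serve instead.
  context = FakeReplicaContext("my_backend", "my_backend#jZlnUI")
  logger = ReplicaContextAdapter(base_logger, {"context": context})
  logger.info("Some info!")

  print(stream.getvalue(), end="")

Using an adapter keeps the call sites unchanged (``logger.info(...)``) while the context fields are added in one place.
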
Ray Serve logs can be ingested by your favorite external logging agent. Ray logs from the current session are exported to the directory ``/tmp/ray/session_latest/logs`` and remain there until the next session starts.
Tutorial: Ray Serve with Loki
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is a quick walkthrough of how to explore and filter your logs using `Loki <https://grafana.com/oss/loki/>`__.
Setup and configuration are straightforward on Kubernetes, but in this tutorial we'll just set things up manually.
First, install Loki and Promtail using the instructions on https://grafana.com.
It will be convenient to save the Loki and Promtail executables in the same directory, and to navigate to this directory in your terminal before beginning this walkthrough.
Now let's get our logs into Loki using Promtail.
Save the following file as ``promtail-local-config.yaml``:

.. code-block:: yaml

  server:
    http_listen_port: 9080
    grpc_listen_port: 0

  positions:
    filename: /tmp/positions.yaml

  clients:
    - url: http://localhost:3100/loki/api/v1/push

  scrape_configs:
  - job_name: ray
    static_configs:
    - labels:
        job: ray
        __path__: /tmp/ray/session_latest/logs/*.*

The relevant part for Ray is the ``static_configs`` field, where we have indicated the location of our log files with ``__path__``.
The expression ``*.*`` matches only names that contain a dot, so it picks up the log files but skips directories, which cause an error with Promtail.
We will run Loki locally. Grab the default config file for Loki with the following command in your terminal:
Here you may need to replace ``./loki-darwin-amd64`` with the path to your Loki executable file, which may have a different name depending on your operating system.
Start Promtail and pass in the path to the config file we saved earlier:
Now `install and run Grafana <https://grafana.com/docs/grafana/latest/installation/>`__ and navigate to ``http://localhost:3000``, where you can log in with the default username "admin" and default password "admin".
On the welcome page, click "Add your first data source" and click "Loki" to add Loki as a data source.
Now click "Explore" in the left-side panel. You are ready to run some queries!
To filter all these Ray logs for the ones relevant to our backend, use the following `LogQL <https://grafana.com/docs/loki/latest/logql/>`__ query:

.. code-block:: shell

  {job="ray"} |= "backend=my_backend"

You should see something similar to the following: