============
Key Concepts
============

Ray Serve focuses on **simplicity** and only has two core concepts: endpoints and backends.

To follow along, you'll need to make the necessary imports.

.. code-block:: python

    from ray import serve
    serve.init()  # Initializes Ray and Ray Serve.

.. _serve-endpoint:

Endpoints
=========

Endpoints allow you to name the "entity" that you'll be exposing:
the HTTP path that your application will serve.
Endpoints are "logical" and decoupled from the business logic or
model that you'll be serving. To create one, specify its name, route, and HTTP methods.

.. code-block:: python

    serve.create_endpoint("simple_endpoint", "/simple", methods=["GET"])

To view all of the endpoints that have been created, use ``serve.list_endpoints``.

.. code-block:: python

    >>> serve.list_endpoints()
    {'simple_endpoint': {'route': '/simple', 'methods': ['GET'], 'traffic': {}}}

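The listing above prints like an ordinary dictionary keyed by endpoint name. Assuming it really is a plain ``dict`` with the shape shown (an assumption based only on the printed output), a short sketch of how it could be inspected follows. The ``endpoints`` literal is copied from the output above rather than fetched from a running cluster, and the names ``routes`` and ``unrouted`` are illustrative.

```python
# Copied from the printed output above; in a live session this would be
# the value returned by serve.list_endpoints() (assumed to be a dict).
endpoints = {
    "simple_endpoint": {"route": "/simple", "methods": ["GET"], "traffic": {}},
}

# Map each HTTP route back to the endpoint that owns it.
routes = {info["route"]: name for name, info in endpoints.items()}

# Endpoints whose traffic dictionary is empty have no backend assigned yet.
unrouted = [name for name, info in endpoints.items() if not info["traffic"]]

print(routes)
print(unrouted)
```

An empty ``traffic`` entry is a quick way to spot endpoints that exist but have not been connected to any backend.
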
You can also delete an endpoint using ``serve.delete_endpoint``.
Endpoints and backends are independent, so deleting an endpoint will not delete its backends.
However, an endpoint must be deleted in order to delete the backends that serve its traffic.

.. code-block:: python

    serve.delete_endpoint("simple_endpoint")

.. _serve-backend:

Backends
========

Backends are the logical structures for your business logic or models;
they specify what should happen when an endpoint is queried.
To define a backend, first define the "handler", the business logic that produces the response.
The input to the handler will be a `Flask Request object <https://flask.palletsprojects.com/en/1.1.x/api/?highlight=request#flask.Request>`_.
Use a function when your response is stateless and a class when you
might need to maintain some state (like a model).
Both functions and classes (which take Flask Requests as input) must be
registered as backends with Ray Serve.
You can specify arguments to be passed to class constructors in ``serve.create_backend``, shown below.

It's important to note that Ray Serve runs each backend in its own worker processes, which are replicas of the model.

.. code-block:: python

    def handle_request(flask_request):
        return "hello world"

    class RequestHandler:
        # Take the message to return as an argument to the constructor.
        def __init__(self, msg):
            self.msg = msg

        def __call__(self, flask_request):
            return self.msg

    serve.create_backend("simple_backend", handle_request)
    # Pass in the message that the backend will return as an argument.
    # If we call this backend, it will respond with "hello, world!".
    serve.create_backend("simple_backend_class", RequestHandler, "hello, world!")

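Because a handler is an ordinary Python callable, its logic can be exercised locally before it is registered as a backend. The sketch below illustrates the function-versus-class distinction without starting Ray. ``CountingHandler`` is a hypothetical example class, and ``fake_request`` is a stand-in object that only mimics the ``args`` attribute of a Flask Request; it is not a real Flask object.

```python
from types import SimpleNamespace


def handle_request(flask_request):
    # Stateless: the reply never depends on earlier requests.
    return "hello world"


class CountingHandler:
    """Hypothetical stateful handler that counts the requests it has seen."""

    def __init__(self, msg):
        self.msg = msg
        self.count = 0

    def __call__(self, flask_request):
        self.count += 1
        name = flask_request.args.get("name", "anonymous")
        return f"{self.msg} {name} (request #{self.count})"


# Stand-in for a Flask Request: only the ``args`` mapping is provided.
fake_request = SimpleNamespace(args={"name": "ray"})

stateless_reply = handle_request(fake_request)

handler = CountingHandler("hello,")
first_reply = handler(fake_request)
second_reply = handler(fake_request)

print(stateless_reply)
print(first_reply)
print(second_reply)
```

Since the text above says each backend replica runs in its own worker process, state held this way would presumably be kept per replica rather than shared across replicas.
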
We can also list all available backends and delete them to reclaim resources.
Note that a backend cannot be deleted while it is in use by an endpoint, because requests to that endpoint could then no longer be handled.

.. code-block:: python

    >>> serve.list_backends()
    {
        'simple_backend': {'accepts_batches': False, 'num_replicas': 1, 'max_batch_size': None},
        'simple_backend_class': {'accepts_batches': False, 'num_replicas': 1, 'max_batch_size': None},
    }
    >>> serve.delete_backend("simple_backend")
    >>> serve.list_backends()
    {
        'simple_backend_class': {'accepts_batches': False, 'num_replicas': 1, 'max_batch_size': None},
    }

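The deletion rules described so far (deleting an endpoint keeps its backends, and a backend in use by an endpoint cannot be deleted) can be modeled with a small in-memory registry. This is a toy sketch for intuition only, not Ray Serve's implementation: ``ToyRegistry`` and its methods are hypothetical names, and "in use" is modeled as the backend appearing in an endpoint's traffic dictionary.

```python
class ToyRegistry:
    """Toy model of the endpoint/backend bookkeeping described above."""

    def __init__(self):
        self.endpoints = {}    # endpoint name -> {backend name: weight}
        self.backends = set()  # registered backend names

    def create_endpoint(self, name):
        self.endpoints[name] = {}

    def create_backend(self, name):
        self.backends.add(name)

    def set_traffic(self, endpoint, traffic):
        unknown = set(traffic) - self.backends
        if unknown:
            raise ValueError(f"unknown backends: {sorted(unknown)}")
        self.endpoints[endpoint] = dict(traffic)

    def delete_endpoint(self, name):
        # Backends are independent, so they stay registered.
        del self.endpoints[name]

    def delete_backend(self, name):
        users = [ep for ep, traffic in self.endpoints.items() if name in traffic]
        if users:
            raise RuntimeError(f"backend {name!r} is in use by {users}")
        self.backends.remove(name)


registry = ToyRegistry()
registry.create_endpoint("simple_endpoint")
registry.create_backend("simple_backend")
registry.set_traffic("simple_endpoint", {"simple_backend": 1.0})

try:
    registry.delete_backend("simple_backend")
    blocked = False
except RuntimeError:
    blocked = True  # still serving simple_endpoint

registry.delete_endpoint("simple_endpoint")
still_registered = "simple_backend" in registry.backends

registry.delete_backend("simple_backend")  # allowed now
print(blocked, still_registered, registry.backends)
```

The order of operations mirrors the text: the endpoint goes first, and only then can the backend that served it be removed.
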
Setting Traffic
===============

Lastly, we need to route traffic from the endpoint to a particular backend.
To do that we'll use the ``set_traffic`` capability.
A traffic policy is essentially a load balancer and allows you to define queuing policies
for how you would like backends to be served via an endpoint.
For instance, you can route 50% of traffic to Model A and 50% of traffic to Model B.
If you deleted ``simple_endpoint`` or ``simple_backend`` while following along, re-create them first.

.. code-block:: python

    # The first argument names the endpoint; the dictionary maps backend names to traffic weights.
    serve.set_traffic("simple_endpoint", {"simple_backend": 1.0})

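To make the 50/50 example concrete, weighted routing can be sketched with the standard library alone. This illustrates the idea of a weighted split only; it is not how Ray Serve routes requests internally. ``pick_backend``, ``model_a``, and ``model_b`` are hypothetical names.

```python
import random
from collections import Counter


def pick_backend(traffic, rng):
    """Pick one backend name with probability proportional to its weight."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Same shape as the dictionary passed to set_traffic: backend name -> weight.
traffic = {"model_a": 0.5, "model_b": 0.5}

rng = random.Random(0)  # seeded so repeated runs give the same counts
counts = Counter(pick_backend(traffic, rng) for _ in range(10_000))
print(counts)
```

Over many requests each backend should receive close to half of the calls, while any individual request goes to exactly one backend.
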
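Once we've done that, we can query our endpoint via HTTP (we use ``requests`` to make HTTP calls here).

.. code-block:: python

    import requests
    # List the routes that Ray Serve currently exposes.
    print(requests.get("http://127.0.0.1:8000/-/routes", timeout=0.5).text)
    # Query the endpoint itself at the route it was created with.
    print(requests.get("http://127.0.0.1:8000/simple", timeout=0.5).text)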