[Serve] [Doc] Create top-level page for Calling Endpoints from HTTP and from Python (#14904)

This commit is contained in:
architkulkarni 2021-03-24 18:29:24 -07:00 committed by GitHub
parent 2e9b065260
commit 03afaed6e1
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 273 additions and 241 deletions

View file

@ -260,10 +260,10 @@ Papers
serve/index.rst
serve/tutorial.rst
serve/core-apis.rst
serve/http-servehandle.rst
serve/deployment.rst
serve/ml-models.rst
serve/advanced-traffic.rst
serve/advanced.rst
serve/performance.rst
serve/architecture.rst
serve/tutorials/index.rst

View file

@ -92,7 +92,7 @@ The shard key can either be specified via the X-SERVE-SHARD-KEY HTTP header or :
# Specifying the shard key via an HTTP header.
requests.get("127.0.0.1:8000/api", headers={"X-SERVE-SHARD-KEY": session_id})
# Specifying the shard key in a call made via serve handle.
# Specifying the shard key in a call made via ServeHandle.
handle = serve.get_handle("api_endpoint")
handle.options(shard_key=session_id).remote(args)

View file

@ -1,81 +0,0 @@
======================================
Advanced Topics and Configurations
======================================
Ray Serve has a number of knobs and tools for you to tune for your particular workload.
All Ray Serve advanced options and topics are covered on this page aside from the
fundamentals of :doc:`deployment`. For a more hands-on take, please check out the :ref:`serve-tutorials`.
There are a number of things you'll likely want to do with your serving application including
scaling out, splitting traffic, or batching input for better performance. To do all of this,
you will create a ``BackendConfig``, a configuration object that you'll use to set
the properties of a particular backend.
.. _serve-sync-async-handles:
Sync and Async Handles
======================
Ray Serve offers two types of ``ServeHandle``. You can use the ``serve.get_handle(..., sync=True|False)``
flag to toggle between them.
- When you set ``sync=True`` (the default), a synchronous handle is returned.
Calling ``handle.remote()`` should return a Ray ObjectRef.
- When you set ``sync=False``, an asyncio-based handle is returned. You need to
call it with ``await handle.remote()`` to get a Ray ObjectRef. To use ``await``,
you have to run ``serve.get_handle`` and ``handle.remote`` inside a Python asyncio event loop.
The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these, check out our `architecture documentation <./architecture.html>`_.
Configuring HTTP Server Locations
=================================
By default, Ray Serve starts only one HTTP server, on the head node of the Ray cluster.
You can configure this behavior using the ``http_options={"location": ...}`` flag
in :mod:`serve.start <ray.serve.start>`:
- "HeadOnly": start one HTTP server on the head node. Serve
assumes the head node is the node you executed serve.start
on. This is the default.
- "EveryNode": start one HTTP server per node.
- "NoServer" or ``None``: disable HTTP server.
.. note::
Using the "EveryNode" option, you can point a cloud load balancer to the
instance group of Ray cluster to achieve high availability of Serve's HTTP
proxies.
Variable HTTP Routes
====================
Ray Serve supports capturing path parameters. For example, in a call of the form
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/api/{username}")
the ``username`` parameter will be accessible in your backend code as follows:
.. code-block:: python
def my_backend(request):
username = request.path_params["username"]
...
Ray Serve uses Starlette's Router class under the hood for routing, so type
conversion for path parameters is also supported, as well as multiple path parameters.
For example, suppose this route is used:
.. code-block:: python
serve.create_endpoint(
"complex", backend="f", route="/api/{user_id:int}/{number:float}")
Then for a query to the route ``/api/123/3.14``, the ``request.path_params`` dictionary
available in the backend will be ``{"user_id": 123, "number": 3.14}``, where ``123`` is
a Python int and ``3.14`` is a Python float.
For full details on the supported path parameters, see Starlette's
`path parameters documentation <https://www.starlette.io/routing/#path-parameters>`_.

View file

@ -2,9 +2,9 @@
Deploying Ray Serve
===================
In the :doc:`core-apis`, you saw some of the basics of how to write serve applications.
In the :doc:`core-apis`, you saw some of the basics of how to write Serve applications.
This section will dive deeper into how Ray Serve runs on a Ray cluster and how you're able
to deploy and update your serve application over time.
to deploy and update your Serve application over time.
.. contents:: Deploying Ray Serve

View file

@ -8,157 +8,11 @@ questions, feel free to ask them in the `Discussion Board <https://discuss.ray.i
.. contents::
How do I deploy serve?
----------------------
How do I deploy Ray Serve?
--------------------------
See :doc:`deployment` for information about how to deploy serve.
See :doc:`deployment` for information about how to deploy Serve.
How do I call an endpoint from Python code?
-------------------------------------------
Use :mod:`serve.get_handle <ray.serve.api.get_handle>` to get a handle to the endpoint,
then use :mod:`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests to that
endpoint. This returns a Ray ObjectRef whose result can be waited for or retrieved using
``ray.wait`` or ``ray.get``, respectively.
.. code-block:: python
handle = serve.get_handle("api_endpoint")
ray.get(handle.remote(request))
How do I call a method on my replica besides __call__?
------------------------------------------------------
To call a method via HTTP use the header field ``X-SERVE-CALL-METHOD``.
To call a method via Python, use :mod:`handle.options <ray.serve.handle.RayServeHandle.options>`:
.. code-block:: python
class StatefulProcessor:
def __init__(self):
self.count = 1
def __call__(self, request):
return {"current": self.count}
def other_method(self, inc):
self.count += inc
return True
handle = serve.get_handle("endpoint_name")
handle.options(method_name="other_method").remote(5)
The call is the same as a regular query except a different method is called
within the replica.
How do I use custom status codes in my response?
---------------------------------------------------------
You can return a `Starlette Response object <https://www.starlette.io/responses/>`_ from your backend code:
.. code-block:: python
from starlette.responses import Response
def f(starlette_request):
return Response('Hello, world!', status_code=123, media_type='text/plain')
serve.create_backend("hello", f)
How do I enable CORS and other HTTP features?
---------------------------------------------
Serve supports arbitrary `Starlette middlewares <https://www.starlette.io/middleware/>`_
and custom middlewares in Starlette format. The example below shows how to enable
`Cross-Origin Resource Sharing (CORS) <https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS>`_.
You can follow the same pattern for other Starlette middlewares.
.. code-block:: python
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
client = serve.start(
http_options={"middlewares": [
Middleware(
CORSMiddleware, allow_origins=["*"], allow_methods=["*"])
]})
.. _serve-handle-explainer:
How do ``ServeHandle`` and ``ServeRequest`` work?
---------------------------------------------------
Ray Serve enables you to query models both from HTTP and Python. This feature
enables seamless :ref:`model composition<serve-model-composition>`. You can
get a ``ServeHandle`` corresponding to an ``endpoint``, similar to how you can
reach an endpoint through HTTP via a specific route. When you issue a request
to an endpoint through ``ServeHandle``, the request goes through the same code
path as an HTTP request would: choosing backends through :ref:`traffic
policies <serve-split-traffic>` and load balancing across available replicas.
When the request arrives in the model, you can access the data similarly to how
you would with an HTTP request. Here are some examples of how ``ServeRequest`` mirrors ``starlette.requests.Request``:
.. list-table::
:header-rows: 1
* - HTTP
- ServeHandle
- | Request
| (Starlette.Request and ServeRequest)
* - ``requests.get(..., headers={...})``
- ``handle.options(http_headers={...})``
- ``request.headers``
* - ``requests.post(...)``
- ``handle.options(http_method="POST")``
- ``request.method``
* - ``requests.get(..., json={...})``
- ``handle.remote({...})``
- ``await request.json()``
* - ``requests.get(..., form={...})``
- ``handle.remote({...})``
- ``await request.form()``
* - ``requests.get(..., params={"a":"b"})``
- ``handle.remote(a="b")``
- ``request.query_params``
* - ``requests.get(..., data="long string")``
- ``handle.remote("long string")``
- ``await request.body()``
* - ``N/A``
- ``handle.remote(python_object)``
- ``request.data``
.. note::
You might have noticed that the last row of the table shows that ``ServeRequest`` supports
passing Python objects through the handle. This is not possible in HTTP. If you
need to distinguish if the origin of the request is from Python or HTTP, you can do an ``isinstance``
check:
.. code-block:: python
import starlette.requests
if isinstance(request, starlette.requests.Request):
print("Request coming from web!")
elif isinstance(request, ServeRequest):
print("Request coming from Python!")
.. note::
One special case is when you pass a web request to a handle.
.. code-block:: python
handle.remote(starlette_request)
In this case, Serve will `not` wrap it in ServeRequest. You can directly
process the request as a ``starlette.requests.Request``.
How fast is Ray Serve?
----------------------
@ -172,8 +26,8 @@ You can check out our `microbenchmark instruction <https://github.com/ray-project
to benchmark on your hardware.
Can I use asyncio along with Ray Serve?
---------------------------------------
Can I use ``asyncio`` along with Ray Serve?
-------------------------------------------
Yes! You can make your servable methods ``async def`` and Serve will run them
concurrently inside a Python asyncio event loop.
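As a minimal sketch (the backend name and the sleep call are illustrative, not from the Serve docs), an async servable just awaits inside its body:
.. code-block:: python

    import asyncio

    # Illustrative async servable: Serve would invoke this once per request and
    # run many such coroutines concurrently in its event loop.
    async def my_backend(request):
        await asyncio.sleep(0.01)  # stand-in for an awaited I/O call
        return {"ok": True}

    # Outside of Serve, you can exercise the coroutine directly:
    result = asyncio.run(my_backend(None))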

View file

@ -0,0 +1,255 @@
==========================================
Calling Endpoints via HTTP and ServeHandle
==========================================
.. contents:: Calling Endpoints via HTTP and ServeHandle
Overview
========
Ray Serve endpoints can be called in two ways: from HTTP and from Python.
On this page we will show you both of these approaches and then give a tutorial
on how to integrate Ray Serve with an existing web server.
Calling Endpoints via HTTP
==========================
As described in the :doc:`tutorial`, when you create a Ray Serve endpoint, to
serve it over HTTP you just need to specify the ``route`` parameter to ``serve.create_endpoint``:
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/counter")
Below, we discuss some advanced features for customizing Ray Serve's HTTP functionality:
Configuring HTTP Server Locations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, Ray Serve starts a single HTTP server on the head node of the Ray cluster.
You can configure this behavior using the ``http_options={"location": ...}`` flag
in :mod:`serve.start <ray.serve.start>`:
- "HeadOnly": start one HTTP server on the head node. Serve
assumes the head node is the node you executed serve.start
on. This is the default.
- "EveryNode": start one HTTP server per node.
- "NoServer" or ``None``: disable HTTP server.
.. note::
Using the "EveryNode" option, you can point a cloud load balancer to the
instance group of Ray cluster to achieve high availability of Serve's HTTP
proxies.
Variable HTTP Routes
^^^^^^^^^^^^^^^^^^^^
Ray Serve supports capturing path parameters. For example, in a call of the form
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/api/{username}")
the ``username`` parameter will be accessible in your backend code as follows:
.. code-block:: python
def my_backend(request):
username = request.path_params["username"]
...
Ray Serve uses Starlette's Router class under the hood for routing, so type
conversion for path parameters is also supported, as well as multiple path parameters.
For example, suppose this route is used:
.. code-block:: python
serve.create_endpoint(
"complex", backend="f", route="/api/{user_id:int}/{number:float}")
Then for a query to the route ``/api/123/3.14``, the ``request.path_params`` dictionary
available in the backend will be ``{"user_id": 123, "number": 3.14}``, where ``123`` is
a Python int and ``3.14`` is a Python float.
For full details on the supported path parameters, see Starlette's
`path parameters documentation <https://www.starlette.io/routing/#path-parameters>`_.
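To make the type conversion concrete, here is a small stand-in (a hypothetical helper, not Ray Serve or Starlette code) that mimics the typed matching for the route above:
.. code-block:: python

    import re

    # Hypothetical stand-in for Starlette's typed matching on the route
    # /api/{user_id:int}/{number:float} described above.
    ROUTE = re.compile(r"^/api/(?P<user_id>\d+)/(?P<number>\d+(?:\.\d+)?)$")

    def path_params(path):
        match = ROUTE.match(path)
        if match is None:
            return None
        return {"user_id": int(match["user_id"]),
                "number": float(match["number"])}

    params = path_params("/api/123/3.14")  # {"user_id": 123, "number": 3.14}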
Custom HTTP response status codes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can return a `Starlette Response object <https://www.starlette.io/responses/>`_ from your Ray Serve backend code:
.. code-block:: python
from starlette.responses import Response
def f(starlette_request):
return Response('Hello, world!', status_code=123, media_type='text/plain')
serve.create_backend("hello", f)
Enabling CORS and other HTTP middlewares
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Serve supports arbitrary `Starlette middlewares <https://www.starlette.io/middleware/>`_
and custom middlewares in Starlette format. The example below shows how to enable
`Cross-Origin Resource Sharing (CORS) <https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS>`_.
You can follow the same pattern for other Starlette middlewares.
.. code-block:: python
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
client = serve.start(
http_options={"middlewares": [
Middleware(
CORSMiddleware, allow_origins=["*"], allow_methods=["*"])
]})
.. _serve-handle-explainer:
ServeHandle: Calling Endpoints from Python
================================================
Ray Serve enables you to query models both from HTTP and Python. This feature
enables seamless :ref:`model composition<serve-model-composition>`. You can
get a ``ServeHandle`` corresponding to an ``endpoint``, similar to how you can
reach an endpoint through HTTP via a specific route. When you issue a request
to an endpoint through ``ServeHandle``, the request goes through the same code
path as an HTTP request would: choosing backends through :ref:`traffic
policies <serve-split-traffic>` and load balancing across available replicas.
To call a Ray Serve endpoint from Python, use :mod:`serve.get_handle <ray.serve.api.get_handle>`
to get a handle to the endpoint, then use
:mod:`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests to that
endpoint. This returns a Ray ObjectRef whose result can be waited for or retrieved using
``ray.wait`` or ``ray.get``, respectively.
.. code-block:: python
handle = serve.get_handle("api_endpoint")
ray.get(handle.remote(request))
Accessing data from the request
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When the request arrives in the model, you can access the data similarly to how
you would with an HTTP request. Here are some examples of how Ray Serve's built-in
``ServeRequest`` mirrors ``starlette.requests.Request``:
.. list-table::
:header-rows: 1
* - HTTP
- ServeHandle
- | Request
| (Starlette.Request and ServeRequest)
* - ``requests.get(..., headers={...})``
- ``handle.options(http_headers={...})``
- ``request.headers``
* - ``requests.post(...)``
- ``handle.options(http_method="POST")``
- ``request.method``
* - ``requests.get(..., json={...})``
- ``handle.remote({...})``
- ``await request.json()``
* - ``requests.get(..., form={...})``
- ``handle.remote({...})``
- ``await request.form()``
* - ``requests.get(..., params={"a":"b"})``
- ``handle.remote(a="b")``
- ``request.query_params``
* - ``requests.get(..., data="long string")``
- ``handle.remote("long string")``
- ``await request.body()``
* - ``N/A``
- ``handle.remote(python_object)``
- ``request.data``
.. note::
You might have noticed that the last row of the table shows that ``ServeRequest`` supports
passing Python objects through the handle. This is not possible in HTTP. If you
need to distinguish if the origin of the request is from Python or HTTP, you can do an ``isinstance``
check:
.. code-block:: python
import starlette.requests
if isinstance(request, starlette.requests.Request):
print("Request coming from web!")
elif isinstance(request, ServeRequest):
print("Request coming from Python!")
.. note::
One special case is when you pass a web request to a handle.
.. code-block:: python
handle.remote(starlette_request)
In this case, Serve will `not` wrap it in ServeRequest. You can directly
process the request as a ``starlette.requests.Request``.
.. _serve-sync-async-handles:
Sync and Async Handles
^^^^^^^^^^^^^^^^^^^^^^
Ray Serve offers two types of ``ServeHandle``. You can use the ``serve.get_handle(..., sync=True|False)``
flag to toggle between them.
- When you set ``sync=True`` (the default), a synchronous handle is returned.
Calling ``handle.remote()`` should return a Ray ObjectRef.
- When you set ``sync=False``, an asyncio-based handle is returned. You need to
call it with ``await handle.remote()`` to get a Ray ObjectRef. To use ``await``,
you have to run ``serve.get_handle`` and ``handle.remote`` inside a Python asyncio event loop.
The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these, check out our `architecture documentation <./architecture.html>`_.
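The async pattern looks like the following sketch; a stand-in class replaces the real handle here so the snippet runs without a Ray cluster (with a live cluster you would use ``serve.get_handle(..., sync=False)`` and get back a Ray ObjectRef):
.. code-block:: python

    import asyncio

    # Stand-in mimicking the async ServeHandle interface: .remote() is awaited.
    class FakeAsyncHandle:
        async def remote(self, data):
            await asyncio.sleep(0)   # yield to the event loop, like a real RPC would
            return {"echo": data}    # a real handle resolves to a Ray ObjectRef

    async def main():
        handle = FakeAsyncHandle()   # real code: serve.get_handle("api_endpoint", sync=False)
        return await handle.remote("ping")

    result = asyncio.run(main())     # {"echo": "ping"}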
.. _serve-custom-methods:
Calling methods on a Serve backend besides ``__call__``
=======================================================
By default, Ray Serve will serve the user-defined ``__call__`` method of your class, but
other methods of your class can be served as well.
To call a custom method via HTTP, pass in the method name in the header field ``X-SERVE-CALL-METHOD``.
To call a custom method via Python, use :mod:`handle.options <ray.serve.handle.RayServeHandle.options>`:
.. code-block:: python
class StatefulProcessor:
def __init__(self):
self.count = 1
def __call__(self, request):
return {"current": self.count}
def other_method(self, inc):
self.count += inc
return True
handle = serve.get_handle("endpoint_name")
handle.options(method_name="other_method").remote(5)
The call is the same as a regular query except a different method is called
within the replica.
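Conceptually, the replica simply looks up the requested method by name. This hypothetical ``dispatch`` helper (not part of Ray Serve) sketches what ``handle.options(method_name=...)`` amounts to on the replica side, using the class above:
.. code-block:: python

    class StatefulProcessor:
        def __init__(self):
            self.count = 1

        def __call__(self, request):
            return {"current": self.count}

        def other_method(self, inc):
            self.count += inc
            return True

    # Hypothetical helper: resolve the named method on the replica, then call it.
    def dispatch(replica, method_name, *args):
        return getattr(replica, method_name)(*args)

    p = StatefulProcessor()
    dispatch(p, "other_method", 5)          # increments count to 6
    result = dispatch(p, "__call__", None)  # {"current": 6}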
Integrating with existing web servers
=====================================
Ray Serve comes with its own HTTP server out of the box, but if you have an existing
web application, you can still plug in Ray Serve to scale up your backend computation.
Using ``ServeHandle`` makes this easy.
For a tutorial with sample code, see :ref:`serve-web-server-integration-tutorial`.
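As a sketch of the pattern, the snippet below uses a stand-in handle so it runs without a cluster; with a real deployment you would call ``serve.get_handle`` and block on the result with ``ray.get``:
.. code-block:: python

    # Stand-in for a ServeHandle so this sketch runs without a Ray cluster.
    class FakeHandle:
        def remote(self, data):
            return {"prediction": len(data)}  # a real handle returns an ObjectRef

    handle = FakeHandle()  # real code: handle = serve.get_handle("my_endpoint")

    def my_route_handler(request_body):
        # Inside an existing Flask/Django/aiohttp route, forward the request
        # body to Serve and return the result.
        return handle.remote(request_body)  # real code: ray.get(handle.remote(...))

    resp = my_route_handler("abc")  # {"prediction": 3}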

View file

@ -54,9 +54,13 @@ For our Counter class to work with Ray Serve, it needs to be a *callable* class,
self.count += 1
return {"count": self.count}
.. note::
.. tip::
In addition to callable classes, you can also serve functions using Ray Serve.
You can also serve :ref:`other class methods<serve-custom-methods>` besides ``__call__``.
.. note::
Besides classes, you can also serve standalone functions with Ray Serve in the same way.
Now we are ready to deploy our class using Ray Serve. First, create a Ray Serve backend and pass in the Counter class:

View file

@ -594,7 +594,7 @@ class Client:
"You are retrieving a sync handle inside an asyncio loop. "
"Try getting client.get_handle(.., sync=False) to get better "
"performance. Learn more at https://docs.ray.io/en/master/"
"serve/advanced.html#sync-and-async-handles")
"serve/http-servehandle.html#sync-and-async-handles")
if not asyncio.get_event_loop().is_running() and not sync:
logger.warning(
@ -602,7 +602,7 @@ class Client:
"You should make sure client.get_handle is called inside a "
"running event loop. Or call client.get_handle(.., sync=True) "
"to create sync handle. Learn more at https://docs.ray.io/en/"
"master/serve/advanced.html#sync-and-async-handles")
"master/serve/http-servehandle.html#sync-and-async-handles")
if endpoint_name in all_endpoints:
this_endpoint = all_endpoints[endpoint_name]

View file

@ -37,7 +37,7 @@ scipy==1.4.1
tabulate
tensorboardX
uvicorn
pydantic
pydantic>=1.8
dataclasses; python_version < '3.7'
starlette