[Serve] [Doc] Create top-level page for Calling Endpoints from HTTP and from Python (#14904)

This commit is contained in:
architkulkarni 2021-03-24 18:29:24 -07:00 committed by GitHub
parent 2e9b065260
commit 03afaed6e1
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
9 changed files with 273 additions and 241 deletions

View file

@ -260,10 +260,10 @@ Papers
serve/index.rst
serve/tutorial.rst
serve/core-apis.rst
serve/http-servehandle.rst
serve/deployment.rst
serve/ml-models.rst
serve/advanced-traffic.rst
serve/advanced.rst
serve/performance.rst
serve/architecture.rst
serve/tutorials/index.rst

View file

@ -92,7 +92,7 @@ The shard key can either be specified via the X-SERVE-SHARD-KEY HTTP header or :
# Specifying the shard key via an HTTP header.
requests.get("127.0.0.1:8000/api", headers={"X-SERVE-SHARD-KEY": session_id})
# Specifying the shard key in a call made via serve handle.
# Specifying the shard key in a call made via ServeHandle.
handle = serve.get_handle("api_endpoint")
handle.options(shard_key=session_id).remote(args)

View file

@ -1,81 +0,0 @@
======================================
Advanced Topics and Configurations
======================================
Ray Serve has a number of knobs and tools for you to tune for your particular workload.
All Ray Serve advanced options and topics are covered on this page aside from the
fundamentals of :doc:`deployment`. For a more hands-on take, please check out the :ref:`serve-tutorials`.
There are a number of things you'll likely want to do with your serving application including
scaling out, splitting traffic, or batching input for better performance. To do all of this,
you will create a ``BackendConfig``, a configuration object that you'll use to set
the properties of a particular backend.
.. _serve-sync-async-handles:
Sync and Async Handles
======================
Ray Serve offers two types of ``ServeHandle``. You can use the ``serve.get_handle(..., sync=True|False)``
flag to toggle between them.
- When you set ``sync=True`` (the default), a synchronous handle is returned.
Calling ``handle.remote()`` should return a Ray ObjectRef.
- When you set ``sync=False``, an asyncio-based handle is returned. You need to
call it with ``await handle.remote()`` to get a Ray ObjectRef. To use ``await``,
you have to run ``serve.get_handle`` and ``handle.remote`` inside a Python asyncio event loop.
The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these, check out our `architecture documentation <./architecture.html>`_.
Configuring HTTP Server Locations
=================================
By default, Ray Serve starts only one HTTP server, on the head node of the Ray cluster.
You can configure this behavior using the ``http_options={"location": ...}`` flag
in :mod:`serve.start <ray.serve.start>`:
- "HeadOnly": start one HTTP server on the head node. Serve
assumes the head node is the node you executed serve.start
on. This is the default.
- "EveryNode": start one HTTP server per node.
- "NoServer" or ``None``: disable HTTP server.
.. note::
Using the "EveryNode" option, you can point a cloud load balancer to the
instance group of Ray cluster to achieve high availability of Serve's HTTP
proxies.
Variable HTTP Routes
====================
Ray Serve supports capturing path parameters. For example, in a call of the form
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/api/{username}")
the ``username`` parameter will be accessible in your backend code as follows:
.. code-block:: python
def my_backend(request):
username = request.path_params["username"]
...
Ray Serve uses Starlette's Router class under the hood for routing, so type
conversion for path parameters is also supported, as well as multiple path parameters.
For example, suppose this route is used:
.. code-block:: python
serve.create_endpoint(
"complex", backend="f", route="/api/{user_id:int}/{number:float}")
Then for a query to the route ``/api/123/3.14``, the ``request.path_params`` dictionary
available in the backend will be ``{"user_id": 123, "number": 3.14}``, where ``123`` is
a Python int and ``3.14`` is a Python float.
For full details on the supported path parameters, see Starlette's
`path parameters documentation <https://www.starlette.io/routing/#path-parameters>`_.

View file

@ -2,9 +2,9 @@
Deploying Ray Serve
===================
In the :doc:`core-apis`, you saw some of the basics of how to write serve applications.
In the :doc:`core-apis`, you saw some of the basics of how to write Serve applications.
This section will dive deeper into how Ray Serve runs on a Ray cluster and how you're able
to deploy and update your serve application over time.
to deploy and update your Serve application over time.
.. contents:: Deploying Ray Serve

View file

@ -8,157 +8,11 @@ questions, feel free to ask them in the `Discussion Board <https://discuss.ray.i
.. contents::
How do I deploy serve?
----------------------
How do I deploy Ray Serve?
--------------------------
See :doc:`deployment` for information about how to deploy serve.
See :doc:`deployment` for information about how to deploy Serve.
How do I call an endpoint from Python code?
-------------------------------------------
Use :mod:`serve.get_handle <ray.serve.api.get_handle>` to get a handle to the endpoint,
then use :mod:`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests to that
endpoint. This returns a Ray ObjectRef whose result can be waited for or retrieved using
``ray.wait`` or ``ray.get``, respectively.
.. code-block:: python
handle = serve.get_handle("api_endpoint")
ray.get(handle.remote(request))
How do I call a method on my replica besides __call__?
------------------------------------------------------
To call a method via HTTP use the header field ``X-SERVE-CALL-METHOD``.
To call a method via Python, use :mod:`handle.options <ray.serve.handle.RayServeHandle.options>`:
.. code-block:: python
class StatefulProcessor:
def __init__(self):
self.count = 1
def __call__(self, request):
return {"current": self.count}
def other_method(self, inc):
self.count += inc
return True
handle = serve.get_handle("endpoint_name")
handle.options(method_name="other_method").remote(5)
The call is the same as a regular query except a different method is called
within the replica.
How do I use custom status codes in my response?
---------------------------------------------------------
You can return a `Starlette Response object <https://www.starlette.io/responses/>`_ from your backend code:
.. code-block:: python
from starlette.responses import Response
def f(starlette_request):
return Response('Hello, world!', status_code=123, media_type='text/plain')
serve.create_backend("hello", f)
How do I enable CORS and other HTTP features?
---------------------------------------------
Serve supports arbitrary `Starlette middlewares <https://www.starlette.io/middleware/>`_
and custom middlewares in Starlette format. The example below shows how to enable
`Cross-Origin Resource Sharing (CORS) <https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS>`_.
You can follow the same pattern for other Starlette middlewares.
.. code-block:: python
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
client = serve.start(
http_options={"middlewares": [
Middleware(
CORSMiddleware, allow_origins=["*"], allow_methods=["*"])
]})
.. _serve-handle-explainer:
How do ``ServeHandle`` and ``ServeRequest`` work?
---------------------------------------------------
Ray Serve enables you to query models both from HTTP and Python. This feature
enables seamless :ref:`model composition<serve-model-composition>`. You can
get a ``ServeHandle`` corresponding to an ``endpoint``, similar to how you can
reach an endpoint through HTTP via a specific route. When you issue a request
to an endpoint through ``ServeHandle``, the request goes through the same code
path as an HTTP request would: choosing backends through :ref:`traffic
policies <serve-split-traffic>` and load balancing across available replicas.
When the request arrives in the model, you can access the data similarly to how
you would with an HTTP request. Here are some examples of how ``ServeRequest`` mirrors ``starlette.requests.Request``:
.. list-table::
:header-rows: 1
* - HTTP
- ServeHandle
- | Request
| (Starlette.Request and ServeRequest)
* - ``requests.get(..., headers={...})``
- ``handle.options(http_headers={...})``
- ``request.headers``
* - ``requests.post(...)``
- ``handle.options(http_method="POST")``
- ``request.method``
* - ``requests.get(..., json={...})``
- ``handle.remote({...})``
- ``await request.json()``
* - ``requests.get(..., form={...})``
- ``handle.remote({...})``
- ``await request.form()``
* - ``requests.get(..., params={"a":"b"})``
- ``handle.remote(a="b")``
- ``request.query_params``
* - ``requests.get(..., data="long string")``
- ``handle.remote("long string")``
- ``await request.body()``
* - ``N/A``
- ``handle.remote(python_object)``
- ``request.data``
.. note::
You might have noticed that the last row of the table shows that ``ServeRequest`` supports
passing Python objects through the handle. This is not possible in HTTP. If you
need to distinguish if the origin of the request is from Python or HTTP, you can do an ``isinstance``
check:
.. code-block:: python
import starlette.requests
if isinstance(request, starlette.requests.Request):
print("Request coming from web!")
elif isinstance(request, ServeRequest):
print("Request coming from Python!")
.. note::
One special case is when you pass a web request to a handle.
.. code-block:: python
handle.remote(starlette_request)
In this case, Serve will `not` wrap it in ServeRequest. You can directly
process the request as a ``starlette.requests.Request``.
How fast is Ray Serve?
----------------------
@ -172,8 +26,8 @@ You can check out our `microbenchmark instruction <https://github.com/ray-project
to benchmark on your hardware.
Can I use asyncio along with Ray Serve?
---------------------------------------
Can I use ``asyncio`` along with Ray Serve?
-------------------------------------------
Yes! You can make your servable methods ``async def`` and Serve will run them
concurrently inside a Python asyncio event loop.
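As a minimal sketch (the backend name and the sleep call are illustrative, not from the Serve docs), an async servable just awaits inside its body:
.. code-block:: python

    import asyncio

    # Illustrative async servable: Serve would invoke this once per request and
    # run many such coroutines concurrently in its event loop.
    async def my_backend(request):
        await asyncio.sleep(0.01)  # stand-in for an awaited I/O call
        return {"ok": True}

    # Outside of Serve, you can exercise the coroutine directly:
    result = asyncio.run(my_backend(None))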

View file

@ -0,0 +1,255 @@
==========================================
Calling Endpoints via HTTP and ServeHandle
==========================================
.. contents:: Calling Endpoints via HTTP and ServeHandle
Overview
========
Ray Serve endpoints can be called in two ways: from HTTP and from Python.
On this page we will show you both of these approaches and then give a tutorial
on how to integrate Ray Serve with an existing web server.
Calling Endpoints via HTTP
==========================
As described in the :doc:`tutorial`, when you create a Ray Serve endpoint, to
serve it over HTTP you just need to specify the ``route`` parameter to ``serve.create_endpoint``:
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/counter")
Below, we discuss some advanced features for customizing Ray Serve's HTTP functionality:
Configuring HTTP Server Locations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, Ray Serve starts a single HTTP server on the head node of the Ray cluster.
You can configure this behavior using the ``http_options={"location": ...}`` flag
in :mod:`serve.start <ray.serve.start>`:
- "HeadOnly": start one HTTP server on the head node. Serve
assumes the head node is the node you executed serve.start
on. This is the default.
- "EveryNode": start one HTTP server per node.
- "NoServer" or ``None``: disable HTTP server.
.. note::
Using the "EveryNode" option, you can point a cloud load balancer to the
instance group of Ray cluster to achieve high availability of Serve's HTTP
proxies.
Variable HTTP Routes
^^^^^^^^^^^^^^^^^^^^
Ray Serve supports capturing path parameters. For example, in a call of the form
.. code-block:: python
serve.create_endpoint("my_endpoint", backend="my_backend", route="/api/{username}")
the ``username`` parameter will be accessible in your backend code as follows:
.. code-block:: python
def my_backend(request):
username = request.path_params["username"]
...
Ray Serve uses Starlette's Router class under the hood for routing, so type
conversion for path parameters is also supported, as well as multiple path parameters.
For example, suppose this route is used:
.. code-block:: python
serve.create_endpoint(
"complex", backend="f", route="/api/{user_id:int}/{number:float}")
Then for a query to the route ``/api/123/3.14``, the ``request.path_params`` dictionary
available in the backend will be ``{"user_id": 123, "number": 3.14}``, where ``123`` is
a Python int and ``3.14`` is a Python float.
For full details on the supported path parameters, see Starlette's
`path parameters documentation <https://www.starlette.io/routing/#path-parameters>`_.
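To make the type conversion concrete, here is a small stand-in (a hypothetical helper, not Ray Serve or Starlette code) that mimics the typed matching for the route above:
.. code-block:: python

    import re

    # Hypothetical stand-in for Starlette's typed matching on the route
    # /api/{user_id:int}/{number:float} described above.
    ROUTE = re.compile(r"^/api/(?P<user_id>\d+)/(?P<number>\d+(?:\.\d+)?)$")

    def path_params(path):
        match = ROUTE.match(path)
        if match is None:
            return None
        return {"user_id": int(match["user_id"]),
                "number": float(match["number"])}

    params = path_params("/api/123/3.14")  # {"user_id": 123, "number": 3.14}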
Custom HTTP response status codes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can return a `Starlette Response object <https://www.starlette.io/responses/>`_ from your Ray Serve backend code:
.. code-block:: python
from starlette.responses import Response
def f(starlette_request):
return Response('Hello, world!', status_code=123, media_type='text/plain')
serve.create_backend("hello", f)
Enabling CORS and other HTTP middlewares
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Serve supports arbitrary `Starlette middlewares <https://www.starlette.io/middleware/>`_
and custom middlewares in Starlette format. The example below shows how to enable
`Cross-Origin Resource Sharing (CORS) <https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS>`_.
You can follow the same pattern for other Starlette middlewares.
.. code-block:: python
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
client = serve.start(
http_options={"middlewares": [
Middleware(
CORSMiddleware, allow_origins=["*"], allow_methods=["*"])
]})
.. _serve-handle-explainer:
ServeHandle: Calling Endpoints from Python
================================================
Ray Serve enables you to query models both from HTTP and Python. This feature
enables seamless :ref:`model composition<serve-model-composition>`. You can
get a ``ServeHandle`` corresponding to an ``endpoint``, similar to how you can
reach an endpoint through HTTP via a specific route. When you issue a request
to an endpoint through ``ServeHandle``, the request goes through the same code
path as an HTTP request would: choosing backends through :ref:`traffic
policies <serve-split-traffic>` and load balancing across available replicas.
To call a Ray Serve endpoint from Python, use :mod:`serve.get_handle <ray.serve.api.get_handle>`
to get a handle to the endpoint, then use
:mod:`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests to that
endpoint. This returns a Ray ObjectRef whose result can be waited for or retrieved using
``ray.wait`` or ``ray.get``, respectively.
.. code-block:: python
handle = serve.get_handle("api_endpoint")
ray.get(handle.remote(request))
Accessing data from the request
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When the request arrives in the model, you can access the data similarly to how
you would with an HTTP request. Here are some examples of how Ray Serve's built-in
``ServeRequest`` mirrors ``starlette.requests.Request``:
.. list-table::
:header-rows: 1
* - HTTP
- ServeHandle
- | Request
| (Starlette.Request and ServeRequest)
* - ``requests.get(..., headers={...})``
- ``handle.options(http_headers={...})``
- ``request.headers``
* - ``requests.post(...)``
- ``handle.options(http_method="POST")``
- ``request.method``
* - ``requests.get(..., json={...})``
- ``handle.remote({...})``
- ``await request.json()``
* - ``requests.get(..., form={...})``
- ``handle.remote({...})``
- ``await request.form()``
* - ``requests.get(..., params={"a":"b"})``
- ``handle.remote(a="b")``
- ``request.query_params``
* - ``requests.get(..., data="long string")``
- ``handle.remote("long string")``
- ``await request.body()``
* - ``N/A``
- ``handle.remote(python_object)``
- ``request.data``
.. note::
You might have noticed that the last row of the table shows that ``ServeRequest`` supports
passing Python objects through the handle. This is not possible in HTTP. If you
need to distinguish if the origin of the request is from Python or HTTP, you can do an ``isinstance``
check:
.. code-block:: python
import starlette.requests
if isinstance(request, starlette.requests.Request):
print("Request coming from web!")
elif isinstance(request, ServeRequest):
print("Request coming from Python!")
.. note::
One special case is when you pass a web request to a handle.
.. code-block:: python
handle.remote(starlette_request)
In this case, Serve will `not` wrap it in ServeRequest. You can directly
process the request as a ``starlette.requests.Request``.
.. _serve-sync-async-handles:
Sync and Async Handles
^^^^^^^^^^^^^^^^^^^^^^
Ray Serve offers two types of ``ServeHandle``. You can use the ``serve.get_handle(..., sync=True|False)``
flag to toggle between them.
- When you set ``sync=True`` (the default), a synchronous handle is returned.
Calling ``handle.remote()`` should return a Ray ObjectRef.
- When you set ``sync=False``, an asyncio-based handle is returned. You need to
call it with ``await handle.remote()`` to get a Ray ObjectRef. To use ``await``,
you have to run ``serve.get_handle`` and ``handle.remote`` inside a Python asyncio event loop.
The async handle has a performance advantage because it uses asyncio directly, as compared
to the sync handle, which talks to an asyncio event loop in a thread. To learn more about
the reasoning behind these, check out our `architecture documentation <./architecture.html>`_.
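The async pattern looks like the following sketch; a stand-in class replaces the real handle here so the snippet runs without a Ray cluster (with a live cluster you would use ``serve.get_handle(..., sync=False)`` and get back a Ray ObjectRef):
.. code-block:: python

    import asyncio

    # Stand-in mimicking the async ServeHandle interface: .remote() is awaited.
    class FakeAsyncHandle:
        async def remote(self, data):
            await asyncio.sleep(0)   # yield to the event loop, like a real RPC would
            return {"echo": data}    # a real handle resolves to a Ray ObjectRef

    async def main():
        handle = FakeAsyncHandle()   # real code: serve.get_handle("api_endpoint", sync=False)
        return await handle.remote("ping")

    result = asyncio.run(main())     # {"echo": "ping"}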
.. _serve-custom-methods:
Calling methods on a Serve backend besides ``__call__``
=======================================================
By default, Ray Serve will serve the user-defined ``__call__`` method of your class, but
other methods of your class can be served as well.
To call a custom method via HTTP, pass in the method name in the header field ``X-SERVE-CALL-METHOD``.
To call a custom method via Python, use :mod:`handle.options <ray.serve.handle.RayServeHandle.options>`:
.. code-block:: python
class StatefulProcessor:
def __init__(self):
self.count = 1
def __call__(self, request):
return {"current": self.count}
def other_method(self, inc):
self.count += inc
return True
handle = serve.get_handle("endpoint_name")
handle.options(method_name="other_method").remote(5)
The call is the same as a regular query except a different method is called
within the replica.
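Conceptually, the replica simply looks up the requested method by name. This hypothetical ``dispatch`` helper (not part of Ray Serve) sketches what ``handle.options(method_name=...)`` amounts to on the replica side, using the class above:
.. code-block:: python

    class StatefulProcessor:
        def __init__(self):
            self.count = 1

        def __call__(self, request):
            return {"current": self.count}

        def other_method(self, inc):
            self.count += inc
            return True

    # Hypothetical helper: resolve the named method on the replica, then call it.
    def dispatch(replica, method_name, *args):
        return getattr(replica, method_name)(*args)

    p = StatefulProcessor()
    dispatch(p, "other_method", 5)          # increments count to 6
    result = dispatch(p, "__call__", None)  # {"current": 6}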
Integrating with existing web servers
=====================================
Ray Serve comes with its own HTTP server out of the box, but if you have an existing
web application, you can still plug in Ray Serve to scale up your backend computation.
Using ``ServeHandle`` makes this easy.
For a tutorial with sample code, see :ref:`serve-web-server-integration-tutorial`.
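As a sketch of the pattern, the snippet below uses a stand-in handle so it runs without a cluster; with a real deployment you would call ``serve.get_handle`` and block on the result with ``ray.get``:
.. code-block:: python

    # Stand-in for a ServeHandle so this sketch runs without a Ray cluster.
    class FakeHandle:
        def remote(self, data):
            return {"prediction": len(data)}  # a real handle returns an ObjectRef

    handle = FakeHandle()  # real code: handle = serve.get_handle("my_endpoint")

    def my_route_handler(request_body):
        # Inside an existing Flask/Django/aiohttp route, forward the request
        # body to Serve and return the result.
        return handle.remote(request_body)  # real code: ray.get(handle.remote(...))

    resp = my_route_handler("abc")  # {"prediction": 3}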

View file

@ -54,9 +54,13 @@ For our Counter class to work with Ray Serve, it needs to be a *callable* class,
self.count += 1
return {"count": self.count}
.. note::
.. tip::
In addition to callable classes, you can also serve functions using Ray Serve.
You can also serve :ref:`other class methods<serve-custom-methods>` besides ``__call__``.
.. note::
Besides classes, you can also serve standalone functions with Ray Serve in the same way.
Now we are ready to deploy our class using Ray Serve. First, create a Ray Serve backend and pass in the Counter class:

View file

@ -594,7 +594,7 @@ class Client:
"You are retrieving a sync handle inside an asyncio loop. "
"Try getting client.get_handle(.., sync=False) to get better "
"performance. Learn more at https://docs.ray.io/en/master/"
"serve/advanced.html#sync-and-async-handles")
"serve/http-servehandle.html#sync-and-async-handles")
if not asyncio.get_event_loop().is_running() and not sync:
logger.warning(
@ -602,7 +602,7 @@ class Client:
"You should make sure client.get_handle is called inside a "
"running event loop. Or call client.get_handle(.., sync=True) "
"to create sync handle. Learn more at https://docs.ray.io/en/"
"master/serve/advanced.html#sync-and-async-handles")
"master/serve/http-servehandle.html#sync-and-async-handles")
if endpoint_name in all_endpoints:
this_endpoint = all_endpoints[endpoint_name]

View file

@ -37,7 +37,7 @@ scipy==1.4.1
tabulate
tensorboardX
uvicorn
pydantic
pydantic>=1.8
dataclasses; python_version < '3.7'
starlette