When you create a deployment, it is exposed over HTTP by default at `/{deployment_name}`. You can change the route by specifying the `route_prefix` argument to the {mod}`@serve.deployment <ray.serve.api.deployment>` decorator.
When you make a request to the Serve HTTP server at `/counter`, it will forward the request to the deployment's `__call__` method and provide a [Starlette Request object](https://www.starlette.io/requests/) as the sole argument. The `__call__` method can return any JSON-serializable object or a [Starlette Response object](https://www.starlette.io/responses/) (e.g., to return a custom status code).
Below, we discuss some advanced features for customizing Ray Serve's HTTP functionality.
(serve-fastapi-http)=
### FastAPI HTTP Deployments
If you want to define more complex HTTP handling logic, Serve integrates with [FastAPI](https://fastapi.tiangolo.com/). This allows you to define a Serve deployment using the {mod}`@serve.ingress <ray.serve.api.ingress>` decorator that wraps a FastAPI app with its full range of features. The most basic example of this is shown below, but for more details on all that FastAPI has to offer such as variable routes, automatic type validation, dependency injection (e.g., for database connections), and more, please check out [their documentation](https://fastapi.tiangolo.com/).
```python
import ray
from fastapi import FastAPI
from ray import serve
app = FastAPI()
ray.init(address="auto", namespace="summarizer")
serve.start(detached=True)
@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
@app.get("/")
def root(self):
return "Hello, world!"
MyFastAPIDeployment.deploy()
```
Now if you send a request to `/hello`, this will be routed to the `root` method of our deployment. We can also easily leverage FastAPI to define multiple routes with different HTTP methods:
```python
import ray
from fastapi import FastAPI
from ray import serve
app = FastAPI()
ray.init(address="auto", namespace="summarizer")
serve.start(detached=True)
@serve.deployment(route_prefix="/hello")
@serve.ingress(app)
class MyFastAPIDeployment:
@app.get("/")
def root(self):
return "Hello, world!"
@app.post("/{subpath}")
def root(self, subpath: str):
return f"Hello from {subpath}!"
MyFastAPIDeployment.deploy()
```
You can also pass in an existing FastAPI app to a deployment to serve it as-is:
```python
import ray
from fastapi import FastAPI
from ray import serve
app = FastAPI()
ray.init(address="auto", namespace="summarizer")
serve.start(detached=True)
@app.get("/")
def f():
return "Hello from the root!"
# ... add more routes, routers, etc. to `app` ...
@serve.deployment(route_prefix="/")
@serve.ingress(app)
class FastAPIWrapper:
pass
FastAPIWrapper.deploy()
```
This is useful for scaling out an existing FastAPI app with no modifications necessary.
Existing middlewares, automatic OpenAPI documentation generation, and other advanced FastAPI features should work as-is.
You can also combine routes defined this way with routes defined on the deployment:
```python
import ray
from fastapi import FastAPI
from ray import serve
app = FastAPI()
ray.init(address="auto", namespace="summarizer")
serve.start(detached=True)
@app.get("/")
def f():
return "Hello from the root!"
@serve.deployment(route_prefix="/api1")
@serve.ingress(app)
class FastAPIWrapper1:
@app.get("/subpath")
def method(self):
return "Hello 1!"
@serve.deployment(route_prefix="/api2")
@serve.ingress(app)
class FastAPIWrapper2:
@app.get("/subpath")
def method(self):
return "Hello 2!"
FastAPIWrapper1.deploy()
FastAPIWrapper2.deploy()
```
In this example, requests to both `/api1` and `/api2` would return `Hello from the root!` while a request to `/api1/subpath` would return `Hello 1!` and a request to `/api2/subpath` would return `Hello 2!`.
To try it out, save a code snippet in a local python file (i.e. main.py) and in the same directory, run the following commands to start a local Ray cluster on your machine.
```bash
ray start --head
python main.py
```
(serve-http-adapters)=
### HTTP Adapters
HTTP adapters are functions that convert raw HTTP request to Python types that you know and recognize.
Its input arguments should be type annotated. At minimal, it should accept a `starlette.requests.Request` type.
But it can also accept any type that's recognized by the FastAPI's dependency injection framework.
For example, here is an adapter that extra the json content from request.
Ray Serve enables you to query models both from HTTP and Python. This feature
enables seamless [model composition](serve-model-composition). You can
get a `ServeHandle` corresponding to deployment, similar how you can
reach a deployment through HTTP via a specific route. When you issue a request
to a deployment through `ServeHandle`, the request is load balanced across
available replicas in the same way an HTTP request is.
To call a Ray Serve deployment from python, use {mod}`Deployment.get_handle <ray.serve.api.Deployment>`
to get a handle to the deployment, then use
{mod}`handle.remote <ray.serve.handle.RayServeHandle.remote>` to send requests
to that deployment. These requests can pass ordinary args and kwargs that are
passed directly to the method. This returns a Ray `ObjectRef` whose result
can be waited for or retrieved using `ray.wait` or `ray.get`.
```python
@serve.deployment
class Deployment:
def method1(self, arg):
return f"Method1: {arg}"
def __call__(self, arg):
return f"__call__: {arg}"
Deployment.deploy()
handle = Deployment.get_handle()
ray.get(handle.remote("hi")) # Defaults to calling the __call__ method.
ray.get(handle.method1.remote("hi")) # Call a different method.
```
If you want to use the same deployment to serve both HTTP and ServeHandle traffic, the recommended best practice is to define an internal method that the HTTP handling logic will call: