See :ref:`serve-key-concepts` for more information about working with Ray Serve.
Why Ray Serve?
~~~~~~~~~~~~~~
There are generally two ways of serving machine learning applications, both with serious limitations:
you can build on a **traditional webserver** (your own Flask app), or you can use a **cloud-hosted solution**.
The first approach is easy to get started with, but it's hard to scale each component. The second approach
typically involves vendor lock-in (SageMaker), framework-specific tooling (TFServing), and a general
lack of flexibility.
Ray Serve solves these problems by giving you the simplicity of deploying a plain webserver
while handling the complex routing, scaling, and testing logic
necessary for production deployments.
For more on the motivation behind Ray Serve, check out these `meetup slides <https://tinyurl.com/serve-meetup>`_.
When should I use Ray Serve?
++++++++++++++++++++++++++++
Ray Serve should be used when you need to deploy at least one model, preferably many models.
Ray Serve **won't work well** when you need to run batch prediction over a dataset. For that use case, we recommend looking into `multiprocessing with Ray </multiprocessing.html>`_.
.. _serve-key-concepts:
Key Concepts
------------
Ray Serve focuses on **simplicity** and has only two core concepts: endpoints and backends.
To follow along, you'll need to make the necessary imports.

.. code-block:: python

    from ray import serve

    serve.init()  # initializes Serve and Ray
Endpoints
~~~~~~~~~
Endpoints allow you to name the "entity" that you'll be exposing:
the HTTP path that your application will serve.
Endpoints are "logical" and decoupled from the business logic or
model that you'll be serving. To create one, simply specify the name, the route, and the allowed HTTP methods.
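
As a minimal sketch, this might look as follows (the endpoint name ``"my_endpoint"`` and route ``"/my_endpoint"`` are hypothetical, chosen only for illustration):

.. code-block:: python

    from ray import serve

    serve.init()  # initializes Serve and Ray

    # Create an endpoint named "my_endpoint", exposed at the HTTP route
    # "/my_endpoint" and accepting only GET requests. The name and route
    # here are placeholders; choose your own.
    serve.create_endpoint("my_endpoint", route="/my_endpoint", methods=["GET"])

Note that the endpoint is purely logical: no traffic can be served through it until a backend is attached, which is covered next.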