.. _rayserve:

========================================
Serve: Scalable and Programmable Serving
========================================

.. tip::
  Get in touch with us if you're using or considering using `Ray Serve <https://docs.google.com/forms/d/1l8HT35jXMPtxVUtQPeGoe09VGp5jcvSv0TqPgyz6lGU>`_.

.. image:: logo.svg
    :align: center
    :height: 250px
    :width: 400px

.. _rayserve-overview:

Ray Serve is an easy-to-use scalable model serving library built on Ray. Ray Serve is:

- **Framework-agnostic**: Use a single toolkit to serve everything from deep learning models
  built with frameworks like :ref:`PyTorch <serve-pytorch-tutorial>`,
  :ref:`Tensorflow, and Keras <serve-tensorflow-tutorial>`, to :ref:`Scikit-Learn <serve-sklearn-tutorial>` models, to arbitrary Python business logic.
- **Python-first**: Configure your model serving declaratively in pure Python, without needing YAML or JSON configs.

Ray Serve enables :ref:`seamless multi-model inference pipelines (also known as model composition) <serve-pipeline-api>`. You can
write your entire inference pipeline in code and integrate business logic with ML.

Since Ray Serve is built on Ray, it allows you to easily scale to many machines, both in your datacenter and in the cloud.

Ray Serve can be used in two primary ways to deploy your models at scale (both are sketched below):

1. Have Python functions and classes automatically placed behind HTTP endpoints.
2. Alternatively, call them from :ref:`within your existing Python web server <serve-web-server-integration-tutorial>` using the Python-native :ref:`servehandle-api`.
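
For illustration, here is a minimal sketch of both modes using the ``@serve.deployment`` API. The ``hello`` and ``add_one`` deployments and the ``/hello`` route are hypothetical examples for this page, not the quickstart files included below:

.. code-block:: python

  import ray
  import requests
  from ray import serve

  ray.init()
  serve.start()

  # Mode 1: a plain Python function served over HTTP.
  @serve.deployment(route_prefix="/hello")
  def hello(request):
      # Over HTTP, Serve passes a Starlette Request object.
      return f"Hello {request.query_params['name']}!"

  hello.deploy()
  print(requests.get("http://127.0.0.1:8000/hello?name=Serve").text)

  # Mode 2: the same machinery invoked from Python via a ServeHandle.
  # Arguments to .remote() are passed to the deployment directly.
  @serve.deployment
  def add_one(number: int) -> int:
      return number + 1

  add_one.deploy()
  handle = add_one.get_handle()
  print(ray.get(handle.remote(1)))  # -> 2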

.. note::
  Serve recently added an experimental first-class API for model composition (pipelines).
  Please take a look at the :ref:`Pipeline API <serve-pipeline-api>` and try it out!

.. tip::
  Chat with Ray Serve users and developers on our `forum <https://discuss.ray.io/>`_!

Ray Serve Quickstart
====================

Ray Serve supports Python versions 3.6 through 3.8. To install Ray Serve, run the following command:

.. code-block:: bash

  pip install "ray[serve]"

Now you can serve a function...

.. literalinclude:: ../../../python/ray/serve/examples/doc/quickstart_function.py

...or serve a stateful class.

.. literalinclude:: ../../../python/ray/serve/examples/doc/quickstart_class.py
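
If you don't have the example file handy, the following is an illustrative stand-in (not the repo's ``quickstart_class.py``): a class deployment keeps in-memory state, here a request counter, across calls to the same replica.

.. code-block:: python

  import requests
  from ray import serve

  serve.start()

  # A class deployment holds state (here, a simple counter) that
  # persists across requests to the same replica.
  @serve.deployment
  class Counter:
      def __init__(self):
          self.count = 0

      def __call__(self, request):
          self.count += 1
          return {"count": self.count}

  Counter.deploy()

  # Each HTTP call increments the replica's counter.
  print(requests.get("http://127.0.0.1:8000/Counter").json())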

See :doc:`core-apis` for more exhaustive coverage of Ray Serve and its core concept of a ``Deployment``.
For a high-level view of the architecture underlying Ray Serve, see :doc:`architecture`.

Why Ray Serve?
==============

There are generally two ways of serving machine learning applications, both with serious limitations:
you can use a **traditional web server**---your own Flask app---or you can use a cloud-hosted solution.

The first approach is easy to get started with, but it's hard to scale each component. The second approach
typically means vendor lock-in (SageMaker), framework-specific tooling (TFServing), and a general
lack of flexibility.

Ray Serve solves these problems by giving you a simple web server (and the ability to :ref:`use your own <serve-web-server-integration-tutorial>`) while still handling the complex routing, scaling, and testing logic
necessary for production deployments.

Beyond scaling up your deployments with multiple replicas, Ray Serve also enables:

- :ref:`serve-model-composition`---the ability to flexibly compose multiple models and independently scale and update each one.
- :ref:`serve-batching`---built-in request batching to help you meet your performance objectives.
- :ref:`serve-cpus-gpus`---specify fractional resource requirements to fully saturate each of your GPUs with several models (batching and fractional resources are both sketched below).
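
For a flavor of how these knobs are declared, here is a hedged sketch: ``load_model`` is a hypothetical stand-in for your framework's loader, and the exact numbers are arbitrary.

.. code-block:: python

  from ray import serve

  @serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 0.5})
  class BatchedModel:
      def __init__(self):
          # load_model() is a hypothetical stand-in for your
          # framework's model loading code.
          self.model = load_model()

      # Serve transparently groups up to 8 concurrent requests
      # into a single call to this method.
      @serve.batch(max_batch_size=8)
      async def handle_batch(self, inputs):
          # `inputs` is a list; return one result per input, in order.
          return [self.model(i) for i in inputs]

      async def __call__(self, request):
          return await self.handle_batch(request)

Here the two replicas each request half a GPU, so both fit on a single GPU, and concurrent requests are batched into one model call without any change to the calling code.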

For more on the motivation behind Ray Serve, check out these `meetup slides <https://tinyurl.com/serve-meetup>`_ and this `blog post <https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f>`_.

When should I use Ray Serve?
----------------------------

Ray Serve is a flexible tool that's easy to use for deploying, operating, and monitoring Python-based machine learning applications.
Ray Serve excels when you want to mix business logic with ML models and need to scale out in production. This might be because of large-scale batch processing
requirements or because you want to scale up a model pipeline consisting of many individual models with different performance properties.

If you plan on running on multiple machines, Ray Serve will serve you well!

What's next?
============

Check out the :doc:`tutorial` and :doc:`core-apis`, look at the :ref:`serve-faq`,
or head over to the :doc:`tutorials/index` to get started building your Ray Serve applications.

For more, see the following blog posts about Ray Serve:

- `Serving ML Models in Production: Common Patterns <https://www.anyscale.com/blog/serving-ml-models-in-production-common-patterns>`_ by Simon Mo, Edward Oakes, and Michael Galarnyk
- `How to Scale Up Your FastAPI Application Using Ray Serve <https://medium.com/distributed-computing-with-ray/how-to-scale-up-your-fastapi-application-using-ray-serve-c9a7b69e786>`_ by Archit Kulkarni
- `Machine Learning Serving is Broken <https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f>`_ by Simon Mo
- `The Simplest Way to Serve your NLP Model in Production with Pure Python <https://medium.com/distributed-computing-with-ray/the-simplest-way-to-serve-your-nlp-model-in-production-with-pure-python-d42b6a97ad55>`_ by Edward Oakes and Bill Chambers