Ray Serve is changing fast! You're probably running the latest pip release and not the nightly build, so please ensure you're viewing the correct version of this documentation.
`Here's the documentation for the latest pip release of Ray Serve <https://docs.ray.io/en/latest/serve/index.html>`_.
Ray Serve can be used in two primary ways to deploy your models:

1. Have them queried directly over HTTP.
2. Alternatively, call them from :ref:`within your existing Python web server <serve-web-server-integration-tutorial>` using the Python-native :ref:`servehandle-api`, as sketched below.
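To give a concrete feel for these two options, here is a minimal sketch. The ``Greeter`` class, the ``/greet`` route, and the replica count are placeholders, and it assumes a Ray Serve release that provides the ``@serve.deployment`` decorator and per-deployment handles:

.. code-block:: python

    import ray
    from ray import serve

    ray.init()
    serve.start()

    # Illustrative deployment; the class name, route, and logic are placeholders.
    @serve.deployment(route_prefix="/greet", num_replicas=2)
    class Greeter:
        async def __call__(self, request) -> str:
            # Over HTTP, Serve passes in a Starlette Request; from a
            # ServeHandle you can pass a plain string instead.
            if isinstance(request, str):
                name = request
            else:
                name = request.query_params.get("name", "world")
            return f"Hello, {name}!"

    Greeter.deploy()

    # Option 1: query the deployment over HTTP, e.g.
    #   curl "http://127.0.0.1:8000/greet?name=Serve"

    # Option 2: query it from Python (e.g. inside your own web server)
    # via a ServeHandle.
    handle = Greeter.get_handle()
    print(ray.get(handle.remote("Serve")))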
Ray Serve solves these problems by giving you a simple web server (and the ability to :ref:`use your own <serve-web-server-integration-tutorial>`) while still handling the complex routing, scaling, and testing logic necessary for production deployments.
Beyond scaling up your backends with multiple replicas, Ray Serve also enables:
- :ref:`serve-split-traffic` with zero downtime, by decoupling routing logic from response-handling logic.
- :ref:`serve-batching`---built in to help you meet your performance objectives.
- :ref:`serve-cpus-gpus`---specify fractional resource requirements to fully saturate each of your GPUs with several models (see the sketch after this list).
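As a rough illustration of the last two points, here is a hedged sketch of a deployment that batches incoming requests and reserves a fraction of a GPU per replica. The model logic, batch size, and GPU fraction are placeholders, and it assumes a Ray Serve release that provides ``@serve.batch`` and ``ray_actor_options``:

.. code-block:: python

    from ray import serve

    # Placeholder deployment: each replica reserves half a GPU, so two
    # replicas can share a single GPU.
    @serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 0.5})
    class BatchedModel:
        @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.1)
        async def handle_batch(self, inputs):
            # Serve collects up to 8 individual requests into `inputs`;
            # run your vectorized model here and return one result per input.
            return [len(item) for item in inputs]

        async def __call__(self, request):
            # Each caller still sends a single request; batching happens
            # transparently inside `handle_batch`.
            return await self.handle_batch(await request.body())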
For more on the motivation behind Ray Serve, check out these `meetup slides <https://tinyurl.com/serve-meetup>`_ and this `blog post <https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f>`_.
Head over to the :doc:`tutorials/index` to get started building your Ray Serve applications.
For more, see the following blog posts about Ray Serve:
- `How to Scale Up Your FastAPI Application Using Ray Serve <https://medium.com/distributed-computing-with-ray/how-to-scale-up-your-fastapi-application-using-ray-serve-c9a7b69e786>`_ by Archit Kulkarni
- `Machine Learning Serving is Broken <https://medium.com/distributed-computing-with-ray/machine-learning-serving-is-broken-f59aff2d607f>`_ by Simon Mo
- `The Simplest Way to Serve your NLP Model in Production with Pure Python <https://medium.com/distributed-computing-with-ray/the-simplest-way-to-serve-your-nlp-model-in-production-with-pure-python-d42b6a97ad55>`_ by Edward Oakes and Bill Chambers