.. _serve-faq:

Ray Serve FAQ
=============

This page answers some common questions about Ray Serve. If you have more
questions, feel free to ask them in the `Discussion Board <https://discuss.ray.io/>`_.

.. contents::

How do I deploy Ray Serve?
--------------------------

See :doc:`deployment` for information about how to deploy Serve.

How fast is Ray Serve?
----------------------

We are continuously benchmarking Ray Serve. We can confidently say:

- Ray Serve's **latency** overhead is in the single-digit milliseconds, often just 1-2 milliseconds.
- For **throughput**, Serve achieves about 3-4k queries per second on a single machine.
- Serve is **horizontally scalable**, so you can add more machines to increase the overall throughput.

You can check out our `microbenchmark instructions <https://github.com/ray-project/ray/tree/master/python/ray/serve/benchmarks>`_
to benchmark Serve on your own hardware.

Can I use ``asyncio`` along with Ray Serve?
-------------------------------------------

Yes! You can make your servable methods ``async def`` and Serve will run them
concurrently inside a Python asyncio event loop.
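
For example, here is a minimal sketch, assuming the ``@serve.deployment``
decorator API (the ``SlowEcho`` class and the sleep are purely illustrative;
the exact decorator may differ depending on your Serve version):

.. code-block:: python

    import asyncio

    from ray import serve

    @serve.deployment
    class SlowEcho:
        async def __call__(self, request):
            # Awaiting yields the event loop, so other requests to this
            # replica can make progress while this one sleeps.
            await asyncio.sleep(0.1)
            return "done"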

Are there any other similar frameworks?
---------------------------------------

Yes and no. We truly believe Serve is unique because it gives you end-to-end control
over the API while delivering scalability and high performance. To achieve
something like what Serve offers, you often need to glue together multiple
frameworks such as TensorFlow Serving and SageMaker, or even roll your own
batching server.

How does Serve compare to TFServing, TorchServe, ONNXRuntime, and others?
-------------------------------------------------------------------------

Ray Serve is *framework agnostic*: you can use any Python framework and libraries.
We believe data scientists are not bound to a particular machine learning framework;
they use the best tool available for the job.

Compared to these framework-specific solutions, Ray Serve doesn't perform any optimizations
to make your ML model run faster. However, you can still optimize the models yourself
and run them in Ray Serve: for example, you can run a model compiled by
`PyTorch JIT <https://pytorch.org/docs/stable/jit.html>`_.
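
As a hedged sketch (the toy ``torch.nn.Linear`` model and input shapes are
placeholders for illustration), a TorchScript-compiled model can be served like
any other Python object:

.. code-block:: python

    import torch

    from ray import serve

    @serve.deployment
    class JITModel:
        def __init__(self):
            # Compile the model once per replica; torch.jit.script is an
            # alternative to tracing.
            model = torch.nn.Linear(4, 2)
            self.traced = torch.jit.trace(model, torch.zeros(1, 4))

        def __call__(self, request):
            # Run inference without gradient tracking.
            with torch.no_grad():
                return self.traced(torch.ones(1, 4)).tolist()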

How does Serve compare to AWS SageMaker, Azure ML, Google AI Platform?
----------------------------------------------------------------------

Ray Serve brings the scalability and parallelism of these hosted offerings to
your own infrastructure. You can use our :ref:`cluster launcher <cluster-cloud>`
to deploy Ray Serve on all major public clouds, on K8s, and on bare-metal,
on-premise machines.

Compared to these offerings, Ray Serve lacks a unified user interface and the
functionality to manage the lifecycle of your models, visualize their performance, etc. Ray
Serve focuses on just model serving and provides the primitives for you to
build your own ML platform on top.

How does Serve compare to Seldon, KFServing, Cortex?
----------------------------------------------------

You can develop Ray Serve on your laptop, deploy it on a dev box, and scale it out
to multiple machines or a K8s cluster without changing a single line of code. It's a lot
easier to get started with when you don't need to provision and manage a K8s cluster.
When it's time to deploy, you can use the Ray :ref:`cluster launcher <cluster-cloud>`
to transparently put your Ray Serve application in K8s.

Compared to these frameworks, which let you deploy ML models on K8s, Ray Serve lacks
the ability to declaratively configure your ML application via YAML files. In
Ray Serve, you configure everything in Python code, as the sketch below shows.
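
For example, a minimal sketch of configuring a deployment purely in Python
(``num_replicas`` and ``ray_actor_options`` are real deployment options, though
available options can vary across Serve versions; the ``Model`` class is illustrative):

.. code-block:: python

    from ray import serve

    # All configuration lives next to the code: four replicas,
    # each reserving two CPUs.
    @serve.deployment(num_replicas=4, ray_actor_options={"num_cpus": 2})
    class Model:
        def __call__(self, request):
            return "ok"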

Is Ray Serve only for ML models?
--------------------------------

Nope! Ray Serve can be used to build any type of Python microservice
application. You can also use the full power of Ray within your Ray Serve
programs, so it's easy to run parallel computations within your deployments.
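
As an illustration, here is a hedged sketch of a non-ML deployment that fans
work out to plain Ray tasks (the ``crunch`` task and ``Aggregator`` class are
hypothetical):

.. code-block:: python

    import ray

    from ray import serve

    @ray.remote
    def crunch(i):
        # Any ordinary Python work; tasks run in parallel across the cluster.
        return i * i

    @serve.deployment
    class Aggregator:
        def __call__(self, request):
            # Fan out ten parallel Ray tasks from inside the deployment,
            # then gather and combine their results.
            refs = [crunch.remote(i) for i in range(10)]
            return sum(ray.get(refs))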