[serve] Replace "backend" with "deployment" in metrics & logging (#17434)

Edward Oakes 2021-08-05 17:37:21 -05:00 committed by GitHub
parent 05b0da94b7
commit 839ceba6db
18 changed files with 92 additions and 99 deletions

View file

@ -37,14 +37,14 @@ When an HTTP request is sent to the router, the following things happen:
- The HTTP request is received and parsed.
- The correct deployment associated with the HTTP url path is looked up. The
request is placed on a queue.
- For each request in a backend queue, an available replica is looked up
- For each request in a deployment queue, an available replica is looked up
and the request is sent to it. If there are no available replicas (there
are more than ``max_concurrent_queries`` requests outstanding), the request
is left in the queue until an outstanding request is finished.
Each replica maintains a queue of requests and executes one at a time, possibly
using asyncio to process them concurrently. If the handler (the function for the
backend or ``__call__``) is ``async``, the replica will not wait for the
deployment or ``__call__``) is ``async``, the replica will not wait for the
handler to run; otherwise, the replica will block until the handler returns.
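As a rough illustration of the two handler styles (a sketch using the ``@serve.deployment`` API described elsewhere in these docs, not an excerpt from them):

.. code-block:: python

    import asyncio
    import time

    from ray import serve

    @serve.deployment
    class BlockingModel:
        def __call__(self, request):
            # Synchronous handler: the replica blocks here and processes
            # one request from its queue at a time.
            time.sleep(0.1)
            return "done"

    @serve.deployment
    class ConcurrentModel:
        async def __call__(self, request):
            # Async handler: the replica does not wait for this coroutine to
            # finish before pulling the next request off its queue.
            await asyncio.sleep(0.1)
            return "done"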
FAQ
@ -59,7 +59,7 @@ replica will be able to continue to handle requests.
Machine errors and faults will be handled by Ray. Serve utilizes the :ref:`actor
reconstruction <actor-fault-tolerance>` capability. For example, when a machine hosting any of the
actors crashes, those actors will be automatically restarted on another
available machine. All data in the Controller (routing policies, backend
available machine. All data in the Controller (routing policies, deployment
configurations, etc.) is checkpointed to Ray. Transient data in the
router and the replica (like network connections and internal request
queues) will be lost upon failure.
@ -81,7 +81,7 @@ How do ServeHandles work?
:mod:`ServeHandles <ray.serve.handle.RayServeHandle>` wrap a handle to the router actor on the same node. When a
request is sent from one replica to another via the handle, the
requests go through the same data path as incoming HTTP requests. This enables
the same backend selection and batching procedures to happen. ServeHandles are
the same deployment selection and batching procedures to happen. ServeHandles are
often used to implement :ref:`model composition <serve-model-composition>`.

View file

@ -28,7 +28,7 @@ Deploying on a Single Node
While Ray Serve makes it easy to scale out on a multi-node Ray cluster, in some scenarios a single node may suit your needs.
There are two ways you can run Ray Serve on a single node, shown below.
In general, **Option 2 is recommended for most users** because it allows you to fully make use of Serve's ability to dynamically update running backends.
In general, **Option 2 is recommended for most users** because it allows you to fully make use of Serve's ability to dynamically update running deployments.
1. Start Ray and deploy with Ray Serve all in a single Python file.
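A sketch of option 1 (illustrative only, using the deployment API; the guide's own example may differ):

.. code-block:: python

    import time

    import ray
    from ray import serve

    ray.init()
    serve.start()

    @serve.deployment
    def hello(request):
        return "hello world"

    hello.deploy()

    # Keep the script alive so the local Ray instance (and the deployment)
    # stays up; press Ctrl-C to exit.
    while True:
        time.sleep(5)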
@ -157,7 +157,7 @@ Now, we just need to start the cluster:
Session Affinity: None
Events: <none>
With the cluster now running, we can run a simple script to start Ray Serve and deploy a "hello world" backend:
With the cluster now running, we can run a simple script to start Ray Serve and deploy a "hello world" deployment:
.. code-block:: python
@ -219,7 +219,7 @@ Below is an example of what the Ray Dashboard might look like for a Serve deploy
.. image:: https://raw.githubusercontent.com/ray-project/Images/master/docs/dashboard/serve-dashboard.png
:align: center
Here you can see the Serve controller actor, an HTTP proxy actor, and all of the replicas for each Serve backend in the deployment.
Here you can see the Serve controller actor, an HTTP proxy actor, and all of the replicas for each Serve deployment.
To learn about the function of the controller and proxy actors, see the `Serve Architecture page <architecture.html>`__.
In this example pictured above, we have a single-node cluster with a deployment named Counter with ``num_replicas=2``.
@ -235,18 +235,18 @@ Logging in Ray Serve uses Python's standard logging facility.
Tracing Backends and Replicas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When looking through log files of your Ray Serve application, it is useful to know which backend and replica each log line originated from.
To automatically include the current backend tag and replica tag in your logs, simply call
``logger = logging.getLogger("ray")``, and use ``logger`` within your backend code:
When looking through log files of your Ray Serve application, it is useful to know which deployment and replica each log line originated from.
To automatically include the current deployment and replica in your logs, simply call
``logger = logging.getLogger("ray")``, and use ``logger`` within your deployment code:
.. literalinclude:: ../../../python/ray/serve/examples/doc/snippet_logger.py
:lines: 1, 9, 11-13, 15-16
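If you don't have the repository checked out, the pattern in that snippet is roughly the following (a sketch; the real ``snippet_logger.py`` may differ):

.. code-block:: python

    import logging

    from ray import serve

    logger = logging.getLogger("ray")

    serve.start()

    @serve.deployment
    def f(*args):
        # Serve automatically appends the deployment and replica tags
        # to records emitted through the "ray" logger.
        logger.info("Some info!")

    f.deploy()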
Querying a Serve endpoint with the above backend will produce a log line like the following:
Querying a Serve endpoint with the above deployment will produce a log line like the following:
.. code-block:: bash
(pid=42161) 2021-02-26 11:05:21,709 INFO snippet_logger.py:13 -- Some info! component=serve backend=f replica=f#jZlnUI
(pid=42161) 2021-02-26 11:05:21,709 INFO snippet_logger.py:13 -- Some info! component=serve deployment=f replica=f#jZlnUI
To write your own custom logger using Python's ``logging`` package, use the following method:
@ -319,20 +319,20 @@ Now we are ready to start our Ray Serve deployment. Start a long-running Ray cl
ray start --head
serve start
Now run the following Python script to deploy a basic Serve backend with a Serve backend logger:
Now run the following Python script to deploy a basic Serve deployment with a Serve deployment logger:
.. literalinclude:: ../../../python/ray/serve/examples/doc/backend_logger.py
.. literalinclude:: ../../../python/ray/serve/examples/doc/deployment_logger.py
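The script might look roughly like this (a sketch assuming the deployment name ``Counter`` used by the LogQL query below; the real ``deployment_logger.py`` may differ):

.. code-block:: python

    import logging
    import time

    import ray
    import requests
    from ray import serve

    ray.init(address="auto")
    serve.start(detached=True)

    logger = logging.getLogger("ray")

    @serve.deployment(name="Counter")
    class Counter:
        def __init__(self):
            self.count = 0

        def __call__(self, request):
            self.count += 1
            # Tagged automatically, e.g. "... component=serve deployment=Counter replica=..."
            logger.info(f"count: {self.count}")
            return {"count": self.count}

    Counter.deploy()

    # Generate some traffic so log lines show up in Loki.
    while True:
        requests.get("http://127.0.0.1:8000/Counter")
        time.sleep(1)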
Now `install and run Grafana <https://grafana.com/docs/grafana/latest/installation/>`__ and navigate to ``http://localhost:3000``, where you can log in with the default username "admin" and default password "admin".
On the welcome page, click "Add your first data source" and click "Loki" to add Loki as a data source.
Now click "Explore" in the left-side panel. You are ready to run some queries!
To filter all these Ray logs for the ones relevant to our backend, use the following `LogQL <https://grafana.com/docs/loki/latest/logql/>`__ query:
To filter all these Ray logs for the ones relevant to our deployment, use the following `LogQL <https://grafana.com/docs/loki/latest/logql/>`__ query:
.. code-block:: shell
{job="ray"} |= "backend=Counter"
{job="ray"} |= "deployment=Counter"
You should see something similar to the following:
@ -353,18 +353,18 @@ The following metrics are exposed by Ray Serve:
* - Name
- Description
* - ``serve_backend_request_counter``
* - ``serve_deployment_request_counter``
- The number of queries that have been processed in this replica.
* - ``serve_backend_error_counter``
- The number of exceptions that have occurred in the backend.
* - ``serve_backend_replica_starts``
* - ``serve_deployment_error_counter``
- The number of exceptions that have occurred in the deployment.
* - ``serve_deployment_replica_starts``
- The number of times this replica has been restarted due to failure.
* - ``serve_backend_queuing_latency_ms``
* - ``serve_deployment_queuing_latency_ms``
- The latency for queries in the replica's queue waiting to be processed.
* - ``serve_backend_processing_latency_ms``
* - ``serve_deployment_processing_latency_ms``
- The latency for queries to be processed.
* - ``serve_replica_queued_queries``
- The current number of queries queued in the backend replicas.
- The current number of queries queued in the deployment replicas.
* - ``serve_replica_processing_queries``
- The current number of queries being processed.
* - ``serve_num_http_requests``
@ -373,8 +373,8 @@ The following metrics are exposed by Ray Serve:
- The number of requests processed by the router.
* - ``serve_handle_request_counter``
- The number of requests processed by this ServeHandle.
* - ``backend_queued_queries``
- The number of queries for this backend waiting to be assigned to a replica.
* - ``serve_deployment_queued_queries``
- The number of queries for this deployment waiting to be assigned to a replica.
To see this in action, run ``ray start --head --metrics-export-port=8080`` in your terminal, and then run the following script:
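As a rough sketch of such a script (an assumption, not the documented file; the deployment name ``f`` matches the sample output below), it just needs to start a deployment and keep sending it traffic so the counters move:

.. code-block:: python

    import time

    import ray
    import requests
    from ray import serve

    ray.init(address="auto")
    serve.start()

    @serve.deployment
    def f(request):
        time.sleep(1)  # roughly one second of "processing", as in the sample output
        return "hello"

    f.deploy()

    while True:
        requests.get("http://127.0.0.1:8000/f")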
@ -386,12 +386,12 @@ The metrics are updated once every ten seconds, and you will need to refresh the
For example, after running the script for some time and refreshing ``localhost:8080`` you might see something that looks like::
ray_serve_backend_processing_latency_ms_count{...,backend="f",...} 99.0
ray_serve_backend_processing_latency_ms_sum{...,backend="f",...} 99279.30498123169
ray_serve_deployment_processing_latency_ms_count{...,deployment="f",...} 99.0
ray_serve_deployment_processing_latency_ms_sum{...,deployment="f",...} 99279.30498123169
which indicates that the average processing latency is just over one second, as expected.
You can even define a `custom metric <..ray-metrics.html#custom-metrics>`__ to use in your backend, and tag it with the current backend or replica.
You can even define a `custom metric <..ray-metrics.html#custom-metrics>`__ to use in your deployment, and tag it with the current deployment or replica.
Here's an example:
.. literalinclude:: ../../../python/ray/serve/examples/doc/snippet_custom_metric.py

View file

@ -77,4 +77,4 @@ Is Ray Serve only for ML models?
--------------------------------
Nope! Ray Serve can be used to build any type of Python microservices
application. You can also use the full power of Ray within your Ray Serve
programs, so it's easy to run parallel computations within your backends.
programs, so it's easy to run parallel computations within your deployments.

View file

@ -76,7 +76,7 @@ lack of flexibility.
Ray Serve solves these problems by giving you a simple web server (and the ability to :ref:`use your own <serve-web-server-integration-tutorial>`) while still handling the complex routing, scaling, and testing logic
necessary for production deployments.
Beyond scaling up your backends with multiple replicas, Ray Serve also enables:
Beyond scaling up your deployments with multiple replicas, Ray Serve also enables:
- :ref:`serve-model-composition`---ability to flexibly compose multiple models and independently scale and update each.
- :ref:`serve-batching`---built in request batching to help you meet your performance objectives.

View file

@ -51,7 +51,7 @@ stacking or ensembles.
To define a higher-level composed model you need to do three things:
1. Define your underlying models (the ones that you will compose together) as
Ray Serve backends
Ray Serve deployments.
2. Define your composed model, using the handles of the underlying models
(see the example below).
3. Define an endpoint representing this composed model and query it!
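A sketch of these three steps (assuming the deployment and ServeHandle APIs referenced above; the model logic is a stand-in):

.. code-block:: python

    import random

    from ray import serve

    serve.start()

    # 1. The underlying models, each defined as its own deployment.
    @serve.deployment
    def model_one(data):
        return {"model": "one", "score": random.random()}

    @serve.deployment
    def model_two(data):
        return {"model": "two", "score": random.random()}

    # 2. The composed model, calling the underlying models through their handles.
    @serve.deployment(route_prefix="/composed")
    class ComposedModel:
        def __init__(self):
            # sync=False returns handles that can be awaited from async code.
            self.model_one = model_one.get_handle(sync=False)
            self.model_two = model_two.get_handle(sync=False)

        async def __call__(self, request):
            data = await request.body()
            result = await (await self.model_one.remote(data))
            if result["score"] < 0.5:
                result = await (await self.model_two.remote(data))
            return result

    # 3. Deploy everything; the composed model is exposed at /composed.
    model_one.deploy()
    model_two.deploy()
    ComposedModel.deploy()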

View file

@ -17,7 +17,7 @@ Performance and known benchmarks
We are continuously benchmarking Ray Serve. The metrics we care about are latency, throughput, and scalability. We can confidently say:
- Ray Serve's latency overhead is single-digit milliseconds, around 1-2 milliseconds on average.
- For throughput, Serve achieves about 3-4k queries per second on a single machine (8 cores) using 1 http proxy and 8 backend replicas performing noop requests.
- For throughput, Serve achieves about 3-4k queries per second on a single machine (8 cores) using 1 http proxy and 8 replicas performing noop requests.
- It is horizontally scalable so you can add more machines to increase the overall throughput. Ray Serve is built on top of Ray,
so its scalability is bounded by Ray's scalability. Please check out Ray's `scalability envelope <https://github.com/ray-project/ray/blob/master/benchmarks/README.md>`_
to learn more about the maximum number of nodes and other limitations.
@ -31,7 +31,7 @@ The performance issue you're most likely to encounter is high latency and/or low
If you have set up :ref:`monitoring <serve-monitoring>` with Ray and Ray Serve, you will likely observe that
``serve_num_router_requests`` is constant while your load increases
``serve_backend_queuing_latency_ms`` is spiking up as queries queue up in the background
``serve_deployment_queuing_latency_ms`` is spiking up as queries queue up in the background
Given the symptom, there are several ways to fix it.
@ -46,15 +46,14 @@ Async functions
Are you using ``async def`` in your callable? If you are using asyncio and
hitting the same queuing issue mentioned above, you might want to increase
``max_concurrent_queries``. Serve sets a low number by default so the client gets
proper backpressure. You can increase the value in the :mod:`backend config <ray.serve.config.BackendConfig>`
to allow more coroutines running in the same replica.
proper backpressure. You can increase the value in the Deployment decorator.
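For example (a sketch with a hypothetical value):

.. code-block:: python

    import asyncio

    from ray import serve

    @serve.deployment(max_concurrent_queries=100)
    class AsyncModel:
        async def __call__(self, request):
            await asyncio.sleep(0.01)  # stand-in for real async I/O
            return "ok"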
Batching
^^^^^^^^
If your backend can process a batch at a time at a sublinear latency
If your deployment can process a batch at a time at a sublinear latency
(for example, if it takes 1ms to process 1 query and 5ms to process 10 of them)
then batching is your best approach. Check out the :ref:`batching guide <serve-batching>` to
make your backend accept batches (especially for GPU-based ML inference). You might want to tune your ``max_batch_size`` and ``batch_wait_timeout`` in the ``@serve.batch`` decorator to maximize the benefits:
make your deployment accept batches (especially for GPU-based ML inference). You might want to tune your ``max_batch_size`` and ``batch_wait_timeout`` in the ``@serve.batch`` decorator to maximize the benefits:
- ``max_batch_size`` specifies how big the batch should be. Generally,
we recommend choosing the largest batch size your function can handle

View file

@ -6,7 +6,7 @@ Batching Tutorial
In this guide, we will deploy a simple vectorized adder that takes
a batch of queries and adds them at once. In particular, we show:
- How to implement and deploy a Ray Serve backend that accepts batches.
- How to implement and deploy a Ray Serve deployment that accepts batches.
- How to configure the batch size.
- How to query the model in Python.
@ -37,7 +37,7 @@ This function must also be ``async def`` so that you can handle multiple queries
async def my_batch_handler(self, requests: List):
pass
This batch handler can then be called from another ``async def`` method in your backend.
This batch handler can then be called from another ``async def`` method in your deployment.
These calls will be batched and executed together, but return an individual result as if
they were a normal function call:
@ -62,7 +62,7 @@ they were a normal function call:
``batch_wait_timeout_s`` option to ``@serve.batch`` (defaults to 0). Increasing this
timeout may improve throughput at the cost of latency under low load.
Let's define a backend that takes in a list of requests, extracts the input value,
Let's define a deployment that takes in a list of requests, extracts the input value,
converts them into an array, and uses NumPy to add 1 to each element.
.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_batch.py
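In rough outline (a sketch, not the contents of ``tutorial_batch.py``), such a deployment looks like this:

.. code-block:: python

    import numpy as np
    from ray import serve

    @serve.deployment
    class BatchAdder:
        @serve.batch(max_batch_size=4)
        async def handle_batch(self, numbers):
            # `numbers` is the batch of individual inputs collected by @serve.batch.
            input_array = np.array(numbers)
            return (input_array + 1).tolist()  # one result per queued query

        async def __call__(self, request):
            return await self.handle_batch(int(request.query_params["number"]))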
@ -90,7 +90,7 @@ What if you want to evaluate a whole batch in Python? Ray Serve allows you to se
queries via the Python API. A batch of queries can either come from the web server
or the Python API. Learn more :ref:`here<serve-handle-explainer>`.
To query the backend via the Python API, we can use ``Deployment.get_handle`` to receive
To query the deployment via the Python API, we can use ``Deployment.get_handle`` to receive
a handle to the corresponding deployment. To enqueue a query, you can call
``handle.method.remote(data)``. This call returns immediately
with a :ref:`Ray ObjectRef<ray-object-refs>`. You can call `ray.get` to retrieve

View file

@ -33,8 +33,7 @@ The ``__call__`` method will be invoked per request.
:end-before: __doc_define_servable_end__
Now that we've defined our services, let's deploy the model to Ray Serve. We will
define an endpoint for the route representing the digit classifier task, a
backend correspond the physical implementation, and connect them together.
define a Serve deployment that will be exposed over an HTTP route.
.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_pytorch.py
:start-after: __doc_deploy_begin__

View file

@ -42,8 +42,7 @@ retrieves the ``request.json()["observation"]`` as input.
:end-before: __doc_define_servable_end__
Now that we've defined our services, let's deploy the model to Ray Serve. We will
define an endpoint for the route representing the ppo model, a
backend correspond the physical implementation, and connect them together.
define a Serve deployment that will be exposed over an HTTP route.
.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_rllib.py
:start-after: __doc_deploy_begin__

View file

@ -37,8 +37,7 @@ The ``__call__`` method will be invoked per request.
:end-before: __doc_define_servable_end__
Now that we've defined our services, let's deploy the model to Ray Serve. We will
define an endpoint for the route representing the classifier task, a
backend correspond the physical implementation, and connect them together.
define a Serve deployment that will be exposed over an HTTP route.
.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_sklearn.py
:start-after: __doc_deploy_begin__

View file

@ -40,8 +40,7 @@ The ``__call__`` method will be invoked per request.
:end-before: __doc_define_servable_end__
Now that we've defined our services, let's deploy the model to Ray Serve. We will
define an endpoint for the route representing the digit classifier task, a
backend correspond the physical implementation, and connect them together.
define a Serve deployment that will be exposed over an HTTP route.
.. literalinclude:: ../../../../python/ray/serve/examples/doc/tutorial_tensorflow.py
:start-after: __doc_deploy_begin__

View file

@ -26,7 +26,7 @@ Here's a simple FastAPI web server. It uses Huggingface Transformers to auto-g
.. literalinclude:: ../../../../python/ray/serve/examples/doc/fastapi/fastapi_simple.py
To scale this up, we define a Ray Serve backend containing our text model and call it from Python using a ServeHandle:
To scale this up, we define a Ray Serve deployment containing our text model and call it from Python:
.. literalinclude:: ../../../../python/ray/serve/examples/doc/fastapi/servehandle_fastapi.py
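In rough outline (a sketch under the assumption of the Transformers ``pipeline`` API and Serve's deployment handles; not the contents of ``servehandle_fastapi.py``):

.. code-block:: python

    from fastapi import FastAPI
    from transformers import pipeline

    import ray
    from ray import serve

    app = FastAPI()
    ray.init(address="auto")
    serve.start(detached=True)

    @serve.deployment
    class TextGenerator:
        def __init__(self):
            self.nlp = pipeline("text-generation", model="gpt2")

        def __call__(self, text: str):
            return self.nlp(text)[0]["generated_text"]

    TextGenerator.deploy()
    handle = TextGenerator.get_handle()

    @app.get("/generate")
    def generate(text: str):
        # The request is forwarded to a Serve replica through the handle.
        return ray.get(handle.remote(text))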
@ -52,7 +52,7 @@ The terminal should then print the generated text:
To clean up the Ray cluster, run ``ray stop`` in the terminal.
.. tip::
According to the backend configuration parameter ``num_replicas``, Ray Serve will place multiple replicas of your model across multiple CPU cores and multiple machines (provided you have :ref:`started a multi-node Ray cluster <cluster-index>`), which will correspondingly multiply your throughput.
According to the deployment configuration parameter ``num_replicas``, Ray Serve will place multiple replicas of your model across multiple CPU cores and multiple machines (provided you have :ref:`started a multi-node Ray cluster <cluster-index>`), which will correspondingly multiply your throughput.
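As a minimal sketch of that parameter (illustrative names only):

.. code-block:: python

    from ray import serve

    @serve.deployment(num_replicas=4)  # Serve places four replicas across the cluster
    class Model:
        def __call__(self, request):
            return "ok"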
Scaling Up an AIOHTTP Application
---------------------------------

View file

@ -25,7 +25,7 @@ from ray.serve.common import BackendInfo, GoalId
from ray.serve.config import (BackendConfig, HTTPOptions, ReplicaConfig)
from ray.serve.constants import (DEFAULT_HTTP_HOST, DEFAULT_HTTP_PORT,
HTTP_PROXY_TIMEOUT, SERVE_CONTROLLER_NAME)
from ray.serve.controller import BackendTag, ReplicaTag, ServeController
from ray.serve.controller import ReplicaTag, ServeController
from ray.serve.exceptions import RayServeException
from ray.serve.handle import RayServeHandle, RayServeSyncHandle
from ray.serve.http_util import (ASGIHTTPSender, make_fastapi_class_based_view)
@ -60,21 +60,21 @@ def _set_global_client(client):
@dataclass
class ReplicaContext:
"""Stores data for Serve API calls from within the user's backend code."""
backend_tag: BackendTag
deployment: str
replica_tag: ReplicaTag
_internal_controller_name: str
servable_object: Callable
def _set_internal_replica_context(
backend_tag: BackendTag,
deployment: str,
replica_tag: ReplicaTag,
controller_name: str,
servable_object: Callable,
):
global _INTERNAL_REPLICA_CONTEXT
_INTERNAL_REPLICA_CONTEXT = ReplicaContext(
backend_tag, replica_tag, controller_name, servable_object)
deployment, replica_tag, controller_name, servable_object)
def _ensure_connected(f: Callable) -> Callable:
@ -987,19 +987,17 @@ def get_handle(
@PublicAPI
def get_replica_context() -> ReplicaContext:
"""When called from a backend, returns the backend tag and replica tag.
When not called from a backend, returns None.
"""If called from a deployment, returns the deployment and replica tag.
A replica tag uniquely identifies a single replica for a Ray Serve
backend at runtime. Replica tags are of the form
`<backend tag>#<random letters>`.
deployment at runtime. Replica tags are of the form
`<deployment_name>#<random letters>`.
Raises:
RayServeException: if not called from within a Ray Serve backend
RayServeException: if not called from within a Ray Serve deployment.
Example:
>>> serve.get_replica_context().backend_tag # my_backend
>>> serve.get_replica_context().replica_tag # my_backend#krcwoa
>>> serve.get_replica_context().deployment # deployment_name
>>> serve.get_replica_context().replica_tag # deployment_name#krcwoa
"""
if _INTERNAL_REPLICA_CONTEXT is None:
raise RayServeException("`serve.get_replica_context()` "

View file

@ -131,7 +131,7 @@ class RayServeReplica:
def __init__(self, _callable: Callable, backend_config: BackendConfig,
is_function: bool, controller_handle: ActorHandle) -> None:
self.backend_tag = ray.serve.api.get_replica_context().backend_tag
self.backend_tag = ray.serve.api.get_replica_context().deployment
self.replica_tag = ray.serve.api.get_replica_context().replica_tag
self.callable = _callable
self.is_function = is_function
@ -141,11 +141,11 @@ class RayServeReplica:
self.num_ongoing_requests = 0
self.request_counter = metrics.Counter(
"serve_backend_request_counter",
"serve_deployment_request_counter",
description=("The number of queries that have been "
"processed in this replica."),
tag_keys=("backend", ))
self.request_counter.set_default_tags({"backend": self.backend_tag})
tag_keys=("deployment", ))
self.request_counter.set_default_tags({"deployment": self.backend_tag})
self.loop = asyncio.get_event_loop()
self.long_poll_client = LongPollClient(
@ -158,38 +158,38 @@ class RayServeReplica:
)
self.error_counter = metrics.Counter(
"serve_backend_error_counter",
"serve_deployment_error_counter",
description=("The number of exceptions that have "
"occurred in the backend."),
tag_keys=("backend", ))
self.error_counter.set_default_tags({"backend": self.backend_tag})
"occurred in the deployment."),
tag_keys=("deployment", ))
self.error_counter.set_default_tags({"deployment": self.backend_tag})
self.restart_counter = metrics.Counter(
"serve_backend_replica_starts",
"serve_deployment_replica_starts",
description=("The number of times this replica "
"has been restarted due to failure."),
tag_keys=("backend", "replica"))
tag_keys=("deployment", "replica"))
self.restart_counter.set_default_tags({
"backend": self.backend_tag,
"deployment": self.backend_tag,
"replica": self.replica_tag
})
self.processing_latency_tracker = metrics.Histogram(
"serve_backend_processing_latency_ms",
"serve_deployment_processing_latency_ms",
description="The latency for queries to be processed.",
boundaries=DEFAULT_LATENCY_BUCKET_MS,
tag_keys=("backend", "replica"))
tag_keys=("deployment", "replica"))
self.processing_latency_tracker.set_default_tags({
"backend": self.backend_tag,
"deployment": self.backend_tag,
"replica": self.replica_tag
})
self.num_processing_items = metrics.Gauge(
"serve_replica_processing_queries",
description="The current number of queries being processed.",
tag_keys=("backend", "replica"))
tag_keys=("deployment", "replica"))
self.num_processing_items.set_default_tags({
"backend": self.backend_tag,
"deployment": self.backend_tag,
"replica": self.replica_tag
})
@ -200,7 +200,7 @@ class RayServeReplica:
handler.setFormatter(
logging.Formatter(
handler.formatter._fmt +
f" component=serve backend={self.backend_tag} "
f" component=serve deployment={self.backend_tag} "
f"replica={self.replica_tag}"))
def get_runner_method(self, request_item: Query) -> Callable:

View file

@ -9,14 +9,14 @@ serve.start()
@serve.deployment
class MyBackend:
class MyDeployment:
def __init__(self):
self.my_counter = metrics.Counter(
"my_counter",
description=("The number of excellent requests to this backend."),
tag_keys=("backend", ))
tag_keys=("deployment", ))
self.my_counter.set_default_tags({
"backend": serve.get_current_backend_tag()
"deployment": serve.get_current_deployment()
})
def call(self, excellent=False):
@ -24,9 +24,9 @@ class MyBackend:
self.my_counter.inc()
MyBackend.deploy()
MyDeployment.deploy()
handle = MyBackend.get_handle()
handle = MyDeployment.get_handle()
while True:
ray.get(handle.call.remote(excellent=True))
time.sleep(1)

View file

@ -82,13 +82,13 @@ class ReplicaSet:
self.config_updated_event = asyncio.Event(loop=event_loop)
self.num_queued_queries = 0
self.num_queued_queries_gauge = metrics.Gauge(
"serve_backend_queued_queries",
"serve_deployment_queued_queries",
description=(
"The current number of queries to this backend waiting"
"The current number of queries to this deployment waiting"
" to be assigned to a replica."),
tag_keys=("backend", "endpoint"))
tag_keys=("deployment", "endpoint"))
self.num_queued_queries_gauge.set_default_tags({
"backend": self.backend_tag
"deployment": self.backend_tag
})
self.long_poll_client = LongPollClient(

View file

@ -33,19 +33,19 @@ def test_serve_metrics(serve_instance):
# counter
"num_router_requests_total",
"num_http_requests_total",
"backend_queued_queries_total",
"backend_request_counter_requests_total",
"backend_worker_starts_restarts_total",
"deployment_queued_queries_total",
"deployment_request_counter_requests_total",
"deployment_worker_starts_restarts_total",
# histogram
"backend_processing_latency_ms_bucket",
"backend_processing_latency_ms_count",
"backend_processing_latency_ms_sum",
"deployment_processing_latency_ms_bucket",
"deployment_processing_latency_ms_count",
"deployment_processing_latency_ms_sum",
# gauge
"replica_processing_queries",
# handle
"serve_handle_request_counter",
# ReplicaSet
"backend_queued_queries"
"deployment_queued_queries"
]
for metric in expected_metrics:
# For the final error round
@ -63,8 +63,8 @@ def test_serve_metrics(serve_instance):
verify_metrics()
def test_backend_logger(serve_instance):
# Tests that backend tag and replica tag appear in Serve log output.
def test_deployment_logger(serve_instance):
# Tests that deployment tag and replica tag appear in Serve log output.
logger = logging.getLogger("ray")
@serve.deployment(name="counter")
@ -83,7 +83,7 @@ def test_backend_logger(serve_instance):
def counter_log_success():
s = f.getvalue()
return "backend" in s and "replica" in s and "count" in s
return "deployment" in s and "replica" in s and "count" in s
wait_for_condition(counter_log_success)