.. _ray-metrics:

Exporting Metrics
=================
To help monitor Ray applications, Ray:

- Collects some default system-level metrics.
- Exposes metrics in Prometheus format. This page refers to the endpoint that serves these metrics as the Prometheus endpoint.
- Supports custom metrics APIs that resemble the Prometheus `metric types <https://prometheus.io/docs/concepts/metric_types/>`_.

This page describes how to access these metrics using Prometheus.

.. note::

    This feature is currently experimental and under active development. APIs are subject to change.

Getting Started (Single Node)
-----------------------------
First, install Ray with the proper dependencies:

.. code-block:: bash

    pip install "ray[default]"

Ray exposes its metrics in Prometheus format, so Prometheus can scrape them directly.
Expose the metrics by passing a metrics export port to ``ray start``:

.. code-block:: bash

    ray start --head --metrics-export-port=8080  # Assign the metrics export port on the head node.

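
Before configuring Prometheus, it can help to confirm that the endpoint is serving data. The following is a minimal Python sketch, not part of Ray's API. It assumes the head node was started with ``--metrics-export-port=8080`` as shown above; the helper names ``metric_names`` and ``fetch_metric_names`` are hypothetical and exist only for this illustration.

```python
import urllib.request


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="value",...} 1.0
        # The metric name ends at the first "{" or at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return sorted(names)


def fetch_metric_names(url="http://localhost:8080", timeout=5.0):
    """Fetch the endpoint and list metric names.

    Assumption: a Ray head node is running locally and exporting on port 8080.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

With the head node running, ``print(fetch_metric_names())`` should list the metric names the endpoint currently exposes. A connection error usually means the port does not match the one passed to ``ray start``.
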

Now, you can scrape Ray's metrics using Prometheus.

First, download Prometheus from the `download page <https://prometheus.io/download/>`_, then unpack the archive:

.. code-block:: bash

    tar xvfz prometheus-*.tar.gz
    cd prometheus-*

Next, modify Prometheus's config file so that it scrapes Ray's Prometheus endpoint.

.. code-block:: yaml

    # prometheus.yml
    global:
      scrape_interval: 5s
      evaluation_interval: 5s

    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['localhost:8080'] # This must match the port passed to --metrics-export-port.

Then start Prometheus.

.. code-block:: shell

    ./prometheus --config.file=./prometheus.yml

Now, you can access Ray metrics from the default Prometheus URL, ``http://localhost:9090``.

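
Besides the web UI, Prometheus also serves an HTTP API on the same port. The sketch below is one possible way to check from Python that the scrape job is healthy, by running an instant query for Prometheus's built-in ``up`` metric. It assumes Prometheus is running at ``http://localhost:9090`` with the config above; the function names are hypothetical helpers, and the parser only handles instant-vector results.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_instant_vector(payload):
    """Convert an instant-vector response into (labels, value) pairs.

    Assumption: the query returns resultType "vector"; scalar or matrix
    results have a different shape and are not handled here.
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    results = []
    for series in payload["data"]["result"]:
        # Each series carries a [timestamp, "value-as-string"] pair.
        _timestamp, value = series["value"]
        results.append((series["metric"], float(value)))
    return results


def query_prometheus(expr, base_url="http://localhost:9090", timeout=5.0):
    """Run an instant query (requires a running Prometheus server)."""
    url = build_query_url(expr, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))
```

For example, ``query_prometheus('up{job="prometheus"}')`` should return a value of ``1.0`` for the ``localhost:8080`` target once Prometheus has scraped it successfully.
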

See :ref:`here <multi-node-metrics>` for more information on how to set up Prometheus on a Ray Cluster.

.. _application-level-metrics:

Application-level Metrics
-------------------------
Ray provides a convenient API in :ref:`ray.util.metrics <custom-metric-api-ref>` for defining and exporting custom metrics for visibility into your applications.
Three metric types are currently supported: Counter, Gauge, and Histogram.
They correspond to the `Prometheus metric types <https://prometheus.io/docs/concepts/metric_types/>`_ of the same names.
Below is a simple example of an actor that exports metrics using these APIs:

.. literalinclude:: doc_code/metrics_example.py
   :language: python

While the script is running, the metrics are exported to ``localhost:8080`` (this is the endpoint that Prometheus would be configured to scrape).
If you open this address in a browser, you should see output similar to the following:

.. code-block:: none

    # HELP ray_request_latency Latencies of requests in ms.
    # TYPE ray_request_latency histogram
    ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="0.1"} 2.0
    ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="1.0"} 2.0
    ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="+Inf"} 2.0
    ray_request_latency_count{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 2.0
    ray_request_latency_sum{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 0.11992454528808594

    # HELP ray_curr_count Current count held by the actor. Goes up and down.
    # TYPE ray_curr_count gauge
    ray_curr_count{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} -15.0

    # HELP ray_num_requests_total Number of requests processed by the actor.
    # TYPE ray_num_requests_total counter
    ray_num_requests_total{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 2.0

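
To make the histogram output concrete: a histogram is exported as cumulative ``_bucket`` samples plus a ``_count`` and a ``_sum``, so the mean observation is ``_sum / _count``. The sketch below parses this text format with a simplified regular expression and computes that mean. The names ``parse_samples`` and ``histogram_mean`` are hypothetical, the sample text is an abridged copy of the output above, and the parser is an illustration rather than a complete implementation of the Prometheus exposition format.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
# One label pair inside the braces: key="value" (value may contain escapes).
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Abridged from the output shown above (labels shortened for readability).
SAMPLE = """\
# TYPE ray_request_latency histogram
ray_request_latency_bucket{actor_name="my_actor",le="+Inf"} 2.0
ray_request_latency_count{actor_name="my_actor"} 2.0
ray_request_latency_sum{actor_name="my_actor"} 0.11992454528808594
"""


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # Ignore lines this simplified pattern cannot parse.
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def histogram_mean(samples, metric):
    """Mean observation of a histogram: total of _sum divided by total of _count."""
    total = sum(value for name, _, value in samples if name == f"{metric}_sum")
    count = sum(value for name, _, value in samples if name == f"{metric}_count")
    return total / count if count else None
```

Calling ``histogram_mean(parse_samples(SAMPLE), "ray_request_latency")`` on the values above gives roughly 0.06, that is, the two recorded requests averaged about 0.06 ms each.
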

Please see :ref:`ray.util.metrics <custom-metric-api-ref>` for more details.