
.. _ray-metrics:

Exporting Metrics
=================

To help you monitor Ray applications, Ray:

- Collects a set of default system-level metrics.
- Exposes metrics in Prometheus format through an HTTP endpoint, which this page calls the Prometheus endpoint.
- Provides custom metrics APIs that resemble Prometheus `metric types <https://prometheus.io/docs/concepts/metric_types/>`_.

This page describes how to access these metrics using Prometheus.

.. note::

    This feature is currently experimental and under active development. APIs are subject to change.

Getting Started (Single Node)
-----------------------------

First, install Ray with the ``default`` extra, which includes the dependencies needed to export metrics:

.. code-block:: bash

  pip install "ray[default]"

Because Ray exposes its metrics in Prometheus format, Prometheus can scrape them directly.

To expose the metrics, start a Ray head node with ``ray start`` and assign it a metrics export port:

.. code-block:: bash

    ray start --head --metrics-export-port=8080 # Assign metrics export port on a head node.
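
Before configuring Prometheus, you can check that the head node is serving metrics by fetching the endpoint yourself. The following is a minimal sketch that uses only the Python standard library. It assumes the ``--metrics-export-port=8080`` value from the command above, and ``fetch_ray_metrics`` and ``metric_types`` are hypothetical helpers, not part of Ray's API.

.. code-block:: python

    import urllib.request


    def fetch_ray_metrics(url="http://localhost:8080", timeout=5.0):
        """Return the Prometheus-format text served at ``url``.

        The default URL assumes ``ray start --metrics-export-port=8080``.
        """
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")


    def metric_types(text):
        """Map each metric name to its type, using the "# TYPE" comment lines."""
        types = {}
        for line in text.splitlines():
            if line.startswith("# TYPE "):
                # "# TYPE <name> <type>" splits into four fields.
                _, _, name, kind = line.split(maxsplit=3)
                types[name] = kind
        return types

For example, ``metric_types(fetch_ray_metrics())`` returns every metric the endpoint currently exposes together with its type. A connection error means nothing is listening at that URL; check that the port matches the value passed to ``--metrics-export-port``.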

Now you can scrape Ray's metrics with Prometheus.

First, `download Prometheus <https://prometheus.io/download/>`_ and extract the archive:

.. code-block:: bash

    tar xvfz prometheus-*.tar.gz
    # The trailing slash makes the glob match the extracted directory, not the archive.
    cd prometheus-*/

Next, edit the Prometheus configuration file, ``prometheus.yml``, so that Prometheus scrapes Ray's metrics endpoint:

.. code-block:: yaml

    # prometheus.yml
    global:
      scrape_interval:     5s
      evaluation_interval: 5s

    scrape_configs:
      - job_name: prometheus
        static_configs:
        - targets: ['localhost:8080'] # Must match the value passed to --metrics-export-port.

Then start Prometheus:

.. code-block:: bash

    ./prometheus --config.file=./prometheus.yml

You can now query Ray's metrics in the Prometheus web UI, which listens on ``http://localhost:9090`` by default.
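
Besides the web UI, Prometheus serves an HTTP API; its ``/api/v1/query`` endpoint evaluates a PromQL expression at a single point in time. The following minimal sketch uses only the Python standard library and assumes Prometheus is running at the default ``http://localhost:9090``. The helper names ``build_query_url``, ``parse_vector``, and ``query`` are hypothetical, not part of Ray or Prometheus.

.. code-block:: python

    import json
    import urllib.parse
    import urllib.request

    # Assumed address: the Prometheus default used in this guide.
    PROMETHEUS_URL = "http://localhost:9090"


    def build_query_url(promql, base_url=PROMETHEUS_URL):
        """Return the URL of an instant query for ``promql``."""
        return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


    def parse_vector(payload):
        """Convert an instant-vector response into (labels, value) pairs."""
        if payload.get("status") != "success":
            raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
        # Each result carries its label set and a [timestamp, "value"] pair,
        # where the value is encoded as a string.
        return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]


    def query(promql, base_url=PROMETHEUS_URL, timeout=5.0):
        """Run an instant PromQL query and return the parsed samples."""
        url = build_query_url(promql, base_url)
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_vector(json.load(response))

For example, ``query('up{job="prometheus"}')`` returns one sample per target in the ``prometheus`` job from ``prometheus.yml``. Prometheus sets ``up`` to ``1`` when the latest scrape of a target succeeded and to ``0`` when it failed, so this is a quick way to confirm that Prometheus can reach Ray's metrics endpoint.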

For instructions on setting up Prometheus for a multi-node Ray cluster, see :ref:`Prometheus setup for Ray clusters <multi-node-metrics>`.

.. _application-level-metrics:

Application-level Metrics
-------------------------
Ray provides an API in :ref:`ray.util.metrics <custom-metric-api-ref>` for defining and exporting custom metrics that give you visibility into your applications.
Three metric types are currently supported: Counter, Gauge, and Histogram.
They correspond to the `Prometheus metric types <https://prometheus.io/docs/concepts/metric_types/>`_ of the same name.
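
To make the difference between the three types concrete, here is a toy, pure-Python model of their semantics. It is an illustration only, not Ray's implementation or API, and the class names ``ToyCounter``, ``ToyGauge``, and ``ToyHistogram`` are hypothetical. A counter only increases, a gauge can be set to any value, and a histogram sorts each observation into cumulative buckets labeled ``le`` ("less than or equal to") plus a final ``+Inf`` bucket, while also tracking a running count and sum.

.. code-block:: python

    import bisect
    import math


    class ToyCounter:
        """Illustrative counter: the value only ever increases."""

        def __init__(self):
            self.value = 0.0

        def inc(self, amount=1.0):
            if amount < 0:
                raise ValueError("a counter cannot decrease")
            self.value += amount


    class ToyGauge:
        """Illustrative gauge: the value can move up or down freely."""

        def __init__(self):
            self.value = 0.0

        def set(self, value):
            self.value = value


    class ToyHistogram:
        """Illustrative histogram with Prometheus-style cumulative buckets."""

        def __init__(self, boundaries):
            self.boundaries = sorted(boundaries)
            # One slot per finite boundary, plus a final slot for +Inf.
            self.bucket_counts = [0] * (len(self.boundaries) + 1)
            self.count = 0
            self.sum = 0.0

        def observe(self, value):
            # First boundary >= value, so a value equal to a bound lands in
            # that bound's bucket (the "le" semantics); larger values go to +Inf.
            idx = bisect.bisect_left(self.boundaries, value)
            self.bucket_counts[idx] += 1
            self.count += 1
            self.sum += value

        def cumulative(self):
            """Return (le, count) pairs, each counting all values <= le."""
            pairs, running = [], 0
            for bound, n in zip(self.boundaries + [math.inf], self.bucket_counts):
                running += n
                pairs.append((bound, running))
            return pairs

For example, after ``h = ToyHistogram([0.1, 1.0])`` observes two values below ``0.1``, ``h.cumulative()`` returns ``[(0.1, 2), (1.0, 2), (inf, 2)]``, the same pattern as the ``ray_request_latency_bucket`` samples in the output shown later on this page.
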
Below is a simple example of an actor that exports metrics using these APIs:

.. literalinclude:: doc_code/metrics_example.py
   :language: python

While the script is running, Ray exports the metrics to ``localhost:8080``, the endpoint that Prometheus is configured to scrape.
If you open ``http://localhost:8080`` in a browser, you should see output similar to the following:

.. code-block:: none

  # HELP ray_request_latency Latencies of requests in ms.
  # TYPE ray_request_latency histogram
  ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="0.1"} 2.0
  ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="1.0"} 2.0
  ray_request_latency_bucket{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor",le="+Inf"} 2.0
  ray_request_latency_count{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 2.0
  ray_request_latency_sum{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 0.11992454528808594
  # HELP ray_curr_count Current count held by the actor. Goes up and down.
  # TYPE ray_curr_count gauge
  ray_curr_count{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} -15.0
  # HELP ray_num_requests_total Number of requests processed by the actor.
  # TYPE ray_num_requests_total counter
  ray_num_requests_total{Component="core_worker",Version="3.0.0.dev0",actor_name="my_actor"} 2.0
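
Because the endpoint serves plain text, you can also read these values programmatically. For a histogram, the average observed value is ``<name>_sum`` divided by ``<name>_count``; in the output above that is 0.11992454528808594 / 2, roughly 0.06. The following sketch parses the simple text format shown above. It is deliberately minimal (it does not unescape label values or handle a ``}`` inside them), and ``parse_samples`` and ``histogram_mean`` are hypothetical helpers; for a complete parser, see ``prometheus_client.parser.text_string_to_metric_families`` in the ``prometheus_client`` package.

.. code-block:: python

    import re

    # A sample line looks like: name{label="value",...} value [timestamp]
    SAMPLE_RE = re.compile(
        r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
        r"(?:\{(?P<labels>[^}]*)\})?"
        r"\s+(?P<value>\S+)(?:\s+\S+)?$"
    )
    LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


    def parse_samples(text):
        """Return (name, labels, value) tuples for every sample line in ``text``."""
        samples = []
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and "# HELP" / "# TYPE" comment lines.
            if not line or line.startswith("#"):
                continue
            match = SAMPLE_RE.match(line)
            if match is None:
                continue  # Not understood by this simplified parser.
            labels = dict(LABEL_RE.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
        return samples


    def histogram_mean(samples, metric):
        """Average observation of a histogram, summed across all label sets."""
        total = sum(v for name, _, v in samples if name == metric + "_sum")
        count = sum(v for name, _, v in samples if name == metric + "_count")
        return total / count if count else float("nan")

Passing the output above to ``parse_samples`` and then calling ``histogram_mean(samples, "ray_request_latency")`` returns about 0.06.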

For more details, see the :ref:`ray.util.metrics <custom-metric-api-ref>` API reference.