15 commits

Author SHA1 Message Date
Dmitri Gekhtman
6d09244a7e
[Dashboard][K8s] Add toggle to enable showing node disk usage on K8s (#24416)
https://github.com/ray-project/ray/pull/14676 disabled the disk usage/total display for Ray nodes on K8s, because Ray nodes on K8s are run as pods, which in general do not use up the entire machine.

However, in some situations, it is useful to run one Ray pod per K8s node and report the disk usage.

This PR adds a flag to enable displaying disk usage in those situations.
2022-05-03 10:58:05 -05:00
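
The toggle described in the entry above can be pictured with a minimal sketch. It assumes the flag is read from an environment variable; the variable name `EXAMPLE_ENABLE_K8S_DISK_USAGE` and the function `should_report_disk_usage` are hypothetical names chosen for illustration and are not taken from the Ray code base.

```python
import os

# Hypothetical variable name, for illustration only (not Ray's actual flag).
FLAG_ENV_VAR = "EXAMPLE_ENABLE_K8S_DISK_USAGE"


def should_report_disk_usage(running_on_k8s, environ=None):
    """Decide whether disk usage should be shown for a node.

    Outside K8s, disk usage is always reported. On K8s it is hidden by
    default, because a pod usually does not own the whole machine, and
    is shown only when the flag is explicitly set to "1".
    """
    env = os.environ if environ is None else environ
    if not running_on_k8s:
        return True
    return env.get(FLAG_ENV_VAR, "0") == "1"
```

Defaulting to "off" on K8s keeps the behavior introduced by the earlier PR, so only deployments that run one Ray pod per K8s node need to opt in.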
Stephanie Wang
b43426bc33
[core] Add metrics for disk and network I/O (#23546)
Adds some metrics useful for object-intensive workloads:

    Per raylet/object manager:
        Add num bytes pending restore to spill manager
        Add num requests cumulative to PullManager
        Num bytes pushed/pulled from other nodes cumulative
        Histogram for request latencies in PullManager:
            total life time of request, from start to cancel
            request satisfaction time, from start to object local
            pull time, from object activation to object local
    Per-node disk read/write speed, IOPS
2022-04-01 11:15:34 -07:00
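
As a rough illustration of the last item in the entry above, per-node disk read/write speed and IOPS can be derived from two snapshots of cumulative counters. This is a sketch under stated assumptions, not the implementation from the PR: the `DiskCounters` type and the `disk_io_rates` function are hypothetical, and the counters are assumed to be monotonically increasing totals sampled at known times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskCounters:
    """Cumulative disk counters sampled at one point in time (hypothetical type)."""
    timestamp: float   # sample time in seconds
    read_bytes: int    # total bytes read so far
    write_bytes: int   # total bytes written so far
    read_count: int    # total read operations so far
    write_count: int   # total write operations so far


def disk_io_rates(prev, curr):
    """Turn two cumulative snapshots into per-second speed and IOPS."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at increasing times")
    return {
        "read_bytes_per_s": (curr.read_bytes - prev.read_bytes) / elapsed,
        "write_bytes_per_s": (curr.write_bytes - prev.write_bytes) / elapsed,
        "read_iops": (curr.read_count - prev.read_count) / elapsed,
        "write_iops": (curr.write_count - prev.write_count) / elapsed,
    }
```

For example, snapshots taken 2 seconds apart in which `read_bytes` grew by 2048 and `read_count` by 10 give a read speed of 1024 bytes per second and 5 read IOPS.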
Balaji Veeramani
7f1bacc7dc
[CI] Format Python code with Black (#21975)
See #21316 and #21311 for the motivation behind these changes.
2022-01-29 18:41:57 -08:00
Edward Oakes
7736cdd91d
[dashboard] Rename "new_dashboard" -> "dashboard" (#18214)
2021-09-15 11:17:15 -05:00
Clark Zinzow
d958457d07
[Core] Second pass at privatizing APIs. (#17885)
* gcs_utils

* resource_spec

* profiling

* ray_perf and ray_cluster_perf

* test_utils
2021-08-18 20:56:33 -07:00
Richard Liaw
597dc08dfe
Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies"" (#17254)
* Revert "Revert "[core] remove opencensus/prometheus_exporter dependencies" (#17251)"

This reverts commit 7b44dd8ecb.

* Lint

* Fix more imports

Co-authored-by: Kai Fricke <kai@anyscale.com>
2021-07-26 21:09:25 -07:00
Kathryn Zhou
01dda99b8c
Export cluster statistics to Prometheus (#14612)
2021-03-15 19:28:13 -07:00
fyrestone
2da58bb021
[Dashboard] Fix reporter agent (#14378)
2021-03-08 13:12:34 -06:00
Kathryn Zhou
d6521be7ef
Export GPU metrics, CPU count, and additional Memory metrics to Prometheus (#14170)
2021-02-22 10:04:18 -08:00
Kathryn Zhou
f6b5e838fe
Add disk and network metrics to Prometheus and fix dashboard (#14144)
2021-02-17 10:27:14 -08:00
Simon Mo
33316d4f8f
Revert "Export additional metrics to Prometheus (#14061)" (#14134)
This reverts commit 82539f2da4.
2021-02-16 12:49:12 -08:00
Kathryn Zhou
82539f2da4
Export additional metrics to Prometheus (#14061)
2021-02-14 23:16:26 -08:00
SangBin Cho
32dc5676b4
[Metrics] Record per node and raylet cpu / mem usage (#12982)
* Record per node and raylet cpu / mem usage

* Add comments.

* Addressed code review.
2021-01-05 21:57:21 -08:00
SangBin Cho
753cda2f28
[Dashboard] Delete old dashboard (#12144)
* Delete old dashboard from repo.

* Delete old dashboard from repo. 2
2020-11-25 11:31:02 -08:00
Max Fitton
caf3b04b27
[Dashboard] Turn on new dashboard by default pt 2 (#11510)
2020-10-23 15:52:14 -05:00