# External Grafana Dashboard

This page is for teams that use their own Grafana instance.

Zylon provides a reference dashboard for Triton and vLLM platform metrics. Importing it is a separate Grafana task, and the dashboard is not required to enable observability in Zylon.

Use it when you want an external Grafana dashboard for:

* service health
* throughput and failures
* latency analysis
* scheduler and GPU bottlenecks

## What you need first

Before this dashboard is useful, you need:

* platform metrics enabled in Zylon
* a Prometheus-compatible metrics backend with Zylon metrics in it
* a Grafana instance with a Prometheus datasource connected to that backend

See [Platform Metrics](/en/operator-manual/configuration/observability/platform-metrics) and [Metrics Destinations](/en/operator-manual/configuration/observability/destinations).
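
To confirm the second prerequisite before importing anything, you can ask the metrics backend which metric names it holds. The sketch below uses the standard Prometheus HTTP API endpoint for label values. The `nv_` and `vllm:` prefixes are assumptions based on the usual Triton and vLLM naming, and the helper names are illustrative only; adjust them if your pipeline relabels metrics.

```python
import json
import urllib.request

# Assumed prefixes: Triton metrics usually start with "nv_" and vLLM
# metrics with "vllm:". These are not taken from the Zylon docs.
EXPECTED_PREFIXES = ("nv_", "vllm:")


def metric_names_url(base_url: str) -> str:
    """Build the Prometheus HTTP API URL that lists all metric names."""
    return base_url.rstrip("/") + "/api/v1/label/__name__/values"


def prefixes_found(payload: dict, prefixes=EXPECTED_PREFIXES) -> dict:
    """Map each prefix to True if at least one metric name starts with it."""
    ok = payload.get("status") == "success"
    names = payload.get("data", []) if ok else []
    return {p: any(n.startswith(p) for n in names) for p in prefixes}


def check_backend(base_url: str, timeout: float = 10.0) -> dict:
    """Query a live Prometheus-compatible backend (needs network access)."""
    url = metric_names_url(base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return prefixes_found(json.load(resp))
```

If `check_backend("http://prometheus:9090")` (a placeholder address) reports `False` for a prefix, the matching dashboard panels will most likely be empty until that metrics source is connected.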

## Import the dashboard

Download [grafana-dashboard.json](https://raw.githubusercontent.com/zylon-ai/zylon-docs/main/snippets/grafana-dashboard.json) and import it in your Grafana instance through **Dashboards → New → Import**.

For the Grafana import flow, see the [Grafana import dashboards documentation](https://grafana.com/docs/grafana/latest/visualizations/dashboards/build-dashboards/import-dashboards/).
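
If you prefer to script the import, Grafana's HTTP API accepts a dashboard model on `POST /api/dashboards/db`. The following is a minimal sketch, not a Zylon-provided tool: the function name, the Grafana URL, and the token are placeholders, and it assumes the downloaded JSON can be posted as-is because the datasource is selected through a dashboard variable. If the file relies on import-time inputs, use the UI flow instead.

```python
import json
import urllib.request


def build_import_request(dashboard: dict, grafana_url: str, token: str,
                         overwrite: bool = False) -> urllib.request.Request:
    """Prepare a POST to Grafana's dashboard create/update endpoint."""
    # Copy the model and null the numeric id so Grafana creates a new
    # dashboard instead of trying to update one from another instance.
    body = {"dashboard": {**dashboard, "id": None}, "overwrite": overwrite}
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Usage (placeholders; requires network access and a service account token):
# with open("grafana-dashboard.json") as fh:
#     req = build_import_request(json.load(fh), "https://grafana.example.com", "<token>")
# urllib.request.urlopen(req)
```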

## What the dashboard shows

Most panels are built from the metrics exposed on the Triton `/metrics` endpoint (the Host Resources section uses `node_exporter` instead):

* **Triton Inference Server** metrics such as request counts, latency, queue depth, and GPU health
* **vLLM** metrics such as scheduler state, KV cache use, token throughput, and latency histograms

## Dashboard filters

| Variable        | Purpose                                   |
| --------------- | ----------------------------------------- |
| **Datasource**  | Prometheus datasource to query            |
| **Environment** | Deployment or company identifier          |
| **Model**       | Model served by Triton                    |
| **GPU**         | `gpu_uuid` filter for GPU-specific panels |
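
Conceptually, each filter becomes a label matcher on the queries behind the panels. The sketch below shows how such a selector could be assembled. Only `gpu_uuid` is named on this page; the `model` label and the `build_selector` helper are assumptions for illustration, and the actual panel queries may differ.

```python
def build_selector(metric: str, **labels: str) -> str:
    """Return a PromQL selector with exact-match filters for non-empty labels."""

    def escape(value: str) -> str:
        # PromQL string literals escape backslashes and double quotes.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    matchers = ",".join(
        f'{name}="{escape(value)}"'
        for name, value in sorted(labels.items())
        if value  # an empty filter value means "do not filter on this label"
    )
    return f"{metric}{{{matchers}}}" if matchers else metric


# A GPU-specific panel filtered by the GPU variable (placeholder UUID):
print(build_selector("nv_gpu_utilization", gpu_uuid="GPU-1234"))
```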

## Reading the dashboard

When investigating an issue, read the sections in this order, from top to bottom:

| Section              | What it helps you answer                                    |
| -------------------- | ----------------------------------------------------------- |
| Overview             | Is the service healthy right now?                           |
| Throughput & Errors  | How much traffic is it handling, and are requests failing?  |
| Latency              | Where is time being spent?                                  |
| Capacity & Scheduler | Is the bottleneck queueing, KV cache pressure, or batching? |
| Workload Analysis    | What kind of requests are clients sending?                  |
| GPU Health           | Is the GPU saturated or memory constrained?                 |
| Host Resources       | Is the node itself under pressure?                          |

## Panels by section

### Overview

Quick health indicators for success rate, requests per second, concurrent requests, and queue depth.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/overview.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=ffa678e150cfba90250149365a22cff7" alt="Overview and Throughput & Errors sections" width="1587" height="291" data-path="images/operator-manual/observability/overview.png" />
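
As a reading aid, success rate is usually the share of successful requests among all requests in a time window. The sketch below assumes the inputs are increases of Triton's success and failure request counters over the same window; the exact expression used by the panel is not shown on this page, so treat this as the common definition rather than the dashboard's query.

```python
from typing import Optional


def success_rate(success_delta: float, failure_delta: float) -> Optional[float]:
    """Fraction of successful requests in a window, or None with no traffic."""
    total = success_delta + failure_delta
    if total <= 0:
        # No requests in the window: the rate is undefined, not 0% or 100%.
        return None
    return success_delta / total
```

Returning `None` for an idle window avoids reading "no traffic" as an outage.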

### Throughput & Errors

Request rate, failure rate, failure reasons, batching behaviour, and queue depth over time.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/throughput-and-errors.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=719ffa5a3370212bca650fe08e285660" alt="Throughput & Errors panels" width="1587" height="335" data-path="images/operator-manual/observability/throughput-and-errors.png" />

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/throughput-and-errors-2.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=77e5c1bcd5ab65429d11b9a916544db3" alt="Failure breakdown by reason, inference count vs execution count, and pending request queue depth" width="1503" height="315" data-path="images/operator-manual/observability/throughput-and-errors-2.png" />

### Latency

End-to-end latency, phase breakdown, TTFT, TPOT, and request latency percentiles.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/latency.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=fc2f8695e420e4b7369d79389d363a24" alt="Avg end-to-end latency and latency waterfall" width="1503" height="340" data-path="images/operator-manual/observability/latency.png" />

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/latency-2.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=3c932dd61f0712a5da10b35a94a979b7" alt="Avg queue, compute, and I/O overhead; Triton and vLLM TTFT percentiles" width="1547" height="618" data-path="images/operator-manual/observability/latency-2.png" />

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/latency-3.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=bfc9cfa2639db65870f1f6a7a7dd9af2" alt="Time per output token, end-to-end latency, prefill and decode time, Triton summary quantiles" width="1509" height="558" data-path="images/operator-manual/observability/latency-3.png" />
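
Percentiles derived from Prometheus histograms are estimates: the value is interpolated linearly inside the bucket that contains the requested rank, so accuracy depends on the bucket boundaries. The following simplified sketch mirrors the idea behind PromQL's `histogram_quantile`. The bucket values are made up for illustration, and it is an assumption that the percentile panels are computed this way.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) buckets.

    The highest bucket is expected to have upper bound math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the last finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            share = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * share
        prev_bound, prev_count = bound, count
    return math.nan


# Illustrative latency buckets in seconds (not real measurements).
example = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]
p50 = histogram_quantile(0.5, example)
```

Note that a percentile landing in the `+Inf` bucket is reported as the highest finite bucket bound, which can understate tail latency.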

### Capacity & Scheduler

Scheduler state, queue time, KV cache utilisation, preemptions, and batch size.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/capacity-and-scheduler.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=0d21f88c6f71d23c0a65673a61801f42" alt="Capacity & Scheduler panels" width="1548" height="669" data-path="images/operator-manual/observability/capacity-and-scheduler.png" />

### Workload Analysis

Token throughput, prompt length, generation length, and prefix-cache behaviour.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/work-analysis.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=830bbb7e486a419c6e88d93b36fa7a02" alt="Token throughput, avg tokens per request, and prompt/generation length distributions" width="1505" height="633" data-path="images/operator-manual/observability/work-analysis.png" />

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/work-analysis-2.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=c49177ca41a25f0b6313a01154d59bb8" alt="Max generation tokens, max tokens per request percentiles, and prefix cache hit rate" width="1538" height="615" data-path="images/operator-manual/observability/work-analysis-2.png" />

### GPU Health

GPU utilisation, memory pressure, power draw, and energy consumption.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/gpu-health.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=1467cc9f8ded299875e3aa594a6e60b7" alt="GPU utilization, memory, power, and energy consumption panels" width="1558" height="661" data-path="images/operator-manual/observability/gpu-health.png" />

### Host Resources

CPU, RAM, and disk availability from `node_exporter`.

<img src="https://mintcdn.com/zylon/1MC3EkpeIBooYZ9J/images/operator-manual/observability/host-resources.png?fit=max&auto=format&n=1MC3EkpeIBooYZ9J&q=85&s=23895b1bd0f315a258bbb47049008199" alt="CPU usage, RAM usage, and disk availability panels" width="1516" height="342" data-path="images/operator-manual/observability/host-resources.png" />
