# Platform Metrics

Platform metrics are the operational metrics generated by the inference stack running in your cluster.

They include:

* **Triton metrics** such as request rate, failures, queue depth, and latency
* **vLLM metrics** such as scheduler state, KV cache pressure, and token throughput
* **GPU metrics** from the DCGM exporter
* **Node metrics** from `node_exporter`

## Before you enable them

Platform metrics depend on the monitoring stack, so enable it first:

```yaml
observability:
  monitoring: true
```

## Enable platform metrics

Add a `platformMetrics` block under `observability`, alongside `monitoring: true`:

```yaml
observability:
  monitoring: true
  platformMetrics:
    enabled: true
    generationIntervalMs: 2000
    gpu:
      enabled: true
    inference:
      counterLatencies: true
      histogramLatencies: true
      summaryLatencies: true
      summaryQuantiles: ""
```

## Configuration options

| Key (under `observability`)                    | Default | What it controls                             |
| ---------------------------------------------- | ------- | -------------------------------------------- |
| `platformMetrics.enabled`                      | `false` | Turns platform metric collection on          |
| `platformMetrics.generationIntervalMs`         | `2000`  | Triton metric generation interval (ms)       |
| `platformMetrics.gpu.enabled`                  | `true`  | Includes GPU metrics                         |
| `platformMetrics.inference.counterLatencies`   | `true`  | Enables cumulative latency counters          |
| `platformMetrics.inference.histogramLatencies` | `true`  | Enables latency histograms                   |
| `platformMetrics.inference.summaryLatencies`   | `true`  | Enables sliding-window latency summaries     |
| `platformMetrics.inference.summaryQuantiles`   | `""`    | Overrides Triton's default summary quantiles |
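
To illustrate how these options combine, the sketch below lowers the collection frequency, turns off latency histograms, and sets explicit summary quantiles. The quantile string follows Triton's upstream `quantile:error` pair format for `summary_quantiles`. Whether the value is passed through to Triton unchanged is an assumption, so check it against your deployment before relying on it.

```yaml
observability:
  monitoring: true
  platformMetrics:
    enabled: true
    # Generate Triton metrics every 5 seconds instead of the 2-second default.
    generationIntervalMs: 5000
    gpu:
      enabled: true
    inference:
      counterLatencies: true
      histogramLatencies: false
      summaryLatencies: true
      # Assumed format: comma-separated "quantile:allowed_error" pairs, as in upstream Triton.
      summaryQuantiles: "0.5:0.05,0.95:0.001,0.99:0.001"
```

A longer interval reduces collection overhead at the cost of coarser resolution in dashboards.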

## What you get

### Triton

Main metric family: `nv_*`

Examples:

* `nv_inference_request_success`
* `nv_inference_request_failure`
* `nv_inference_pending_request_count`
* `nv_inference_request_duration_us`
* `nv_inference_compute_infer_duration_us`
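
The counters above are cumulative, so they are usually read as rates. The following is a minimal sketch of a Prometheus rule file that derives mean latency and failure ratio from them. It assumes a Prometheus-compatible backend that can load rule files and that Triton's upstream `model` label is preserved. The group and rule names are hypothetical.

```yaml
groups:
  - name: triton-platform-metrics   # hypothetical group name
    rules:
      # Approximate mean end-to-end latency per successful request, in microseconds.
      - record: triton:request_duration_us:mean5m
        expr: |
          sum by (model) (rate(nv_inference_request_duration_us[5m]))
          /
          sum by (model) (rate(nv_inference_request_success[5m]))
      # Fraction of requests that failed over the last 5 minutes.
      - record: triton:request_failure:ratio5m
        expr: |
          sum by (model) (rate(nv_inference_request_failure[5m]))
          /
          (
            sum by (model) (rate(nv_inference_request_success[5m]))
            +
            sum by (model) (rate(nv_inference_request_failure[5m]))
          )
```

The latency rule relies on `counterLatencies` being enabled.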

### vLLM

Main metric families:

* `vllm_llms_v1:*`
* `vllm_embeddings_v1:*`

Examples:

* `vllm_llms_v1:num_requests_running`
* `vllm_llms_v1:kv_cache_usage_perc`
* `vllm_llms_v1:time_to_first_token_seconds_bucket`
* `vllm_llms_v1:generation_tokens_total`
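
As a rough example of how these series can be used, the rule file below computes a p95 time to first token, a token throughput rate, and a KV cache alert. It assumes the `vllm_llms_v1:` series keep the semantics of the upstream vLLM metrics, in particular that `kv_cache_usage_perc` is a fraction between 0 and 1. The rule names and thresholds are illustrative only.

```yaml
groups:
  - name: vllm-platform-metrics   # hypothetical group name
    rules:
      # 95th percentile time to first token, in seconds, over 5 minutes.
      - record: vllm_llms:time_to_first_token_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(vllm_llms_v1:time_to_first_token_seconds_bucket[5m]))
          )
      # Generated tokens per second across all LLM replicas.
      - record: vllm_llms:generation_tokens:rate5m
        expr: sum(rate(vllm_llms_v1:generation_tokens_total[5m]))
      # Sustained KV cache pressure (assumes 1.0 means 100% usage).
      - alert: VllmKvCacheNearlyFull
        expr: max(vllm_llms_v1:kv_cache_usage_perc) > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "vLLM KV cache usage has stayed above 90% for 10 minutes"
```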

### GPU

Examples:

* `nv_gpu_utilization`
* `nv_gpu_memory_used_bytes`
* `nv_gpu_power_usage`
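
GPU memory is easier to reason about as a ratio than as raw bytes. The sketch below assumes that `nv_gpu_memory_total_bytes`, which exists in upstream Triton but is not listed above, is also exported in your cluster. The rule names and the 95% threshold are hypothetical.

```yaml
groups:
  - name: gpu-platform-metrics   # hypothetical group name
    rules:
      # Fraction of GPU memory in use, per GPU.
      - record: gpu:memory_used:ratio
        expr: nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes
      # Fires when a GPU stays above 95% memory usage for 15 minutes.
      - alert: GpuMemoryNearlyFull
        expr: nv_gpu_memory_used_bytes / nv_gpu_memory_total_bytes > 0.95
        for: 15m
        labels:
          severity: warning
```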

### Node

Main metric family: `node_*`

Examples:

* `node_cpu_seconds_total`
* `node_memory_MemAvailable_bytes`
* `node_filesystem_avail_bytes`
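
Node metrics are typically turned into utilization ratios. The rules below are a sketch under the assumption that the standard `node_exporter` series `node_memory_MemTotal_bytes` and `node_filesystem_size_bytes` are scraped along with the ones listed above. The rule names, filters, and thresholds are illustrative.

```yaml
groups:
  - name: node-platform-metrics   # hypothetical group name
    rules:
      # Fraction of CPU time spent non-idle, averaged per node.
      - record: instance:node_cpu_utilization:ratio5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
      # Fraction of memory still available per node.
      - record: instance:node_memory_available:ratio
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes
      # Fires when a real filesystem has less than 10% space left for 30 minutes.
      - alert: NodeFilesystemAlmostFull
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
          /
          node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          < 0.10
        for: 30m
        labels:
          severity: warning
```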

## Next step

After platform metrics are enabled, you can:

* inspect them in the in-cluster Grafana stack
* forward them to your own backend through [Metrics Destinations](/en/operator-manual/configuration/observability/destinations)
* use the reference [External Grafana Dashboard](/en/operator-manual/configuration/observability/external-grafana-dashboard)

For the complete upstream metric lists, see the [Triton metrics reference](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/metrics.html) and the [vLLM metrics reference](https://docs.vllm.ai/en/stable/usage/metrics/).
