# How to configure observability

Observability helps you answer three questions:

* is Zylon healthy?
* what is it doing?
* where should its metrics go?

Zylon observability has five parts:

* **Crash reporting** sends crash diagnostics to Zylon when the platform fails, so Zylon support can diagnose the problem.
* **Usage metrics** send anonymous product telemetry to Zylon.
* **Monitoring** installs the local monitoring stack inside your cluster.
* **Platform metrics** are the operational metrics from Triton, vLLM, the GPUs, and the cluster nodes.
* **Destinations** send those metrics to your own monitoring backend.

## Getting started

For most setups, think about observability in this order:

1. Enable `monitoring` if you want any metrics.
2. Enable `platformMetrics` if you want Triton, vLLM, GPU, and node metrics.
3. Add `k8s-monitoring.extraDestinations` if you want to send those metrics to your own backend.
4. Keep or disable `crashReporting` and `usageMetrics`, depending on whether you want to send telemetry to Zylon.

Minimal example:

```yaml theme={null}
observability:
  monitoring: true
  platformMetrics:
    enabled: true
```

That gives you local metrics in the in-cluster monitoring stack.
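Putting the four steps together, a fuller configuration might look like the sketch below. It uses only the keys described on this page; the destination name `my-prometheus` and its URL are placeholders, and the telemetry values are one possible choice, not a recommendation.

```yaml theme={null}
observability:
  # Step 1: install the in-cluster monitoring stack (Prometheus, Grafana, k8s-monitoring)
  monitoring: true
  # Step 2: collect Triton, vLLM, GPU, and node metrics
  platformMetrics:
    enabled: true
  # Step 4: choose which telemetry is sent to Zylon (example values)
  crashReporting: true
  usageMetrics: false

# Step 3: forward collected metrics to your own backend (placeholder name and URL)
k8s-monitoring:
  extraDestinations:
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
```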

### Crash reporting

```yaml theme={null}
observability:
  crashReporting: true
```

`observability.crashReporting` controls whether Zylon sends crash diagnostics to Sentry.

Enable it if you want Zylon support to have failure information when the platform breaks. Disable it if you do not want any crash diagnostics sent to Zylon.

### Usage metrics

```yaml theme={null}
observability:
  usageMetrics: true
```

`observability.usageMetrics` controls whether Zylon sends anonymous product telemetry to Zylon-managed observability services.

This is product-level telemetry, not the detailed Triton or vLLM metrics you use for operating the cluster. Disable it if you do not want to send usage telemetry to Zylon.
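If you want metrics to stay inside your cluster and send no telemetry to Zylon, you can disable both telemetry settings while keeping local monitoring. This is a minimal sketch that assumes you still want platform metrics in the in-cluster stack.

```yaml theme={null}
observability:
  # No crash diagnostics sent to Sentry
  crashReporting: false
  # No anonymous product telemetry sent to Zylon
  usageMetrics: false
  # Metrics remain in the in-cluster monitoring stack
  monitoring: true
  platformMetrics:
    enabled: true
```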

### Monitoring

Monitoring must be enabled if you want local metrics or external metric forwarding.

```yaml theme={null}
observability:
  monitoring: true
```

`observability.monitoring` installs the in-cluster monitoring stack, including Prometheus, Grafana, and `k8s-monitoring`.

Every other metrics feature depends on it. If monitoring is disabled, you cannot inspect platform metrics locally or forward them to your own destinations.

### Platform metrics

Platform metrics require monitoring:

```yaml theme={null}
observability:
  monitoring: true
  platformMetrics:
    enabled: true
```

`observability.platformMetrics.enabled` turns on the operational metrics generated by the inference stack.

These are the metrics you use to understand request rate, failures, latency, queue depth, scheduler pressure, GPU usage, and host health. They come from Triton, vLLM, the GPU exporter, and `node_exporter`.

For the full metric configuration, see [Platform Metrics](/en/operator-manual/configuration/observability/platform-metrics).

### External destinations

External destinations also require monitoring:

```yaml theme={null}
observability:
  monitoring: true

k8s-monitoring:
  extraDestinations:
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
```

`k8s-monitoring.extraDestinations` forwards the metrics collected in your cluster to your own monitoring backend.

Use it only when you want to send metrics somewhere outside the built-in monitoring stack, for example to Prometheus, Grafana Cloud, or an OTLP collector.
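Because `extraDestinations` is a map keyed by destination name, forwarding to more than one backend should only require additional entries. The sketch below assumes two Prometheus remote-write endpoints; both names and URLs are hypothetical placeholders.

```yaml theme={null}
observability:
  monitoring: true
  platformMetrics:
    enabled: true

k8s-monitoring:
  extraDestinations:
    # Hypothetical primary backend
    prometheus-primary:
      type: prometheus
      url: https://prometheus-a.example.com/api/v1/write
    # Hypothetical secondary backend
    prometheus-secondary:
      type: prometheus
      url: https://prometheus-b.example.com/api/v1/write
```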

For destination setup, see [Metrics Destinations](/en/operator-manual/configuration/observability/destinations).

<Warning>
  If your cluster restricts outbound traffic, telemetry and external destinations may require domain or endpoint allowlisting. If you disable `usageMetrics`, Zylon's telemetry domains are not needed.
</Warning>

## Next pages

For the core Zylon configuration:

* [Platform Metrics](/en/operator-manual/configuration/observability/platform-metrics): enable Triton, vLLM, GPU, and node metrics
* [Metrics Destinations](/en/operator-manual/configuration/observability/destinations): send metrics to Prometheus, Grafana-compatible backends, or OTLP

If you use your own Grafana instance, the dashboard is a separate optional step:

* [External Grafana Dashboard](/en/operator-manual/configuration/observability/external-grafana-dashboard): import the reference dashboard into your own Grafana instance
