Observability helps you answer three questions:
- is Zylon healthy?
- what is it doing?
- where should its metrics go?
Zylon observability has five parts:
- Crash reporting tells Zylon when the platform fails, so support can diagnose the problem.
- Usage metrics send anonymous product telemetry to Zylon.
- Monitoring installs the local monitoring stack inside your cluster.
- Platform metrics are the operational metrics from Triton, vLLM, GPUs, and nodes.
- Destinations send those metrics to your own monitoring backend.
Getting started
For most setups, think about observability in this order:
- Enable monitoring if you want metrics at all.
- Enable platformMetrics if you want Triton, vLLM, GPU, and node metrics.
- Add destinations if you want to send those metrics to your own backend.
- Keep or disable crashReporting and usageMetrics depending on whether you want Zylon telemetry.
Minimal example:
observability:
  monitoring: true
  platformMetrics:
    enabled: true
That gives you local metrics in the in-cluster monitoring stack.
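For comparison, a fuller sketch that walks through all four steps in the order above might look like the following. It only combines keys that appear on this page; the destination name and URL are placeholders, and placing k8s-monitoring at the top level follows the dotted path k8s-monitoring.extraDestinations used later on.

```yaml
observability:
  # Step 1: install the in-cluster monitoring stack.
  monitoring: true
  # Step 2: collect Triton, vLLM, GPU, and node metrics.
  platformMetrics:
    enabled: true
  # Step 4: Zylon telemetry, kept enabled in this sketch.
  crashReporting: true
  usageMetrics: true

# Step 3: forward metrics to your own backend (placeholder name and URL).
k8s-monitoring:
  extraDestinations:
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
```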
Crash reporting
observability:
  crashReporting: true
observability.crashReporting controls whether Zylon sends crash diagnostics to Sentry.
Enable it if you want Zylon support to have failure information when the platform breaks. Disable it if you do not want any crash diagnostics sent to Zylon.
Usage metrics
observability:
  usageMetrics: true
observability.usageMetrics controls whether Zylon sends anonymous product telemetry to Zylon-managed observability services.
This is product-level telemetry, not the detailed Triton or vLLM metrics you use for operating the cluster. Disable it if you do not want to send usage telemetry to Zylon.
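If you would rather send nothing to Zylon, both telemetry switches can be turned off together. The sketch below assumes each key accepts false as the opposite of the true value shown above. Monitoring and platform metrics are configured through separate keys, so this fragment does not touch them.

```yaml
observability:
  # Assumption: false is the opt-out value for both boolean settings.
  crashReporting: false   # no crash diagnostics sent to Sentry
  usageMetrics: false     # no anonymous product telemetry sent to Zylon
```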
Monitoring
Monitoring must be enabled if you want local metrics or external metric forwarding.
observability:
  monitoring: true
observability.monitoring installs the in-cluster monitoring stack, including Prometheus, Grafana, and k8s-monitoring.
This is the base for everything else related to metrics. If monitoring is disabled, you cannot inspect platform metrics locally and you cannot forward them to your own destinations.
Platform metrics
Platform metrics require monitoring:
observability:
  monitoring: true
  platformMetrics:
    enabled: true
observability.platformMetrics.enabled turns on the operational metrics generated by the inference stack.
These are the metrics you use to understand request rate, failures, latency, queue depth, scheduler pressure, GPU usage, and host health. They come from Triton, vLLM, the GPU exporter, and node_exporter.
For the full metric configuration, see Platform Metrics.
External destinations
External destinations also require monitoring:
observability:
  monitoring: true
k8s-monitoring:
  extraDestinations:
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
k8s-monitoring.extraDestinations forwards the metrics collected in your cluster to your own monitoring backend.
Use it only when you want to send metrics somewhere outside the built-in monitoring stack, for example to Prometheus, Grafana Cloud, or an OTLP collector.
For destination setup, see Metrics Destinations.
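To forward to more than one backend, a natural extension is to add another named entry under extraDestinations. The following is a sketch, not a confirmed schema: the entry names and URLs are placeholders, and type: otlp is an assumption borrowed from the destination types of the upstream Grafana k8s-monitoring Helm chart. Check Metrics Destinations for the fields Zylon actually accepts.

```yaml
observability:
  monitoring: true   # required for any forwarding

k8s-monitoring:
  extraDestinations:
    # Remote-write endpoint of your own Prometheus (placeholder URL).
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
    # Hypothetical OTLP collector; both type and URL are assumptions.
    my-otlp-collector:
      type: otlp
      url: https://otel-collector.example.com:4317
```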
If your cluster restricts outbound traffic, telemetry and external destinations may require domain or endpoint allowlisting. If you disable usageMetrics, Zylon’s telemetry domains are not needed.
If you use your own Grafana instance, the dashboard is a separate, optional step.