Observability helps you answer three questions:
  • is Zylon healthy?
  • what is it doing?
  • where should its metrics go?
Zylon observability has five parts:
  • Crash reporting tells Zylon when the platform fails, so support can diagnose the problem.
  • Usage metrics send anonymous product telemetry to Zylon.
  • Monitoring installs the local monitoring stack inside your cluster.
  • Platform metrics are the operational metrics emitted by Triton, vLLM, GPUs, and nodes.
  • Destinations send those metrics to your own monitoring backend.

Getting started

For most setups, think about observability in this order:
  1. Enable monitoring if you want metrics at all.
  2. Enable platformMetrics if you want Triton, vLLM, GPU, and node metrics.
  3. Add destinations if you want to send those metrics to your own backend.
  4. Keep or disable crashReporting and usageMetrics depending on whether you want Zylon telemetry.
Minimal example:
observability:
  monitoring: true
  platformMetrics:
    enabled: true
That gives you local metrics in the in-cluster monitoring stack.
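The four steps above can also be combined in one values file. The following is a sketch, not a recommended default: it reuses only keys that appear on this page, the destination name and URL are placeholders, and the true/false choices for crashReporting and usageMetrics are illustrative assumptions (this page does not state their defaults).

```yaml
# Sketch only: every key comes from this page, the values are illustrative.
observability:
  monitoring: true          # step 1: install the in-cluster monitoring stack
  platformMetrics:
    enabled: true           # step 2: Triton, vLLM, GPU, and node metrics
  crashReporting: true      # step 4: keep crash diagnostics (assumed choice)
  usageMetrics: false       # step 4: opt out of usage telemetry (assumed choice)

# step 3: forward the collected metrics to your own backend
k8s-monitoring:
  extraDestinations:
    my-prometheus:          # placeholder destination name
      type: prometheus
      url: https://prometheus.example.com/api/v1/write   # placeholder URL
```

If you only need local metrics, drop the k8s-monitoring block; the observability block on its own matches the minimal example plus explicit telemetry settings.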

Crash reporting

observability:
  crashReporting: true
observability.crashReporting controls whether Zylon sends crash diagnostics to Sentry. Enable it if you want Zylon support to have failure information when the platform breaks. Disable it if you do not want any crash diagnostics sent to Zylon.

Usage metrics

observability:
  usageMetrics: true
observability.usageMetrics controls whether Zylon sends anonymous product telemetry to Zylon-managed observability services. This is product-level telemetry, not the detailed Triton or vLLM metrics you use for operating the cluster. Disable it if you do not want to send usage telemetry to Zylon.

Monitoring

Monitoring must be enabled if you want local metrics or external metric forwarding.
observability:
  monitoring: true
observability.monitoring installs the in-cluster monitoring stack, including Prometheus, Grafana, and k8s-monitoring. This is the base for everything else related to metrics. If monitoring is disabled, you cannot inspect platform metrics locally and you cannot forward them to your own destinations.

Platform metrics

Platform metrics require monitoring:
observability:
  monitoring: true
  platformMetrics:
    enabled: true
observability.platformMetrics.enabled turns on the operational metrics generated by the inference stack. These are the metrics you use to understand request rate, failures, latency, queue depth, scheduler pressure, GPU usage, and host health. They come from Triton, vLLM, the GPU exporter, and node_exporter. For the full metric configuration, see Platform Metrics.

External destinations

External destinations also require monitoring:
observability:
  monitoring: true

k8s-monitoring:
  extraDestinations:
    my-prometheus:
      type: prometheus
      url: https://prometheus.example.com/api/v1/write
k8s-monitoring.extraDestinations forwards the metrics collected in your cluster to your own monitoring backend. Use it only when you want to send metrics somewhere outside the built-in monitoring stack, for example to an external Prometheus, Grafana Cloud, or an OTLP collector. For destination setup, see Metrics Destinations.
If your cluster restricts outbound traffic, telemetry and external destinations may require domain or endpoint allowlisting. If you disable usageMetrics, Zylon’s telemetry domains are not needed.
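For a cluster with restricted outbound traffic, one possible profile is to turn off both Zylon telemetry settings and forward metrics to a single endpoint that you control. This is a sketch under assumptions: the destination name and internal URL are hypothetical, and you still need to allowlist whatever endpoint you configure as a destination.

```yaml
# Sketch only: a restricted-egress profile built from keys on this page.
observability:
  monitoring: true          # required for local metrics and for forwarding
  platformMetrics:
    enabled: true
  crashReporting: false     # no crash diagnostics sent to Zylon
  usageMetrics: false       # no usage telemetry sent to Zylon

k8s-monitoring:
  extraDestinations:
    internal-prometheus:    # hypothetical destination name
      type: prometheus
      # hypothetical internal endpoint; this is the endpoint to allowlist
      url: https://prometheus.internal.example.com/api/v1/write
```

With this profile, the only outbound metrics traffic you configure explicitly is the remote-write URL above. Confirm the complete list of required domains against your own network policy.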

Next pages

For the core Zylon configuration, see Platform Metrics and Metrics Destinations.
If you use your own Grafana instance, the dashboard is a separate optional step.