
Common Issues

Engine Fails to Start with Memory Error

Solutions:
  1. Verify your actual GPU memory
    nvidia-smi
    
  2. Try the next lower preset
    # If using baseline-32g, try baseline-24g instead
    ai:
      preset: "baseline-24g"
    
  3. Remove optional capabilities to reduce memory usage
    # Remove capabilities
    ai:
      preset: "baseline-24g"  # Instead of "baseline-24g,capabilities.multilingual"
    
  4. Check for other applications using GPU memory (see the example after this list)
  5. Reboot the machine
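For step 4, a quick way to see which processes are holding GPU memory (assuming the standard NVIDIA driver utilities are installed) is:
# List processes currently using GPU memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv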

Poor Performance or Slow Responses

Solutions:
  1. Ensure you’re using the correct preset for your hardware
  2. Consider moving down to a lower-tier preset
  3. Contact Zylon engineers to help diagnose the issue
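Before dropping to a lower preset, it can help to confirm that the GPU is actually the bottleneck. A minimal check, assuming the standard NVIDIA driver utilities are available, is to watch utilization and memory while a request is being processed:
# Sample GPU utilization and memory usage once per second
nvidia-smi dmon -s um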

Pod in Failed or CrashLoopBackOff State

If the Triton or inference pods are stuck in a failed state:
# Restart the deployment
kubectl rollout restart deploy/zylon-triton -n zylon
This forces Kubernetes to recreate the pods with a fresh state.
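If the pods keep crashing after the restart, the pod events usually explain why (for example, an OOMKill or an image pull failure). Assuming the same zylon namespace as above:
# List pods and inspect the events of the failing one
kubectl get pods -n zylon
kubectl describe pod <failing-pod-name> -n zylon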

Advanced Issues

Issues specific to custom model configurations and multi-model setups.

Startup Failures

Triton Inference Server Fails to Start

Solutions:
  1. Check the Triton logs to identify which specific model is causing the failure
    kubectl logs deploy/zylon-triton -n zylon --tail=200
    
  2. Verify the memory allocation for the problematic model and adjust its gpuMemoryUtilization if needed (see the sketch after this list)
  3. If you’ve reduced the memory allocation too far for the model to fit, lower its contextWindow parameter as well
  4. Use nvidia-smi to check actual GPU memory usage and availability
    nvidia-smi
    
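Steps 2 and 3 map to per-model settings in the ai config. The values below are purely illustrative, assuming the problematic model is the one with id llm:
ai:
  config:
    models:
      - id: llm
        gpuMemoryUtilization: 0.55  # reduced from the value that caused the failure
        contextWindow: 8192         # lowered so the model still fits the smaller allocation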

Unsupported Model Version

Symptom: Triton fails to load a model even though the model family is supported.
Cause: vLLM (the inference backend) may not yet support the specific version of your model. For example:
  • Mistral Small 3 (2501) is supported
  • Mistral Small 3 (2509) might not be supported yet
Solutions:
  1. Check the supported model version in the documentation
  2. Try an earlier version of the same model family if available
  3. Check Zylon release notes for supported model versions
  4. Contact Zylon engineers to confirm model compatibility
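To confirm which model and version the backend rejected, the Triton logs are usually the quickest source. The exact error text varies by vLLM version, so this is only a starting point:
# Search recent Triton logs for model load errors
kubectl logs deploy/zylon-triton -n zylon --tail=500 | grep -i error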

Memory Errors

Engine Fails to Start with “Out of Memory” Error

Solutions:
  1. Verify total gpuMemoryUtilization does not exceed 0.95
    # Calculate total across all models
    ai:
      config:
        models:
          - id: llm
            gpuMemoryUtilization: 0.60
          - id: llmvision
            gpuMemoryUtilization: 0.25
          - id: embed
            gpuMemoryUtilization: 0.10
    # Total: 0.95 ✓
    
  2. Reduce allocation for one or more models based on the crash logs (see the example after this list)
    kubectl logs deploy/zylon-triton -n zylon
    
  3. Check actual GPU memory with nvidia-smi during startup
    watch -n 1 nvidia-smi
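For step 2, lowering the allocation of the model named in the crash logs is usually enough. An illustrative adjustment (values are examples only) that brings the total down to 0.90:
ai:
  config:
    models:
      - id: llm
        gpuMemoryUtilization: 0.55
      - id: llmvision
        gpuMemoryUtilization: 0.25
      - id: embed
        gpuMemoryUtilization: 0.10
# Total: 0.90 ✓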