Common Issues
Engine Fails to Start with Memory Error
Solutions:
- Verify your actual GPU memory (see the `nvidia-smi` sketch after this list)
- Try the next lower preset
- Remove optional capabilities to reduce memory usage
- Check for other applications using GPU memory
- Reboot the machine
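If the exact numbers aren’t obvious, `nvidia-smi` can report per-GPU memory and the processes holding it. A minimal sketch using standard `nvidia-smi` query flags:

```bash
# Show total, used, and free memory for each GPU.
nvidia-smi --query-gpu=index,name,memory.total,memory.used,memory.free --format=csv

# List processes currently holding GPU memory, to spot other
# applications competing with the engine.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```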
Poor Performance or Slow Responses
Solutions:
- Ensure you’re using the correct preset for your hardware
- Consider dropping to a lower-tier preset
- Contact Zylon engineers to help diagnose the issue
Pod in Failed or CrashLoopBackOff State
If the Triton or inference pods are stuck in a failed state, inspect them with kubectl to find the root cause.
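A minimal sketch of the usual inspection steps; the pod and namespace names are placeholders for your deployment:

```bash
# Find pods in a Failed or CrashLoopBackOff state.
kubectl get pods -n <namespace>

# Describe the failing pod to see recent events (OOMKilled, scheduling issues, etc.).
kubectl describe pod <pod-name> -n <namespace>

# Read the logs of the previous (crashed) container instance.
kubectl logs <pod-name> -n <namespace> --previous
```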
Advanced Issues
Issues specific to custom model configurations and multi-model setups.
Startup Failures
Triton Inference Server Fails to Start
Solutions:
- Check the Triton logs to identify which specific model is causing the failure
- Verify memory allocation for the problematic model; adjust `gpuMemoryUtilization` if needed
- If you’ve reduced memory allocation too much, reduce the `contextWindow` parameter for that model
- Use `nvidia-smi` to check actual GPU memory usage and availability (see the sketch after this list)
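For instance, you can follow the Triton log while sampling GPU memory during startup. A sketch with placeholder pod and namespace names:

```bash
# Follow the Triton pod's log and watch for which model fails to load.
kubectl logs -f <triton-pod> -n <namespace>

# In a second terminal, sample GPU memory once per second during startup.
nvidia-smi --query-gpu=memory.used,memory.free --format=csv -l 1
```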
Unsupported Model Version
Symptom: Triton fails to load a model even though the model family is supported; the load error appears in the Triton log (see the sketch after this list).
Cause: vLLM (the inference backend) may not support the specific version of your model yet. For example:
- Mistral Small 3 (2501) is supported
- Mistral Small 3 (2509) might not be supported yet
Solutions:
- Check the supported model versions in the documentation
- Try an earlier version of the same model family if available
- Check Zylon release notes for supported model versions
- Contact Zylon engineers to confirm model compatibility
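To confirm a version-support failure rather than a memory one, search the Triton log for the load error. A sketch with a placeholder pod name; the grep patterns are illustrative, since the exact vLLM message varies by release:

```bash
# Search the Triton log for the failing model's load error.
kubectl logs <triton-pod> -n <namespace> | grep -iE "unsupported|not supported|failed to load"
```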
Memory Errors
Engine Fails to Start with “Out of Memory” Error
Solutions:
- Verify the total `gpuMemoryUtilization` does not exceed 0.95 (a quick way to sum the per-model values is sketched after this list)
- Reduce allocation for one or more models based on the crash logs
- Check actual GPU memory with `nvidia-smi` during startup
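One way to sanity-check the total is to sum the per-model allocations in your configuration. This sketch assumes a hypothetical values.yaml layout with one `gpuMemoryUtilization: <fraction>` line per model:

```bash
# Hypothetical layout: one "gpuMemoryUtilization: <fraction>" line per model.
# Sum the fractions and warn when the total exceeds 0.95.
grep "gpuMemoryUtilization" values.yaml \
  | awk -F': *' '{ s += $2 }
      END { flag = (s > 0.95) ? "(exceeds 0.95!)" : "(ok)";
            printf "total = %.2f %s\n", s, flag }'
```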