Documentation Index
Fetch the complete documentation index at: https://docs.zylon.ai/llms.txt
Use this file to discover all available pages before exploring further.
Enhanced Capabilities
Zylon supports additional capabilities that can be combined with any base or alternative preset. These capabilities extend the functionality but are not enabled by default.
Available Capabilities
| Capability | Description | Example Use Cases | Models |
|---|---|---|---|
| multilingual | Enhanced support for languages beyond English | International documents, non-English content processing | intfloat/multilingual-e5-large |
Adding Capabilities
Capabilities are added to presets using a comma-separated format: <base_preset>,<capability1>,<capability2>
Examples:
```yaml
# Base preset with multilingual capability
ai:
  preset: "baseline-48g,capabilities.multilingual"
```

```yaml
# Alternative preset with multilingual capability
ai:
  preset: "alternatives.baseline-48g-context,capabilities.multilingual"
```

```yaml
# Multiple capabilities (if more become available)
ai:
  preset: "baseline-48g,capabilities.multilingual,capabilities.feature2"
```
Capabilities can be stacked with any preset type, including base, alternative, and experimental presets.
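The comma-separated preset string can be split apart with ordinary string handling. A minimal sketch; the `parse_preset` helper is hypothetical and not part of Zylon, which parses this format internally:

```python
def parse_preset(value: str) -> tuple[str, list[str]]:
    """Split a preset string into its base preset and capability names.

    Hypothetical helper for illustration only; Zylon performs this
    parsing internally.
    """
    base, *extras = [part.strip() for part in value.split(",")]
    # Capability entries carry a "capabilities." prefix per the format above.
    capabilities = [e.removeprefix("capabilities.") for e in extras]
    return base, capabilities

print(parse_preset("baseline-48g,capabilities.multilingual"))
# ('baseline-48g', ['multilingual'])
```

The first element is always the base (or alternative) preset; everything after it is treated as a capability.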
Multi-GPU Configuration
If your system has multiple GPUs, you can combine their memory capacity to use higher-tier presets. Select the preset based on total combined VRAM across all GPUs.
Configuration Steps
- Calculate total VRAM: Add up the memory of all GPUs
- Select appropriate preset: Choose the preset that matches the total memory
- Configure GPU count: Set the `numGPUs` parameter
Configuration Example:
```yaml
ai:
  preset: "baseline-48g"
  numGPUs: 2  # Using 2 GPUs with 24GB each (48GB total)
```
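The VRAM total from step 1 is just a sum over per-GPU memory. A quick sketch, where the sample values stand in for two 24GB cards; on a real host you would collect these from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports MiB per GPU:

```python
# Sample per-GPU VRAM values in MiB (two 24GB GPUs); replace with the
# values reported by nvidia-smi on your own host.
gpu_vram_mib = [24576, 24576]

total_mib = sum(gpu_vram_mib)
print(f"Total VRAM: {total_mib} MiB ({total_mib // 1024} GiB)")
# Total VRAM: 49152 MiB (48 GiB)
```

A 48GB total qualifies for the baseline-48g preset with `numGPUs: 2`, as in the example above.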
Multi-GPU Configuration Examples
| Hardware Setup | Individual GPU Memory | Total VRAM | Recommended Preset | Configuration |
|---|---|---|---|---|
| 2x RTX 4090 | 24GB each | 48GB | baseline-48g | numGPUs: 2 |
| 2x L4 | 24GB each | 48GB | baseline-48g | numGPUs: 2 |
| 4x RTX 4090 | 24GB each | 96GB | baseline-96g | numGPUs: 4 |
| 2x RTX A6000 | 48GB each | 96GB | baseline-96g | numGPUs: 2 |
Multi-GPU Best Practices
- Ensure all GPUs are the same model for optimal performance
- Verify adequate PCIe bandwidth between GPUs
- Monitor GPU utilization to ensure balanced load
- Consider NVLink connections for better inter-GPU communication when available
Complete Multi-GPU Example:
```yaml
ai:
  preset: "baseline-96g,capabilities.multilingual"
  numGPUs: 4  # 4x RTX 4090 (24GB each = 96GB total)
```
Shared Memory Configuration
The Triton Inference Server uses shared memory for zero-copy data transfer between Zylon services and the inference engine. This eliminates serialization overhead, significantly improving inference throughput and reducing latency for high-volume workloads.
Default Allocation
By default, the inference server allocates 2GB of RAM for shared memory. This is sufficient for most text-based inference workloads.
When to Increase Shared Memory
You may encounter `Shared memory allocation failed` errors in these scenarios:
- Large request queues: Processing high volumes of concurrent requests where queued input exceeds available shared memory
- Image-based models: Vision workloads requiring multiple megabytes per image where batches of high-resolution images quickly exhaust the default allocation
- Large document processing: Handling very large documents or multiple documents simultaneously
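A rough back-of-the-envelope estimate helps gauge whether the default 2GB covers an image workload. A sketch assuming raw RGB float32 input tensors; actual shared-memory demand depends on the model's input format and how requests are batched:

```python
def batch_bytes(width: int, height: int, channels: int = 3,
                dtype_bytes: int = 4, batch: int = 1) -> int:
    """Bytes needed for a batch of raw image tensors (e.g. float32 RGB)."""
    return width * height * channels * dtype_bytes * batch

# 64 concurrent 1024x1024 RGB float32 images:
demand = batch_bytes(1024, 1024, batch=64)
print(f"{demand / 2**30:.2f} GiB")
# 0.75 GiB
```

At this rate, a handful of concurrent batches would exhaust the default allocation, which is why vision workloads warrant a higher limit.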
Configuration
To increase the shared memory limit, update your Zylon configuration file:
```yaml
triton:
  sharedMemory:
    limit: "4Gi"  # Increase from default 2Gi
```
Recommended Shared Memory by Use Case
| Use Case | Recommended Limit | Reason |
|---|---|---|
| Text-only inference | 2Gi (default) | Sufficient for most text workloads |
| Low-volume vision tasks | 4Gi | Handles occasional image processing |
| High-volume vision tasks | 8Gi | Supports batch image processing |
| Mixed heavy workloads | 8-16Gi | Accommodates concurrent text and vision |
Important considerations when increasing shared memory:
- Allocating excessive shared memory can cause pods to be OOMKilled
- Be conservative with increases: start with small increments (e.g., 2Gi → 4Gi)
- Monitor actual usage with Kubernetes metrics before further increases
- Shared memory is reserved from system RAM, reducing available memory for other processes