Zylon has only one strict hardware requirement: it must have access to a GPU with NVIDIA CUDA capabilities. Depending on the hardware, some AI models might be restricted, so to ensure compatibility, aim for the newest compatible CUDA version (12.6+). Our recommended specifications for the best experience are:

What GPU should I buy?

Finding the right GPU for your system can be a tricky process. For example, two GPUs with the same vRAM might not perform the same:
  • L4 (server) averages a minimum of 15 tk/s and a peak of 89 tk/s
  • RTX 4090 (desktop) averages a minimum of 6.9 tk/s and a peak of 45 tk/s
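To make these figures concrete, the sketch below converts the quoted throughput into the time needed to generate a single answer. The 500-token answer length is a hypothetical value chosen for illustration, and the calculation assumes a constant generation rate and ignores prompt processing, so treat the results as rough orders of magnitude rather than benchmarks.

```python
# Throughput figures (tokens per second) quoted in the list above.
THROUGHPUT_TK_S = {
    "L4 (server)": {"min": 15.0, "peak": 89.0},
    "RTX 4090 (desktop)": {"min": 6.9, "peak": 45.0},
}

# Hypothetical answer length used for illustration; not a Zylon figure.
RESPONSE_TOKENS = 500


def generation_seconds(tokens, tokens_per_second):
    """Seconds needed to emit `tokens` at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def summarize(tokens=RESPONSE_TOKENS):
    """Map each GPU to (slowest, fastest) generation time in seconds."""
    return {
        gpu: (
            generation_seconds(tokens, rates["min"]),
            generation_seconds(tokens, rates["peak"]),
        )
        for gpu, rates in THROUGHPUT_TK_S.items()
    }


if __name__ == "__main__":
    for gpu, (slowest, fastest) in summarize().items():
        print(f"{gpu}: {fastest:.1f} s at peak, {slowest:.1f} s at minimum")
```

Under these assumptions, the same answer takes roughly twice as long on the slower card at both ends of the range, even though the amount of vRAM is the same.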
On the other hand, the GPU you choose determines which additional features that can affect AI quality are enabled. If any of them is relevant to your use cases, factor that into your decision:
Feature                  Nvidia L40s   Azure A10   RTX 5090   A100 / H100
Requires Workstation
LLM
Reranker*
Multi-model (images)*
*Currently in development/under testing; subject to change in the future.

For on-premise bare metal environments (the usual scenario for Zylon clients), an important factor is your ability to properly cool the GPU installed in your machine. If you don't want to take care of cooling or lack the experience, go for a desktop hardware option. Keep in mind, however, that if you want to run bigger models or serve several hundred users, you might need to install a rack with a couple of GPUs in parallel or move to server hardware models.

Another important factor is the investment, especially regarding the GPU. The price ranges as of May 5, 2025 for the aforementioned models are:
GPU Model                  vRAM (GB)   Price range (USD)
NVIDIA L40 48GB/L40s       48          7,900 - 9,000
NVIDIA H100 (PCIe)         80          25,000 - 30,000
NVIDIA H100 (SXM)          80          35,000 - 40,000
NVIDIA H100 (NVL)          96          40,000 - 45,000
NVIDIA A100 (PCIe)         40          8,000 - 10,000
NVIDIA A100 (PCIe)         80          18,000 - 20,000
NVIDIA A100 (SXM)          40          10,000 - 12,000
NVIDIA A100 (SXM)          80          20,000 - 25,000
NVIDIA H200                141         30,000 - 32,000
NVIDIA GeForce RTX 5090    32          3,000 - 3,500
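One rough way to compare these prices is cost per GB of vRAM. The sketch below uses only the figures from the table above; the function names are illustrative, and the metric deliberately ignores throughput, cooling, and feature support, so it is a starting point for comparison rather than a buying rule.

```python
# (model, vRAM in GB, low price USD, high price USD), copied from the table above.
GPU_PRICES = [
    ("NVIDIA L40 48GB/L40s", 48, 7_900, 9_000),
    ("NVIDIA H100 (PCIe)", 80, 25_000, 30_000),
    ("NVIDIA H100 (SXM)", 80, 35_000, 40_000),
    ("NVIDIA H100 (NVL)", 96, 40_000, 45_000),
    ("NVIDIA A100 (PCIe)", 40, 8_000, 10_000),
    ("NVIDIA A100 (PCIe)", 80, 18_000, 20_000),
    ("NVIDIA A100 (SXM)", 40, 10_000, 12_000),
    ("NVIDIA A100 (SXM)", 80, 20_000, 25_000),
    ("NVIDIA H200", 141, 30_000, 32_000),
    ("NVIDIA GeForce RTX 5090", 32, 3_000, 3_500),
]


def usd_per_gb(vram_gb, low_usd, high_usd):
    """Return the (low, high) price per GB of vRAM."""
    return low_usd / vram_gb, high_usd / vram_gb


def ranked_by_cost_per_gb(rows=GPU_PRICES):
    """Sort models by the low end of their USD-per-GB range, cheapest first."""
    ranked = [
        (model, vram, *usd_per_gb(vram, low, high))
        for model, vram, low, high in rows
    ]
    return sorted(ranked, key=lambda row: row[2])


if __name__ == "__main__":
    for model, vram, low, high in ranked_by_cost_per_gb():
        print(f"{model} ({vram} GB): {low:,.0f} - {high:,.0f} USD per GB")
```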
In any case, as a direct answer to the question of which GPU you should buy: as of today, several of our clients run Zylon on RTX 5090s, supporting 200+ users in their organizations with great performance.

Reference hardware for mid-size organization

If you need to acquire your AI-capable equipment from scratch, please consider the following hardware recommendation, current as of July 29, 2025. This configuration includes an NVIDIA GeForce RTX 5090 (32 GB), a powerful CPU (16 cores), 128 GB of RAM, and enough storage capacity to operate Zylon with margin to grow. It also provides a robust cooling solution to ensure optimal performance under heavy workloads, as well as a motherboard large enough to fit a second GPU later if needed.

Keep in mind that this is just a recommendation, so feel free to adapt it to your preferences while keeping similar capabilities for ideal performance. We used Amazon as the provider on the assumption that you can assemble the parts yourself, but any provider you usually work with should be able to source similar hardware and assemble it for you.
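Once a machine like this is assembled, one way to confirm that it meets the CUDA guidance given at the top of this page (12.6+) is to read the version reported by `nvidia-smi`. The sketch below assumes the NVIDIA driver is installed and that the tool's header contains a `CUDA Version: X.Y` field; the function names are illustrative and not part of Zylon.

```python
import re
import shutil
import subprocess

# Minimum version taken from the "12.6+" guidance on this page.
MIN_CUDA = (12, 6)


def parse_cuda_version(smi_output):
    """Extract (major, minor) from a 'CUDA Version: X.Y' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=MIN_CUDA):
    """True when a parsed version exists and is at least `minimum`."""
    return version is not None and version >= minimum


def check_local_gpu():
    """Run nvidia-smi, if present, and compare its CUDA version to MIN_CUDA."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools on PATH
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return False
    return meets_minimum(parse_cuda_version(result.stdout))


if __name__ == "__main__":
    print("CUDA 12.6+ available:", check_local_gpu())
```

The version shown by `nvidia-smi` is the highest CUDA version the installed driver supports, so a passing check indicates driver compatibility rather than a specific toolkit installation.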

Reference hardware for large organization

In these scenarios, we don't provide a reference hardware configuration until we understand the requirements, not only the number of users but also the kind of internal operations that will run in parallel through the platform API. If you are in this situation, we are likely already discussing this with you.