- Operating System: Ubuntu 22.04 LTS or 24.04 LTS.
- CPU: amd64 (x86_64) architecture, minimum of 12 cores; 16 cores recommended.
- RAM: Minimum of 64 GB; 128 GB recommended.
- Storage: Minimum of 4 TB of fast storage; 8 TB recommended. No upper limit.
- GPU:
  - Bare metal
    - Server Hardware (external cooling required ⚠️)
      - NVIDIA L40/L40s (48 GB)
      - NVIDIA H100 (80 / 96 GB)
      - NVIDIA A100 (40 / 80 GB)
      - NVIDIA H200 (141 GB)
    - Desktop Hardware (embedded cooling system)
      - NVIDIA GeForce RTX 5090 (32 GB)
  - AWS
  - Azure Cloud
    - Standard_NV36ads_A10_v5 (recommended) https://learn.microsoft.com/es-es/azure/virtual-machines/sizes/gpu-accelerated/nvadsa10v5-series?tabs=sizebasic
    - NVv3 (requires quantized models) https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nvv3-series?tabs=sizebasic
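
The CPU, RAM, and storage figures above can be checked on a candidate host with a short script. The following is a minimal sketch, not an official tool: the names `MINIMUM`, `RECOMMENDED`, `grade`, and `measure_host` are hypothetical, the thresholds are copied from the list above, and it assumes a Linux host. Note that `os.cpu_count()` reports logical CPUs rather than physical cores, and that the reported RAM and disk sizes can fall slightly short of the nominal hardware size.

```python
import os
import shutil

# Thresholds copied from the requirements list above.
MINIMUM = {"cores": 12, "ram_gb": 64, "storage_tb": 4}
RECOMMENDED = {"cores": 16, "ram_gb": 128, "storage_tb": 8}


def grade(measured: dict) -> dict:
    """Label each resource 'below minimum', 'minimum', or 'recommended'."""
    result = {}
    for key, value in measured.items():
        if value < MINIMUM[key]:
            result[key] = "below minimum"
        elif value < RECOMMENDED[key]:
            result[key] = "minimum"
        else:
            result[key] = "recommended"
    return result


def measure_host(data_path: str = "/") -> dict:
    """Read CPU count, RAM, and disk size of the current Linux host.

    data_path is an assumption: point it at the volume that will hold the data.
    """
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return {
        "cores": os.cpu_count() or 0,  # logical CPUs, not physical cores
        "ram_gb": page_size * phys_pages / 1e9,  # decimal gigabytes
        "storage_tb": shutil.disk_usage(data_path).total / 1e12,  # decimal terabytes
    }
```

Calling `grade(measure_host())` on the target machine returns one label per resource. The architecture can be confirmed separately with `platform.machine()`, which should return `x86_64`.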
What GPU should I buy?
Finding the right GPU for your system can be tricky. For example, two GPUs with the same vRAM might not perform the same:
- L4 (server) averages: minimum of 15 tk/s, peak of 89 tk/s
- RTX 4090 (desktop) averages: minimum of 6.9 tk/s, peak of 45 tk/s
Feature | NVIDIA L40s | Azure A10 | RTX 5090 | A100 / H100 |
---|---|---|---|---|
Requires Workstation | ✅ | ❌ | ❌ | ✅ |
LLM | ✅ | ✅ | ✅ | ✅ |
Reranker* | ✅ | ❌ | ✅ | ✅ |
Multimodal (images)* | ✅ | ❌ | ✅ | ✅ |
GPU Model | vRAM (GB) | Price (USD) |
---|---|---|
NVIDIA L40 / L40s | 48 | 9,000 |
NVIDIA H100 (PCIe) | 80 | 30,000 |
NVIDIA H100 (SXM) | 80 | 40,000 |
NVIDIA H100 (NVL) | 96 | 45,000 |
NVIDIA A100 (PCIe) | 40 | 10,000 |
NVIDIA A100 (PCIe) | 80 | 20,000 |
NVIDIA A100 (SXM) | 40 | 12,000 |
NVIDIA A100 (SXM) | 80 | 25,000 |
NVIDIA H200 | 141 | 32,000 |
NVIDIA GeForce RTX 5090 | 32 | 3,500 |
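
One rough way to compare the rows above is price per GB of vRAM. The sketch below simply divides the listed price by the listed vRAM; the names `GPUS` and `usd_per_gb` are hypothetical, and the figures are copied from the table. It says nothing about throughput, so it should be read alongside the tk/s comparison earlier in this section rather than as a ranking of the cards.

```python
# (model, vRAM in GB, listed price in USD), copied from the table above.
GPUS = [
    ("NVIDIA L40 / L40s", 48, 9_000),
    ("NVIDIA H100 (PCIe)", 80, 30_000),
    ("NVIDIA H100 (SXM)", 80, 40_000),
    ("NVIDIA H100 (NVL)", 96, 45_000),
    ("NVIDIA A100 (PCIe)", 40, 10_000),
    ("NVIDIA A100 (PCIe)", 80, 20_000),
    ("NVIDIA A100 (SXM)", 40, 12_000),
    ("NVIDIA A100 (SXM)", 80, 25_000),
    ("NVIDIA H200", 141, 32_000),
    ("NVIDIA GeForce RTX 5090", 32, 3_500),
]


def usd_per_gb(vram_gb: int, price_usd: int) -> float:
    """Listed price divided by vRAM capacity."""
    return price_usd / vram_gb


# Sort from cheapest to most expensive per GB of vRAM.
ranked = sorted(GPUS, key=lambda gpu: usd_per_gb(gpu[1], gpu[2]))

for model, vram_gb, price_usd in ranked:
    print(f"{model:<26} {vram_gb:>4} GB  {usd_per_gb(vram_gb, price_usd):>7.2f} USD/GB")
```

With the listed prices, the RTX 5090 comes out cheapest per GB (about 109 USD/GB) and the H100 (SXM) most expensive (500 USD/GB).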
Reference hardware for a mid-size organization
If you need to acquire your AI-capable equipment from scratch, please consider the following hardware recommendation (as of July 29, 2025):