We provide an interactive GPU Resource Calculator that takes your model size, precision, context length, and concurrency, and returns the VRAM breakdown plus the recommended GPU configuration.
Quick reference
- 7B – 13B: Single GPU with 24–48 GB of VRAM (RTX 5090 / L40S).
- 30B – 34B: 1–2× H100 80GB or A100 80GB.
- 70B: 8× H100 80GB or 8× H20 141GB.
- 671B (full precision): Two nodes of 8× H200 141GB connected via InfiniBand.
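
As a rough cross-check of the tiers above, the kind of arithmetic such a calculator performs can be sketched in a few lines. This is a minimal sketch and not the calculator's actual formula: it assumes a dense transformer, a KV cache of two tensors (keys and values) per layer, and a flat 10% overhead for activations and runtime buffers. The names `ModelShape`, `estimate_vram_gb`, `min_gpus`, and the `overhead_frac` parameter are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass

# Bytes per parameter (or per cached value) for common precisions.
BYTES_PER_VALUE = {
    "fp32": 4.0, "fp16": 2.0, "bf16": 2.0,
    "fp8": 1.0, "int8": 1.0, "int4": 0.5,
}


@dataclass
class ModelShape:
    """Hypothetical description of a dense transformer (all fields user-supplied)."""
    params_billion: float
    num_layers: int
    num_kv_heads: int
    head_dim: int


def estimate_vram_gb(shape, precision, context_len, concurrency,
                     kv_precision="fp16", overhead_frac=0.10):
    """Return a rough VRAM breakdown in GB (1 GB = 1e9 bytes).

    overhead_frac is an assumed fudge factor, not a measured value.
    """
    weights = shape.params_billion * 1e9 * BYTES_PER_VALUE[precision]
    # Keys + values (factor 2) for every layer and KV head, per cached token.
    kv_per_token = (2 * shape.num_layers * shape.num_kv_heads
                    * shape.head_dim * BYTES_PER_VALUE[kv_precision])
    # Worst case: every concurrent request fills the whole context window.
    kv_cache = kv_per_token * context_len * concurrency
    overhead = overhead_frac * (weights + kv_cache)
    return {
        "weights_gb": weights / 1e9,
        "kv_cache_gb": kv_cache / 1e9,
        "overhead_gb": overhead / 1e9,
        "total_gb": (weights + kv_cache + overhead) / 1e9,
    }


def min_gpus(total_gb, vram_per_gpu_gb):
    """Smallest GPU count whose combined VRAM covers the estimate."""
    return math.ceil(total_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    # Illustrative 7B-class shape; substitute your model's real config values.
    shape = ModelShape(params_billion=7, num_layers=32, num_kv_heads=32, head_dim=128)
    est = estimate_vram_gb(shape, "fp16", context_len=4096, concurrency=4)
    for name, value in est.items():
        print(f"{name}: {value:.1f}")
    print("GPUs needed at 48 GB each:", min_gpus(est["total_gb"], 48))
```

Under these assumed settings the example comes to about 24.8 GB in total (14 GB of weights, roughly 8.6 GB of KV cache, plus overhead), which falls inside the 24–48 GB single-GPU tier. The GPU count from `min_gpus` is a lower bound on capacity only; it ignores tensor-parallel layout constraints and interconnect bandwidth, so real deployments may need more.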