GenAI Sizing Calculator
Calculate the optimal GPU configuration for your GenAI workloads. Input your requirements and get recommendations for hardware sizing, expected throughput, and cost estimates.
The calculator is organized into the following sections:
Model Configuration: model architecture details, used for the memory footprint calculation; some fields are reserved for future optimization features.
Memory Requirements
Workload Requirements
GPU Selection: includes a real-world efficiency modifier applied to throughput estimates.
Performance Specifications
Sizing Results: shown once a model is selected.
Key Assumptions
Memory-First Sizing: GPU count is determined by total memory requirements (model weights + KV cache), not by compute requirements.
KV Cache Calculation: Uses actual model architecture details (layers, hidden size, attention heads) with support for GQA and MoE models.
Memory-Bandwidth Bound: Throughput estimates assume memory bandwidth is the bottleneck, typical for autoregressive inference.
Real-World Efficiency: Throughput estimates include a configurable efficiency modifier (default 70%) to account for batching and scheduling overhead.
Model Parameters: Extracted directly from HuggingFace model configs when available, with manual input fallback.
Precision Support: Supports 4-bit through 32-bit precision for both model weights and KV cache independently.
Single Model Instance: Memory and performance estimates are for one model instance only. Multi-model serving not yet supported.
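The memory-first sizing and KV-cache assumptions above can be sketched as follows. This is a minimal illustration, not the calculator's actual code; the function names, the example model configuration, and the GPU memory figure are all assumptions chosen for demonstration.

```python
import math

def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    # K and V caches each store kv_heads * head_dim values per token per layer.
    # For GQA models, num_kv_heads < num_attention_heads, shrinking the cache.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

def gpus_needed(params_billion, weight_bytes_per_param, kv_bytes, gpu_mem_gib):
    # Memory-first sizing: count GPUs by total memory (weights + KV cache),
    # ignoring compute requirements.
    weight_bytes = params_billion * 1e9 * weight_bytes_per_param
    total_bytes = weight_bytes + kv_bytes
    return math.ceil(total_bytes / (gpu_mem_gib * 2**30))

# Illustrative example: a 70B-parameter model with FP16 weights and FP16
# KV cache, 80 layers, 8 KV heads of dimension 128 (GQA), an 8192-token
# context at batch size 8, on GPUs with 80 GiB of memory each.
kv = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                    seq_len=8192, batch_size=8, bytes_per_elem=2)
print(gpus_needed(70, 2, kv, 80))  # → 2
```

Because sizing is memory-first, quantizing weights or the KV cache (4-bit through 32-bit, independently, per the assumption above) directly reduces the GPU count by shrinking `weight_bytes_per_param` and `bytes_per_elem`.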
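The memory-bandwidth-bound throughput assumption, with the real-world efficiency modifier, can be sketched like this. The bandwidth figure and batch behavior are illustrative assumptions, not measured values from the calculator.

```python
def decode_tokens_per_sec(params_billion, weight_bytes_per_param,
                          hbm_bandwidth_gb_s, efficiency=0.70, batch_size=1):
    # In autoregressive decode, each step must stream all model weights
    # from HBM once, so time-per-step ≈ weight_bytes / effective_bandwidth.
    # Batching amortizes that read across sequences: one token per sequence
    # per step. The efficiency modifier (default 70%) accounts for batching
    # and scheduling overhead.
    weight_bytes = params_billion * 1e9 * weight_bytes_per_param
    steps_per_sec = efficiency * hbm_bandwidth_gb_s * 1e9 / weight_bytes
    return steps_per_sec * batch_size

# Illustrative example: a 70B-parameter model with FP16 weights on a GPU
# with 3350 GB/s of HBM bandwidth, at the default 70% efficiency.
print(decode_tokens_per_sec(70, 2, 3350))
```

This model deliberately ignores compute, consistent with the memory-bandwidth-bound assumption; it becomes inaccurate for compute-bound regimes such as prefill, which the roadmap below plans to model separately.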
Roadmap
Launch v1.0!
Multiple model instance support
Add additional GPU options
Add time-per-output-token (TPOT) reporting
Differentiate between prefill and decode timing