The honest comparison tool. Explore quantization trade-offs, check if your hardware can run a model, and see real-world benchmarks — all in one place.
| Quant | File Size | RAM Needed | Quality | CPU tok/s | GPU tok/s |
|---|---|---|---|---|---|
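The file sizes in the table follow directly from bits per weight. A minimal sketch of the arithmetic, assuming rough average bits-per-weight figures for the common llama.cpp quant types (actual GGUF sizes vary slightly with the per-tensor quant mix):

```python
# Approximate average bits per weight for common llama.cpp quant types.
# These are rough figures for illustration; real files differ slightly
# because different tensors may use different quant levels.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "F16": 16.0,
}

def estimated_file_size_gib(n_params_billion: float, quant: str) -> float:
    """Estimate GGUF file size in GiB: parameter count x bits per weight."""
    total_bits = n_params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1024**3
```

For a 7B model at Q4_K_M this works out to roughly 4 GiB, which matches typical downloads. RAM needed is higher than file size: the runtime also allocates the KV cache and compute buffers, which grow with context length.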
| Model | Q4_K_M | Q5_K_M | Q8_0 | F16 |
|---|---|---|---|---|
No Python, no PyTorch, no CUDA toolkit. A single compiled binary that runs on macOS, Linux, and Windows. Metal on Apple Silicon, CUDA on NVIDIA, Vulkan everywhere else.
The standard format for quantized models. A single file contains weights, tokenizer, and metadata. Download one file, run it. Over 100,000 GGUF models on Hugging Face.
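The self-describing layout above is easy to inspect. A minimal sketch that reads the fixed-size GGUF header (per the GGUF v2+ spec: 4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key-value count, all little-endian):

```python
import struct

def read_gguf_header(path: str) -> dict:
    """Read the fixed GGUF header: magic, version, tensor and metadata counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        (tensor_count,) = struct.unpack("<Q", f.read(8))
        (metadata_kv_count,) = struct.unpack("<Q", f.read(8))
    return {
        "version": version,
        "tensors": tensor_count,
        "metadata_kvs": metadata_kv_count,
    }
```

The metadata key-value section that follows the header is where the tokenizer, architecture, and hyperparameters live, which is why a single download is enough to run the model.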
Every token is generated on your machine. No API calls, no data leaving your network. Critical for regulated industries: banking, healthcare, legal.