Reproducible local AI performance data across consumer GPUs and Apple Silicon. Real hardware, real numbers, open data.
| Device | VRAM | Model | Test | Speed (t/s) | Quant | Engine | Video |
|---|---|---|---|---|---|---|---|
llama.cpp is compiled from source with GGML_CUDA=ON for NVIDIA GPUs or GGML_METAL=ON for Apple Silicon. No wrappers, no Ollama: just the raw inference engine.
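As a rough illustration of the build step, the sketch below picks one of the two CMake options named above and runs a standard CMake configure and build. The helper names `backend_flag` and `build_llama_cpp` are hypothetical, and mapping a Darwin host to Metal and every other host to CUDA is a simplifying assumption. The exact commands used for these results are the ones documented in the repo.

```shell
#!/usr/bin/env sh
# Hypothetical helper: map an OS name (as printed by `uname -s`) to a backend flag.
# Assumption: Darwin means Apple Silicon (Metal); anything else is treated as NVIDIA (CUDA).
backend_flag() {
  case "$1" in
    Darwin) echo "-DGGML_METAL=ON" ;;
    *)      echo "-DGGML_CUDA=ON" ;;
  esac
}

# Hypothetical helper: configure and build llama.cpp.
# Must be called from the root of a llama.cpp source checkout.
build_llama_cpp() {
  cmake -B build "$(backend_flag "$(uname -s)")" &&
    cmake --build build --config Release
}
```

Calling `build_llama_cpp` from the checkout root should leave the binaries, including `llama-bench`, under `build/bin`.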
Two methods are used: live interactive chat for real-world feel, and llama-bench for formal measurement (512 prompt tokens, 128 generation tokens, averaged over 3 runs).
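The formal measurement described above could be invoked roughly as follows. This is a minimal sketch: the helper names `bench_args` and `run_bench`, the `BENCH_BIN` default path, and the model path in the usage note are assumptions for illustration. The `-p`, `-n`, and `-r` options set the prompt length, generation length, and number of repetitions.

```shell
#!/usr/bin/env sh
# Parameters taken from the methodology text.
PROMPT_TOKENS=512
GEN_TOKENS=128
REPETITIONS=3

# Assumed location of the binary after a CMake build; override as needed.
BENCH_BIN="${BENCH_BIN:-./build/bin/llama-bench}"

# Hypothetical helper: print the llama-bench arguments for the parameters above.
bench_args() {
  echo "-p $PROMPT_TOKENS -n $GEN_TOKENS -r $REPETITIONS"
}

# Hypothetical helper: run the benchmark on one GGUF file.
# $(bench_args) is deliberately unquoted so it splits into separate arguments.
run_bench() {
  "$BENCH_BIN" -m "$1" $(bench_args)
}
```

For example, `run_bench ./models/example.gguf` (a placeholder path) would benchmark that model. llama-bench reports prompt processing and token generation as separate rows, each averaged over the repetitions.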
All GGUF models, build commands, and test parameters are documented in the GitHub repo. Community contributions are welcome via pull request.