Open Benchmark Database

Tech-Practice
Benchmarks

Reproducible local AI performance data across consumer GPUs and Apple Silicon. Real hardware, real numbers, open data.

[Chart: Generation Speed, tokens/second]
[Chart: Prompt Evaluation, tokens/second]

All Benchmark Data

Device | VRAM | Model | Test | Speed (t/s) | Quant | Engine | Video

Reproduce These Results

Copy & Run

      
Paste the command into your terminal. It requires cmake, git, and either NVIDIA drivers with CUDA or Xcode (for Metal).

Test Methodology

Reproducible

01 Build

llama.cpp is compiled from source with GGML_CUDA=ON for NVIDIA GPUs or GGML_METAL=ON for Apple Silicon. No wrappers and no Ollama, just the raw inference engine.
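
The build step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's official script: the helper names `build_commands` and `run_build`, the upstream repository URL, and the `llama.cpp/build` directory are all chosen here for the example, and the commands documented in the GitHub repo take precedence.

```python
import subprocess

# Assumed upstream repository; the benchmark repo may pin a specific commit or fork.
LLAMA_CPP_URL = "https://github.com/ggml-org/llama.cpp"


def build_commands(system: str) -> list[list[str]]:
    """Return the clone, configure, and build commands for one machine.

    `system` is the value of platform.system(). "Darwin" selects the Metal
    backend; any other value is assumed to be an NVIDIA machine and selects CUDA.
    """
    backend = "GGML_METAL" if system == "Darwin" else "GGML_CUDA"
    return [
        ["git", "clone", LLAMA_CPP_URL],
        ["cmake", "-S", "llama.cpp", "-B", "llama.cpp/build", f"-D{backend}=ON"],
        ["cmake", "--build", "llama.cpp/build", "--config", "Release"],
    ]


def run_build(system: str) -> None:
    """Print and execute each command in order, stopping at the first failure."""
    for cmd in build_commands(system):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_build(platform.system())` after `import platform` would run the three commands in the current directory. Keeping command construction separate from execution makes it possible to inspect the exact flags before anything is compiled.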

02 Measure

Two methods are used: a live interactive chat for real-world feel, and llama-bench for formal measurement (512 prompt tokens, 128 generation tokens, averaged over 3 runs).
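
The formal measurement described above might be driven by something like the sketch below. It assumes `llama-bench` is on the PATH and that its JSON output carries `n_prompt`, `n_gen`, and `avg_ts` fields, as recent llama.cpp builds do; the helper names `bench_command` and `summarize` are hypothetical, and the model path is a placeholder.

```python
import json


def bench_command(model_path: str, bench_bin: str = "llama-bench") -> list[str]:
    """Build a llama-bench call: 512 prompt tokens, 128 generated tokens, 3 repetitions."""
    return [bench_bin, "-m", model_path,
            "-p", "512", "-n", "128", "-r", "3", "-o", "json"]


def summarize(bench_json: str) -> dict[str, float]:
    """Map each test label (e.g. 'pp512', 'tg128') to its average tokens/second.

    Assumes each JSON row has 'n_prompt', 'n_gen', and 'avg_ts', where
    'avg_ts' is the mean throughput over the repetitions.
    """
    results: dict[str, float] = {}
    for row in json.loads(bench_json):
        n_prompt, n_gen = row["n_prompt"], row["n_gen"]
        if n_prompt > 0 and n_gen == 0:
            label = f"pp{n_prompt}"              # prompt evaluation test
        elif n_gen > 0 and n_prompt == 0:
            label = f"tg{n_gen}"                 # token generation test
        else:
            label = f"pp{n_prompt}+tg{n_gen}"    # combined test, if present
        results[label] = row["avg_ts"]
    return results
```

A run would pass `bench_command("model.gguf")` to `subprocess.run(..., capture_output=True, text=True)` and feed the captured stdout to `summarize`. The two labels correspond to the Prompt Evaluation and Generation Speed figures reported in tokens per second.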

03 Reproduce

All GGUF models, build commands, and test parameters are documented in the GitHub repo. Community contributions are welcome via pull request.