Open Benchmark Database

Tech-Practice
Benchmarks

Reproducible local AI performance data across consumer GPUs and Apple Silicon. Real hardware, real numbers, open data.

Raw Data & CSV · YouTube Channel


[Chart: Generation Speed, in tokens/second]

[Chart: Prompt Evaluation, in tokens/second]


All Benchmark Data

Device	VRAM	Model	Test	Speed (t/s)	Quant	Engine	Video
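A head-to-head comparison over the columns listed above could be sketched roughly as follows. This is only an illustration: it assumes the raw CSV uses the same column names as the table header, and the device names, model name, test labels, and speeds in the sample are placeholders, not measurements from the database.

```python
import csv
import io

# Placeholder rows for illustration only; these are NOT measurements
# from the database. Column names are assumed to match the table header.
SAMPLE_CSV = """Device,VRAM,Model,Test,Speed (t/s),Quant,Engine
device-a,24 GB,model-x,generation,50.0,Q4,llama.cpp
device-b,16 GB,model-x,generation,25.0,Q4,llama.cpp
device-a,24 GB,model-x,prompt,1000.0,Q4,llama.cpp
device-b,16 GB,model-x,prompt,400.0,Q4,llama.cpp
"""


def head_to_head(rows, dev_a, dev_b):
    """Map (model, test, quant) -> speed ratio dev_a / dev_b.

    Only entries that both devices share are compared, so the ratio
    always refers to the same model, test, and quantization.
    """
    speeds = {}
    for row in rows:
        key = (row["Model"], row["Test"], row["Quant"])
        speeds.setdefault(key, {})[row["Device"]] = float(row["Speed (t/s)"])
    return {
        key: by_dev[dev_a] / by_dev[dev_b]
        for key, by_dev in speeds.items()
        if dev_a in by_dev and dev_b in by_dev
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for key, ratio in head_to_head(rows, "device-a", "device-b").items():
    print(key, f"{ratio:.2f}x")
```

Grouping by model, test, and quantization before dividing keeps the comparison like-for-like; a ratio across different quantizations would say little about the hardware.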

Reproduce These Results

Copy & Run

Paste into your terminal. Requires cmake and git, plus NVIDIA drivers with CUDA (for NVIDIA GPUs) or Xcode (for Metal on Apple Silicon).

Test Methodology

Reproducible

01 Build

llama.cpp is compiled from source with GGML_CUDA=ON for NVIDIA GPUs or GGML_METAL=ON for Apple Silicon. No wrappers such as Ollama are involved; all tests run against the raw inference engine.
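As a rough sketch of what the build step amounts to, the snippet below assembles the clone, configure, and compile commands for each backend and prints them without running anything. The repository URL, the directory names, the backend labels, and the Release build type are assumptions made for illustration; the exact commands used for these benchmarks are the ones documented in the project's repo.

```python
import shlex

# Assumption: the upstream llama.cpp repository. The page does not say
# which fork or commit the benchmarks were built from.
REPO_URL = "https://github.com/ggml-org/llama.cpp"

# CMake options named in the methodology, keyed by a hypothetical backend label.
BACKEND_FLAGS = {
    "cuda": "-DGGML_CUDA=ON",    # NVIDIA GPUs
    "metal": "-DGGML_METAL=ON",  # Apple Silicon
}


def build_commands(backend, src_dir="llama.cpp"):
    """Return the clone, configure, and compile commands for one backend."""
    flag = BACKEND_FLAGS[backend]  # raises KeyError for an unknown backend
    build_dir = f"{src_dir}/build"
    return [
        ["git", "clone", REPO_URL, src_dir],
        ["cmake", "-S", src_dir, "-B", build_dir, flag,
         "-DCMAKE_BUILD_TYPE=Release"],
        ["cmake", "--build", build_dir, "--config", "Release"],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for cmd in build_commands("cuda"):
        print(shlex.join(cmd))
```

Calling build_commands("metal") yields the same three steps with the Metal flag swapped in.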

02 Measure

Two methods are used: a live interactive chat session to gauge real-world responsiveness, and llama-bench for formal measurement (512 prompt tokens, 128 generation tokens, averaged over 3 runs).
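The formal measurement can be pictured with the minimal sketch below. It builds a llama-bench command line using the tool's -m, -p, -n, and -r options to match the parameters above, and shows one plausible way a tokens-per-second figure is averaged across runs. The binary path and model filename are hypothetical, and the timings in the usage lines are made up for the arithmetic only; they are not benchmark results.

```python
from statistics import mean


def bench_command(model_path, binary="llama.cpp/build/bin/llama-bench"):
    """Build a llama-bench invocation: 512 prompt tokens, 128 generated, 3 repetitions.

    The default binary path is an assumption about where the build places it.
    """
    return [binary, "-m", model_path, "-p", "512", "-n", "128", "-r", "3"]


def tokens_per_second(n_tokens, run_seconds):
    """Compute tokens/second for each run, then average over the runs."""
    return mean([n_tokens / seconds for seconds in run_seconds])


if __name__ == "__main__":
    # "model.gguf" is a placeholder filename.
    print(" ".join(bench_command("model.gguf")))
    # Made-up timings: 128 tokens generated in 2.0 s on each of three runs.
    print(tokens_per_second(128, [2.0, 2.0, 2.0]))
```

Averaging the per-run rates, rather than dividing total tokens by total time, weights every run equally; the two approaches agree when run times are identical.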

03 Reproduce

All GGUF models, build commands, and test parameters are documented in the GitHub repo. Community contributions are welcome via pull request.