Official Benchmarks

Engine Arena Rankings

The canonical leaderboard for all offline speech-to-text models tested in the VoxBar engine arena.

Test Suite Information

Five test files are used to benchmark the transcription models, each targeting a different scenario.

| Test | Content | What It Stresses |
|---|---|---|
| T1 | Reading a List (Motivational) | Numbering, punctuation, structured content |
| T2 | Lecture (Varoufakis, Economics) | Numbers, percentages, proper nouns, complex sentences |
| T3 | Podcast (Dawkins on Darwin) | Philosophy vocab, nested clauses, proper nouns |
| T4 | Accent 1 (Joscha Bach, German) | Dense philosophy, accented speech, "simulacrum" |
| T5 | Accent 2 (Daniel Dennett, American) | Narrative speech, "hallucinatory", stream-of-consciousness |

System Audio Rankings

Measures how well each model transcribes audio played by the computer itself (lectures, podcasts, videos), with no microphone in the signal path.

| Rank | Model (Size) | T1: List | T2: Lecture | T3: Podcast | T4: Accent 1 | T5: Accent 2 | AVG |
|---|---|---|---|---|---|---|---|
| 🥇 | VoxBar Voxtral 4B (14GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 9.5 | 9.7 |
| 🥈 | VoxBar Pro Native F16 (8.5GB VRAM) | 8.5 | 10.0 | 10.0 | 10.0 | 9.0 | 9.5 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.5 | 9.5 | 9.5 | 10.0 | 8.5 | 9.4 |
| 4️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 8.5 | 8.0 | 9.5 | 9.0 | 8.5 | 8.7 |
| 5️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 8.5 | 8.0 | 9.0 | 9.0 | 8.5 | 8.6 |
| 6️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 7.5 | 8.5 | 8.0 | 9.5 | 8.5 | 8.4 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 9.0 | 7.5 | 8.5 | 9.0 | 8.0 | 8.4 |
| 8️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 6.5 | 7.5 | 7.0 | 8.5 | 8.0 | 7.5 |
| 9️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 7.0 | 7.5 | 7.5 | 6.5 | 6.0 | 6.9 |
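
For readers who want to check the AVG and Rank columns, here is a minimal sketch of how they appear to be derived: AVG is the mean of the five test scores, and tied averages share a rank with the next rank skipped (6, 6, 8). The function names and the shortened model keys are illustrative assumptions, not part of any VoxBar tooling, and only four of the nine rows are included to keep the example short.

```python
from statistics import mean

# Per-test scores (T1..T5) copied from four rows of the system-audio table.
# Keys are shortened model names chosen for this sketch.
system_scores = {
    "Voxtral 4B": [9.5, 9.5, 10.0, 10.0, 9.5],
    "Canary 2.5B": [7.5, 8.5, 8.0, 9.5, 8.5],
    "Kyutai 1B": [9.0, 7.5, 8.5, 9.0, 8.0],
    "Distil-Whisper V3": [6.5, 7.5, 7.0, 8.5, 8.0],
}


def row_average(scores):
    """Mean of the test scores, rounded to one decimal like the AVG column."""
    return round(mean(scores), 1)


def competition_ranks(averages, first_rank=1):
    """Rank models by average, highest first; ties share a rank and the
    following rank is skipped (standard competition ranking)."""
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_avg, previous_rank = None, 0
    for position, (model, avg) in enumerate(ordered, start=first_rank):
        rank = previous_rank if avg == previous_avg else position
        ranks[model] = rank
        previous_avg, previous_rank = avg, rank
    return ranks


system_avg = {model: row_average(s) for model, s in system_scores.items()}
# Ranks here are relative to the four sample rows only.
sample_ranks = competition_ranks(system_avg)

for model, avg in system_avg.items():
    print(f"{model}: AVG {avg}, rank among sample rows {sample_ranks[model]}")
```

With all nine rows loaded, the same tie rule would reproduce the shared sixth place for Canary 2.5B and Kyutai 1B, both at 8.4.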

Microphone Audio Rankings

Measures transcription accuracy on speech captured through a condenser microphone (WASAPI capture), where the models must also cope with room acoustics and breathing noise.

| Rank | Model (Size) | T1: List | T2: Lecture | T3: Podcast | T4: Accent 1 | T5: Accent 2 | AVG |
|---|---|---|---|---|---|---|---|
| 🥇 | VoxBar Pro Native F16 (8.5GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 9.0 | 9.6 |
| 🥈 | VoxBar Voxtral 4B (14GB VRAM) | 9.5 | 9.5 | 10.0 | 10.0 | 8.5 | 9.5 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.5 | 9.5 | 9.5 | 10.0 | 8.5 | 9.4 |
| 4️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 8.5 | 8.5 | 7.5 | 9.0 | 8.5 | 8.4 |
| 5️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 9.0 | 7.0 | 8.0 | 8.5 | 7.5 | 8.0 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 9.5 | 7.0 | 7.5 | 7.0 | 8.0 | 7.8 |
| 7️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 7.0 | 6.5 | 8.5 | 8.0 | 7.0 | 7.4 |
| 8️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 5.0 | 6.5 | 6.5 | 4.5 | 7.0 | 5.9 |
| 9️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 6.0 | 5.5 | 5.5 | 6.0 | 6.0 | 5.8 |

Combined Leaderboard

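The combined score is the mean of each model's system-audio and microphone averages, which is its average across all 10 test runs (5 files, each scored in both capture modes). "Gap to #1" is the difference between a model's combined score and the top combined score.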

| Rank | Model (Size) | Sys AVG | Mic AVG | Combined | Gap to #1 |
|---|---|---|---|---|---|
| 🥇 | VoxBar Voxtral 4B (14GB VRAM) | 9.7 | 9.5 | 9.6 | |
| 🥈 | VoxBar Pro Native F16 (8.5GB VRAM) | 9.5 | 9.6 | 9.55 | -0.05 |
| 🥉 | VoxBar Kyutai 2.6B (6GB VRAM) | 9.4 | 9.4 | 9.4 | -0.2 |
| 4️⃣ | VoxBar Canary 2.5B (4GB VRAM) | 8.4 | 8.4 | 8.4 | -1.2 |
| 5️⃣ | VoxBar Nemotron 0.6B (2GB VRAM) | 8.7 | 8.0 | 8.35 | -1.25 |
| 6️⃣ | VoxBar Kyutai 1B (2.7GB VRAM) | 8.4 | 7.8 | 8.1 | -1.5 |
| 7️⃣ | VoxBar GLM-ASR 1.5B (4GB VRAM) | 8.6 | 7.4 | 8.0 | -1.6 |
| 8️⃣ | VoxBar Distil-Whisper V3 (4GB VRAM) | 7.5 | 5.8 | 6.65 | -2.95 |
| 9️⃣ | VoxBar Qwen ASR 1.7B (4.5GB VRAM) | 6.9 | 5.9 | 6.4 | -3.2 |
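
The Combined and Gap columns can be reproduced from the two averages. The sketch below assumes Combined is the plain mean of Sys AVG and Mic AVG and that the gap is measured against the highest combined score, which matches every row in the table. The shortened model keys and the function name are illustrative assumptions, not part of any VoxBar tooling.

```python
# (Sys AVG, Mic AVG) pairs copied from the combined leaderboard.
# Keys are shortened model names chosen for this sketch.
averages = {
    "Voxtral 4B": (9.7, 9.5),
    "Pro Native F16": (9.5, 9.6),
    "Kyutai 2.6B": (9.4, 9.4),
    "Canary 2.5B": (8.4, 8.4),
    "Nemotron 0.6B": (8.7, 8.0),
    "Kyutai 1B": (8.4, 7.8),
    "GLM-ASR 1.5B": (8.6, 7.4),
    "Distil-Whisper V3": (7.5, 5.8),
    "Qwen ASR 1.7B": (6.9, 5.9),
}


def combined_leaderboard(avgs):
    """Return (model, combined, gap) rows sorted by combined score, best first.

    combined = mean of the two averages; gap = combined - best combined.
    Both are rounded to two decimals to absorb floating-point noise.
    """
    combined = {
        model: round((sys_avg + mic_avg) / 2, 2)
        for model, (sys_avg, mic_avg) in avgs.items()
    }
    best = max(combined.values())
    rows = [(model, score, round(score - best, 2)) for model, score in combined.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


leaderboard = combined_leaderboard(averages)
for rank, (model, score, gap) in enumerate(leaderboard, start=1):
    print(f"{rank}. {model}: combined {score}, gap {gap}")
```

Sorting by the combined score changes the order relative to the system-audio table: Canary 2.5B moves ahead of Nemotron 0.6B because its microphone average holds up better.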