Choose from 16 offline, privacy-first AI transcription models optimized for Windows. From zero-VRAM CPU models to high-accuracy multi-billion parameter GPU behemoths.
Voxtral Real-Time 4B
The flagship offline real-time model.
Voxtral Real-Time 4B (native)
Native HuggingFace pipeline implementation.
Parakeet TDT 0.6B v2
High-speed Parakeet model for exceptional accuracy.
Moonshine v2
CPU-only model requiring zero VRAM.
Faster-Whisper (CTranslate2)
The classic Whisper engine optimized with CTranslate2.
Distil-Whisper
Distilled Whisper for faster execution.
Faster-Whisper Large
Maximum accuracy Whisper variant.
Nemotron Speech ASR 0.6B
NVIDIA Nemotron Speech model.
VoxBar Kyutai 2.6B (6GB VRAM)
Meeting and recording transcription (~4s delay).
Qwen3-ASR 0.6B
Lightweight Qwen3 0.6B variant.