Pure C, CPU-only inference with Mistral Voxtral Realtime 4B speech to text model

by Curiositry on 2/10/2026, 1:17 AM with 35 comments

by d4rkp4ttern on 2/10/2026, 1:07 PM

I use the open source Handy [1] app with Parakeet V3 for STT when talking to coding agents and I’ve yet to see anything that beats this setup in terms of speed/accuracy. I get near instant transcription, and the slight accuracy drop is immaterial when talking to AIs that can “read between the lines”.

I tried incorporating this Voxtral C implementation into Handy but got very slow transcriptions on my M1 Max MacBook 64GB.

[1] https://github.com/cjpais/Handy

I’ll have to try the other implementations mentioned here.

by mythz on 2/10/2026, 9:52 AM

Big fan of Salvatore's voxtral.c and flux2.c projects - hope they continue to get optimized, as it'd be great to have lean options without external deps. Unfortunately it's currently too slow for real-world use (AMD 7800X3D with BLAS) when adding Voice Input support to llms-py [1].

In the end Omarchy's new support for voxtype.io provided the nicest UX, followed by Whisper.cpp, and despite being slower, OpenAI's Whisper is still a solid local transcription option.

Also very impressed with both the performance and price of Mistral's new Voxtral Transcription API [2] - really fast/instant and really cheap ($0.003/min), IMO best option in CPU/disk-constrained environments.

[1] https://llmspy.org/docs/features/voice-input

[2] https://docs.mistral.ai/models/voxtral-mini-transcribe-26-02
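The quoted per-minute rate works out to very small totals even for long recordings; a quick back-of-the-envelope sketch (the $0.003/min figure is from the comment above, the durations are purely illustrative):

```python
# Cost sketch for the quoted Voxtral Transcription API rate of $0.003/min.
# The rate comes from the comment above; everything else is illustrative.
RATE_PER_MIN = 0.003

def transcription_cost(minutes: float) -> float:
    """Cost in USD for the given audio duration at the quoted rate."""
    return round(minutes * RATE_PER_MIN, 4)

# One hour of audio: $0.18. A 10-hour backlog: $1.80.
print(transcription_cost(60))
print(transcription_cost(600))
```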

by Curiositry on 2/10/2026, 3:45 AM

This was a breeze to install on Linux. However, I haven't managed to get realtime transcription working yet, à la Whisper.cpp stream or Moonshine.

--from-mic only supports Mac. I'm able to capture audio with ffmpeg, but adapting the ffmpeg example to use mic capture hasn't worked yet:

ffmpeg -f pulse -channels 1 -i 1 -f s16le - 2>/dev/null | ./voxtral -d voxtral-model --stdin

It's possible my system is simply under spec for the default model.
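For anyone debugging a pipe like the one above, here is a minimal sketch of the raw-PCM framing that `-f s16le` produces on ffmpeg's stdout. It assumes (this is not confirmed by the repo docs) that the consumer expects 16 kHz, mono, signed 16-bit little-endian samples; a mismatch in rate or channel count is a common reason piped transcription silently fails:

```python
# Sketch of consuming a raw s16le PCM stream, as produced by e.g.
#   ffmpeg -f pulse -i <device> -ar 16000 -ac 1 -f s16le -
# Assumption (not from the repo docs): 16 kHz mono signed 16-bit LE.
import struct

SAMPLE_RATE = 16000                 # assumed model input rate
CHUNK_SECONDS = 0.5                 # stream in half-second windows
CHUNK_BYTES = int(SAMPLE_RATE * CHUNK_SECONDS) * 2  # 2 bytes per sample

def read_chunks(stream):
    """Yield tuples of int16 samples from a raw s16le byte stream."""
    while True:
        buf = stream.read(CHUNK_BYTES)
        if not buf:
            break
        n = len(buf) // 2           # drop a trailing odd byte, if any
        # Unpack little-endian int16s; a real consumer would hand
        # these samples to the model instead of yielding them.
        yield struct.unpack(f"<{n}h", buf[: n * 2])
```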

I'd like to be able to use this with the voxtral-q4.gguf quantized model from here: https://huggingface.co/TrevorJS/voxtral-mini-realtime-gguf

by written-beyond on 2/10/2026, 8:18 AM

Funny, this and the Rust runtime implementation are neck and neck on the frontpage right now.

Cool project!

by hrpnk on 2/10/2026, 9:50 AM

There is also an MLX implementation: https://github.com/awni/voxmlx

by sgt on 2/10/2026, 8:05 AM

I'm very interested in speech to text, particularly for tricky dialects and specialized terminology, but I'm still confused about the best place to start in order to train the models on a huge database of voice samples I own.

Any ideas from the HN crowd currently involved in speech-to-text models?
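One common first step, regardless of which model you fine-tune: most STT training recipes start from a manifest of (audio path, duration, transcript) records. This stdlib-only sketch builds such a manifest from a folder of WAV files; the function names and JSONL record shape are illustrative, not from any particular toolkit:

```python
# Hedged sketch: build a JSONL training manifest from (wav, transcript)
# pairs. The record fields mirror what many STT fine-tuning recipes
# expect, but the exact schema depends on the toolkit you pick.
import json
import wave
from pathlib import Path

def build_manifest(audio_dir: str, transcripts: dict) -> list:
    """Return manifest records for every .wav whose stem has a transcript."""
    records = []
    for path in sorted(Path(audio_dir).glob("*.wav")):
        text = transcripts.get(path.stem)
        if text is None:
            continue  # skip clips with no ground-truth text
        with wave.open(str(path), "rb") as w:
            duration = w.getnframes() / w.getframerate()
        records.append(
            {"audio": str(path), "duration": round(duration, 3), "text": text}
        )
    return records

def write_manifest(records: list, out_path: str) -> None:
    """Write one JSON record per line (the common JSONL manifest layout)."""
    with open(out_path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

From there, filtering by duration (very short and very long clips are usually dropped) and holding out a validation split are typical next steps before touching any model code.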

by BugsJustFindMe on 2/11/2026, 5:14 PM

The title here says CPU only, but that's wrong. The repo clearly says it has GPU acceleration and doesn't make any claims about CPUness.

by ks2048 on 2/10/2026, 6:31 PM

Should this work on a 16GB M3 MacBook Pro? It starts to load, but hangs or is too slow.

by sylware on 2/10/2026, 11:20 AM

Finally, a plain and simple C lib to run open-weight LLMs?

by 9999_points on 2/10/2026, 4:42 PM

It seems so bizarre that we need a nearly 9 GB model to do something you could do over 20 years ago with ~200 MB.

by alextray812 on 2/10/2026, 12:29 PM

From a cybersecurity perspective, this project is impressive not just for performance, but for transparency.