Comment by sanchitmonga22
3 months ago
Understood, you want dictation, not a chatbot. That's a valid and different use case.
RCLI is Apple Silicon only today because MetalRT is built on Metal. For Linux, the closest thing to what you're describing would be building a virtual input device on top of Whisper or Parakeet (which RCLI supports as STT backends). Parakeet TDT 0.6B has ~1.9% WER, that's very close to production dictation quality.
The missing piece on Linux isn't the model, it's the integration: a daemon that captures mic audio, runs STT with hidden latency (streaming partial results), and injects text as keyboard input. sherpa-onnx (https://github.com/k2-fsa/sherpa-onnx) supports Linux and has streaming STT, it might be the best starting point for what your after.
We're focused on Apple Silicon for now but broader platform support is on the roadmap.
No comments yet
Contribute on Hacker News ↗