Comment by doonielk
13 hours ago
I built an MLX "streaming ASR + LLM + streaming TTS" pipeline in early 2024. I haven't worked on it since then, so it's dated. There are now better versions of all the models I used.
I was able to achieve conversational latency, with the ability to interrupt the pipeline, on a Mac, using a variety of tricks. It's MLX, so it's only relevant if you have a Mac.
https://github.com/andrewgph/local_voice
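For a sense of the interruption ("barge-in") part, here is a minimal sketch of the pattern: each stage is a generator so downstream work starts before upstream finishes, and playback checks a flag between chunks so detected user speech can cut it off. The model calls are stand-ins, not real MLX APIs or the repo's actual code.

```python
import threading

class VoicePipeline:
    """Toy sketch of a streaming LLM -> TTS loop with barge-in.

    llm_stream and tts_play are placeholders standing in for real
    streaming model calls; only the interruption mechanism is the point.
    """

    def __init__(self):
        self.interrupt = threading.Event()
        self.spoken = []  # stands in for audio actually played

    def llm_stream(self, text):
        # Placeholder streaming LLM: yields one token at a time.
        for token in f"echo: {text}".split():
            yield token

    def tts_play(self, tokens):
        # Placeholder playback loop: checks the interrupt flag between
        # chunks so the user can barge in mid-utterance.
        for token in tokens:
            if self.interrupt.is_set():
                return False  # playback was cut off
            self.spoken.append(token)
        return True

    def on_user_speech(self):
        # Called from the ASR stage when voice activity is detected
        # while the assistant is still talking.
        self.interrupt.set()

    def respond(self, user_text):
        self.interrupt.clear()
        return self.tts_play(self.llm_stream(user_text))
```

In a real pipeline the ASR thread calls `on_user_speech()` and the flag check sits between short audio buffers, so the cutoff feels immediate.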
For MLX speech to speech, I've seen:
The mlx-audio package has some MLX implementations of speech to speech models: https://github.com/Blaizzy/mlx-audio/tree/main
kyutai Moshi, maybe old now but has a MLX implementation of their speech to speech model: https://github.com/kyutai-labs/moshi