Comment by doonielk

12 hours ago

I built an MLX "streaming ASR + LLM + streaming TTS" pipeline in early 2024. I haven't worked on it since, so it's dated; better versions of all the models I used now exist.

I was able to achieve conversational latency, with the ability to interrupt the pipeline, on a Mac, using a variety of tricks. It's MLX, so it's only relevant if you have a Mac.

https://github.com/andrewgph/local_voice
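
The core structure of such a pipeline is roughly a loop that chains ASR → LLM → TTS per utterance, with an interrupt flag checked between generated tokens so the user can talk over the assistant. Here's a minimal sketch of that control flow; the `fake_*` functions are placeholders I made up for illustration, not the actual models or API the repo uses:

```python
import threading

# Placeholder stand-ins for the real streaming models (hypothetical;
# the actual project wires MLX ASR, an MLX LLM, and an MLX TTS model).
def fake_asr(audio_chunk: str) -> str:
    return audio_chunk.upper()  # pretend transcription

def fake_llm(text: str):
    # Stream response tokens one at a time, like a real LLM decode loop.
    yield "echo:"
    yield text

def fake_tts(token: str) -> str:
    return f"<audio:{token}>"  # pretend synthesized audio segment

class VoicePipeline:
    def __init__(self):
        self.speech_out = []
        self.interrupt = threading.Event()

    def run_once(self, audio_chunk: str):
        """Process one utterance; setting `interrupt` stops generation."""
        self.interrupt.clear()
        text = fake_asr(audio_chunk)
        for token in fake_llm(text):
            if self.interrupt.is_set():
                break  # user spoke over the assistant: stop speaking
            self.speech_out.append(fake_tts(token))
        return self.speech_out
```

The key point is that interruption is checked at token granularity, so a barge-in from the user only waits for the current token, not the whole response.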

For MLX speech to speech, I've seen:

The mlx-audio package has some MLX implementations of speech to speech models: https://github.com/Blaizzy/mlx-audio/tree/main

Kyutai's Moshi, maybe dated now, but it has an MLX implementation of their speech-to-speech model: https://github.com/kyutai-labs/moshi