Comment by PhilippGille

1 month ago

OP asked:

> Is anyone doing true end-to-end speech models locally (streaming audio out), or is the SOTA still “streaming ASR + LLM + streaming TTS” glued together?