Comment by davidkunz

9 hours ago

I'm not an expert. Can't we exploit the fact that LLMs don't need to receive audio as a continuous, uninterrupted stream? Couldn't we just send the data and pipe it into the LLM, deduplicating if anything gets resent?

  x...y...y[dedup]...z
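
To make the x...y...y[dedup]...z idea concrete, here is a minimal sketch of receiver-side deduplication. It assumes the sender tags every audio chunk with a sequence number; the `Chunk` type, its `seq` field, and `dedup_in_order` are hypothetical names for illustration, not part of any real protocol or library.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Chunk:
    seq: int        # hypothetical sender-assigned sequence number
    payload: bytes  # raw audio bytes for this chunk


def dedup_in_order(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Yield each chunk exactly once, in sequence order, dropping resends."""
    next_seq = 0                      # next sequence number we expect to emit
    pending: dict[int, Chunk] = {}    # chunks that arrived ahead of a gap
    for chunk in chunks:
        if chunk.seq < next_seq or chunk.seq in pending:
            continue                  # already emitted or already held: a duplicate
        pending[chunk.seq] = chunk
        # Emit every chunk that is now contiguous with what was already emitted.
        while next_seq in pending:
            yield pending.pop(next_seq)
            next_seq += 1


# The example from the comment: y is sent twice and the second copy is dropped.
received = [Chunk(0, b"x"), Chunk(1, b"y"), Chunk(1, b"y"), Chunk(2, b"z")]
audio = b"".join(c.payload for c in dedup_in_order(received))
```

Nothing here depends on arrival timing: a chunk that shows up late is simply appended when it arrives, which is the property the question is relying on.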

2 comments

vlovich123 3 minutes ago

Audio -> ASR - no jitter buffer
TTS -> human - jitter buffer

shwaj 8 hours ago

You’re absolutely correct. A jitter buffer is necessary for a human listener, but an LLM isn’t aware of a time lapse, just like it isn’t aware of the time since your last message in the conversation (unless the chat harness explicitly informs it).
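
For contrast, a rough sketch of what the human-facing side has to do. This is a simplified, count-based jitter buffer (real ones are usually time-based), and `JitterBuffer`, `depth`, and `playout_order` are made-up names for illustration. It holds back a few frames before releasing any, trading latency for smooth, in-order playback, which is exactly the delay the ASR/LLM side can skip.

```python
import heapq


class JitterBuffer:
    """Hold frames until `depth` are buffered, then release the earliest one."""

    def __init__(self, depth: int = 2):
        self.depth = depth   # assumed number of frames to hold back
        self._heap = []      # min-heap of (seq, frame), ordered by seq

    def push(self, seq: int, frame: bytes) -> None:
        heapq.heappush(self._heap, (seq, frame))

    def pop_ready(self):
        """Return the earliest (seq, frame) if enough frames are buffered, else None."""
        if len(self._heap) >= self.depth:
            return heapq.heappop(self._heap)
        return None

    def drain(self):
        """Release everything left, in sequence order (end of stream)."""
        while self._heap:
            yield heapq.heappop(self._heap)


def playout_order(arrivals, depth: int = 2):
    """Return the sequence numbers in the order a listener would hear them."""
    buf = JitterBuffer(depth)
    played = []
    for seq, frame in arrivals:
        buf.push(seq, frame)
        item = buf.pop_ready()
        if item is not None:
            played.append(item[0])
    played.extend(seq for seq, _ in buf.drain())
    return played


# Frames 0 and 1 arrive swapped; the buffer absorbs the reordering.
order = playout_order([(1, b"b"), (0, b"a"), (2, b"c")])
```

The cost is that nothing is played until `depth` frames have arrived. A model consuming the audio has no notion of that wall-clock gap, so it can take frames as they come.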