Comment by nicktikhonov
18 hours ago
100% - I thought about that shortly after writing this up. One way to make this work is to have a tiny, lower latency model generate that first reply out of a set of options, then aggressively cache TTS responses to get the latency super low. Responses like "Hmm, let me think about that..." would be served within milliseconds.
Years ago I wrote a system that would generate Lucene queries on the fly and return results. The ~250 ms response time was deemed too long, so I added some information about where the response data originated, and started returning "According to..." within 50 ms of the end of user input. So the actual information got to the user after a longer delay, but it felt almost as fast as conversation.
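A minimal sketch of that latency-masking idea, assuming an async pipeline where the filler audio is already cached and the real answer comes from a slow backend (all names and timings here are hypothetical, not from either system described above):

```python
import asyncio
import time

# Hypothetical cache of pre-synthesized filler clips; in a real system
# these would be TTS audio buffers generated ahead of time.
FILLER_CACHE = {
    "thinking": b"<audio: 'Hmm, let me think about that...'>",
}

async def slow_answer(question: str) -> str:
    # Stand-in for the full LLM + TTS pipeline (simulated ~200 ms delay).
    await asyncio.sleep(0.2)
    return f"Answer to: {question}"

async def respond(question: str) -> list:
    """Emit a cached filler within milliseconds, then the real answer."""
    events = []
    start = time.monotonic()
    # Kick off the slow pipeline immediately...
    answer_task = asyncio.create_task(slow_answer(question))
    # ...and serve the cached filler right away to mask the latency.
    events.append(("filler", FILLER_CACHE["thinking"], time.monotonic() - start))
    events.append(("answer", await answer_task, time.monotonic() - start))
    return events

events = asyncio.run(respond("What is perceived latency?"))
```

The filler event lands almost immediately while the real answer arrives after the backend delay, which is the whole trick: the user hears something right away, so the total wait feels much shorter.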
See also any public speaker who starts every answer to a question from the audience (or in a verbal interview) with something like "that is a good question!" or "thank you for asking me that!"
Same strategy but employed by humans.
"You are absolutely right!"