Comment by hosaka

12 hours ago

Depending on the TTS model being used, latency can be reduced further with an LRU cache: common phrases are fetched from the cache instead of being generated fresh by the TTS model.
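A minimal sketch of what this could look like, using Python's `functools.lru_cache` (the `synthesize` function here is a hypothetical stand-in for the real TTS call):

```python
from functools import lru_cache

def synthesize(text: str) -> bytes:
    # Placeholder for an actual (expensive) TTS invocation.
    return f"<audio:{text}>".encode()

@lru_cache(maxsize=256)
def cached_synthesize(text: str) -> bytes:
    # Repeated phrases return the cached audio bytes instead of
    # re-running synthesis; least-recently-used entries are evicted.
    return synthesize(text)

first = cached_synthesize("Please hold.")
second = cached_synthesize("Please hold.")  # served from cache
```

Note this only works if cached audio bytes can be replayed verbatim; `cache_info()` on the wrapped function is handy for checking hit rates in practice.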

However, how natural the result sounds will depend on how the TTS model works, and in particular on whether two identical chunks of text sound alike across generations; if the model is non-deterministic, splicing cached and freshly generated audio may produce audible inconsistencies.