Comment by nowittyusername
1 month ago
Supertonic is probably way faster than that. I wouldn't be surprised if, measured, it came out to something like 14k wpm. On my 4090 I was getting about 175x real time, while on CPU only it was 55x real time. I stopped optimizing it, but I'm sure it could be pushed further. Anyway, you should check out their repo and test it yourself. It's crazy what that team accomplished!
Audio synthesis speed is one thing, but is the output _intelligible to a human_ at 1,000 wpm? That's the sort of thing Eloquence is being used for, according to the article.
TTS has no intelligence, bud. It's only something that transforms text to audio, and that is all we are talking about here. Neither the article nor anyone else was discussing the whole STT > LLM > TTS pipeline.
https://www.merriam-webster.com/dictionary/intelligible
Did you even read the article, bud?