CubsFan1060 1 day ago
Great post last night from Simon: https://simonwillison.net/2026/Apr/27/vibevoice/

542458 1 day ago
Note that this just covers the Speech-to-Text/Speech-Recognition aspect (à la Whisper); there are also models for long-form Text-to-Speech and streaming Text-to-Speech.

JumpCrisscross 1 day ago
“VibeVoice can only handle up to an hour of audio”
Why?