Comment by dqv

1 month ago

Does having it sound "natural" even matter for high-speed reading? I assumed it would be a hindrance at higher speeds because natural variation and randomness in a voice make it harder to scan (similar to how reading something handwritten tends to be harder than reading something that has been typeset). At least that's how I always feel whenever I listen to audiobooks that use "natural" voices - I always switch to the more robotic-sounding ones because, in my experience, they're easier to scan at 2x and beyond.

My takeaway from the article is that accuracy of pronunciation, tweakability, and "time to first utterance" are what matter most.

You are correct. At least in my case, more synthetic voices like Eloquence are easier to understand at high speeds, especially because of their 'formulaic' nature. You don't listen to each individual phoneme or letter; you listen more for groups of syllables, tone, etc. The more unpredictable the text to speech, the harder this is. Performance is another big point. If you have large bits of silence at the beginning of the audio, or slow attacks, then the responsiveness will suffer, whether that's because of the actual audio itself or the generation time.

Some of this is surely subjective, but I'm pretty sure I'm not the only screen reader user with these opinions.