Comment by killerstorm

6 months ago

I'm curious why smallish TTS models have a metallic voice quality.

The pronunciation sounds about right. I thought that was the hard part, and the model does it well. But shouldn't voice timbre be simpler to fix? Like, might a simple FIR filter improve it?
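To make the question concrete, here is a minimal sketch of what "a simple FIR" means: each output sample is a weighted sum of the most recent input samples. The function name `fir_filter`, the 3-tap weights, and the toy input are all assumptions picked for illustration; they are not taken from any TTS model or tuned for real speech.

```python
def fir_filter(samples, taps):
    """Apply an FIR filter: y[n] = sum over k of taps[k] * x[n - k].

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


# Hypothetical 3-tap smoothing filter (a crude low-pass), for illustration only.
taps = [0.25, 0.5, 0.25]

# Alternating samples stand in for harsh content at the highest frequency
# the sample rate can represent.
harsh = [1.0, -1.0] * 4
smoothed = fir_filter(harsh, taps)

# A constant signal stands in for the lowest-frequency content.
steady = fir_filter([1.0] * 5, taps)
```

Once the filter has seen three samples, the alternating input is cancelled and the constant input passes through unchanged. A linear filter like this can only reweight frequencies that are already in the signal; it cannot add content that is missing.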

2 comments

nickpsecurity 6 months ago

We change our tone based on personal style, emotion, context, and other factors. An accurate generator might need to encode all that information in the model. It will be larger than a model that doesn't do all of that.

codedokode 6 months ago

The "metallic" quality is probably due to a lack of detail and cannot be fixed that easily.