Comment by woodson

8 hours ago

EDIT: My bad, please disregard. As akreal pointed out, the MMS TTS models don't use the SSL models.

Original post:

You can use the OmniASR SSL models instead of their older MMS models to create TTS models: https://github.com/ylacombe/finetune-hf-vits

As far as I understand, the MMS TTS models are trained from scratch (section 7.1 of [1]); they do not employ any SSL models. So the OmniASR SSL models are not useful here.

What might be interesting is the newly released OmniASR data, because the MMS data that was used to train the MMS TTS models was never released.

Also, OmniASR can be used to transcribe untranscribed speech, which can then be used to train a TTS model.
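A rough sketch of that last idea, in case it helps: run an ASR model over a folder of audio and write the results as an `id|text` manifest (LJSpeech-style), which many TTS training recipes accept in some form. Everything named here is an assumption for illustration: `build_tts_manifest` is a made-up helper, and `transcribe` is a placeholder callable that you would back with whatever OmniASR inference call you actually have. I'm not showing the real OmniASR API.

```python
from pathlib import Path
from typing import Callable, Iterable


def build_tts_manifest(
    audio_paths: Iterable[str],
    transcribe: Callable[[Path], str],  # placeholder: wrap your ASR model here
    out_path: str,
) -> int:
    """Write one `id|text` line per clip and return the number of lines written."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for raw in audio_paths:
            path = Path(raw)
            # Collapse whitespace so each transcript stays on a single line.
            text = " ".join(transcribe(path).split())
            if not text:
                continue  # skip clips the ASR model returned nothing for
            f.write(f"{path.stem}|{text}\n")
            written += 1
    return written
```

ASR output is noisy, so in practice you would probably also want to filter out low-confidence or very short transcripts before training on them.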

[1] MMS paper: https://arxiv.org/pdf/2305.13516