Comment by pzo

10 months ago

Kokoro gives great results, especially when speaking English. The model is small enough to run even on a smartphone at ~3x faster than realtime.


pzo

Kokoro just proves my point: it's "one guy in a garage", ~1000 hours of distilled audio (I think), and ~100M params.

With a budget one tenth that of Stable Diffusion and fewer ethical qualms, you could easily 10x or 100x this.

cchance 10 months ago

I'm actually surprised people aren't just using ElevenReader to generate solid audio from various books for datasets, lol.

Another +1 to Kokoro from me, great quality with good speed.