Comment by miki123211
1 day ago
Kokoro just proves my point; it's "one guy in a garage", 1,000 hours of distilled audio (I think) and ~100M params.
With a budget one tenth that of Stable Diffusion and fewer ethical qualms, you could easily 10x or 100x this.