Comment by ricardobeat
6 days ago
The samples featured elsewhere seem to be from a larger model?
After testing this locally, it still sounds quite mechanical, and fails catastrophically on simple phrases with numbers ("easy as 1-2-3"). If the 80M model can improve on this and keep the expressiveness seen in the Reddit post, that looks promising.