Comment by oofbey
3 days ago
It’s nonsensical to call it “zero shot” when a sample of the voice is provided. The term “zero shot cloning” implies you have some representation of the voice from another domain - e.g. a text description of the voice. What they’re doing is ABSOLUTELY one shot cloning. I don’t care if lots of STT folks use the term this way, they’re wrong.
No comments yet
Contribute on Hacker News ↗