Comment by mbrock
1 year ago
It is mistaken because it has no particular insight into its own implementation. In fact, the whole point is that it directly consumes and produces audio tokens, with no intermediate text. That's why it's able to sing, make noises, do accents, and so on.