Comment by empiko
1 year ago
I have seen a bunch of tokenization papers with various ideas, but their results are mostly meh. I personally don't see anything fundamentally wrong with current approaches: having discrete symbols is how natural language works, and tokenization might be an okay-ish approximation of that.
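(For concreteness: the "current approaches" here are mostly subword schemes like BPE, which greedily merge the most frequent adjacent symbol pair. A minimal sketch of that merge loop, illustrative only and not any specific library's implementation:)

```python
from collections import Counter

def bpe_merges(words, num_merges):
    # Start from character-level symbols; each word is a tuple of symbols.
    vocab = Counter()
    for w in words:
        vocab[tuple(w)] += 1

    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for sym, freq in vocab.items():
            for a, b in zip(sym, sym[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Greedily pick the most frequent pair and merge it everywhere.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for sym, freq in vocab.items():
            out, i = [], 0
            while i < len(sym):
                if i + 1 < len(sym) and (sym[i], sym[i + 1]) == best:
                    out.append(sym[i] + sym[i + 1])
                    i += 2
                else:
                    out.append(sym[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["hug"] * 10 + ["pug"] * 5, 2)
print(merges)  # [('u', 'g'), ('h', 'ug')]
```

The output is exactly the "discrete symbols" point above: after two merges, "hug" is a single token while "pug" stays split as ("p", "ug").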
Contribute on Hacker News ↗