Comment by nialv7
6 days ago
Maybe this is why? Most of the training data has the single-token version, so the three-token version was undertrained?