Comment by wizzwizz4
4 days ago
Transformer models are excellent at translation, but decoder-only next-token prediction is not the right setup for it. You want something more like seq2seq (an encoder-decoder). Next-token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than about faithfulness to the source text.
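A minimal sketch of the distinction the comment is drawing, assuming the Hugging Face transformers API and illustrative checkpoints (Helsinki-NLP/opus-mt-de-en as the seq2seq model, gpt2 as a decoder-only stand-in): the encoder-decoder's cross-attention conditions every output token on the full source sentence, while the decoder-only model only ever conditions on its own growing prefix, where the source is just part of the prompt.

```python
# Sketch only: contrasting seq2seq translation with decoder-only prompting.
# Checkpoints are illustrative, not endorsed by the original comment.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForCausalLM

src = "Der Hund schläft auf dem Sofa."

# Seq2seq: the encoder reads the whole source sentence, and the decoder
# cross-attends to it at every step, anchoring the output to the input.
s2s_name = "Helsinki-NLP/opus-mt-de-en"
s2s_tok = AutoTokenizer.from_pretrained(s2s_name)
s2s_model = AutoModelForSeq2SeqLM.from_pretrained(s2s_name)
inputs = s2s_tok(src, return_tensors="pt")
out = s2s_model.generate(**inputs, max_new_tokens=40)
print(s2s_tok.decode(out[0], skip_special_tokens=True))

# Decoder-only: the source appears only in the prompt; nothing structurally
# forces later tokens to stay tied to it, so a locally fluent continuation
# can drift into a self-consistent but unfaithful "translation".
lm_name = "gpt2"
lm_tok = AutoTokenizer.from_pretrained(lm_name)
lm = AutoModelForCausalLM.from_pretrained(lm_name)
prompt = f"German: {src}\nEnglish:"
ids = lm_tok(prompt, return_tensors="pt").input_ids
gen = lm.generate(ids, max_new_tokens=40, pad_token_id=lm_tok.eos_token_id)
print(lm_tok.decode(gen[0][ids.shape[1]:], skip_special_tokens=True))
```

The design point is the conditioning: in the seq2seq model the cross-attention keeps the source in scope for every decoding step, whereas the causal LM treats the source as ordinary context that competes with whatever continuation looks most plausible.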