
Comment by samrus

22 days ago

Exactly. I go back to a recent ancestor of LLMs, seq2seq. Its purpose was to translate things. That's all. That needed representation learning and an attention mechanism, and it led to some really freaky emergent capabilities, but it's trained to translate language.
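For anyone who hasn't seen it, here's a minimal sketch of the kind of encoder-decoder-plus-attention setup I mean. It's illustrative PyTorch only; the sizes, layer choices, and names are placeholders, not any particular paper's code.

```python
# Minimal seq2seq with additive attention (illustrative sketch, not a reference implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqWithAttention(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb + hidden, hidden, batch_first=True)
        self.attn_score = nn.Linear(hidden * 2, 1)   # additive attention score
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        enc_states, h = self.encoder(self.src_emb(src))   # (B, S, H), (1, B, H)
        dec_inputs = self.tgt_emb(tgt)                     # (B, T, E)
        outputs = []
        for t in range(dec_inputs.size(1)):
            # score each encoder state against the current decoder hidden state
            query = h[-1].unsqueeze(1).expand(-1, enc_states.size(1), -1)
            scores = self.attn_score(torch.cat([enc_states, query], dim=-1))
            weights = F.softmax(scores, dim=1)             # attention over source positions
            context = (weights * enc_states).sum(dim=1)    # (B, H)
            step_in = torch.cat([dec_inputs[:, t], context], dim=-1).unsqueeze(1)
            dec_out, h = self.decoder(step_in, h)
            outputs.append(self.out(dec_out.squeeze(1)))
        return torch.stack(outputs, dim=1)                 # (B, T, tgt_vocab) next-token scores

# toy usage: batch of 2 "sentences", 5 source tokens, 4 target tokens
model = Seq2SeqWithAttention(src_vocab=100, tgt_vocab=100)
src = torch.randint(0, 100, (2, 5))
tgt = torch.randint(0, 100, (2, 4))
logits = model(src, tgt)
```

The whole thing is optimized to map one token sequence onto another; the attention weights are just a learned alignment between source and target positions.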

And that's exactly what it's good for. It works great if you've already solved a tough problem and provide it the solution in natural language, because the program is already there; it just needs to translate it into Python.

Anything beyond that which might emerge from this is going to be unreliable sleight-of-next-token-prediction at best.

We need a new architectural leap to get these things to reason, maybe something that involves reinforcement learning at the token-representation level, idk. But scaling the context window and training data aren't going to cut it.