Comment by NitpickLawyer

6 hours ago

> Prove this statement wrong.

If all it takes is "trained on the Internet" and "decompress stored knowledge", then surely gpt3, 3.5, 4, 4.1, 4o, o1, o3, o4, 5, 5.1, 5.x should have been able to do it, right? Claude 2, 3, 4, 4.1, 4.5? Surely.

6 comments

shakna 4 hours ago

Well, "Reimplement the c4 compiler - C in four functions" is absolutely something older models can do, because most are trained on that quite small product; it's 20 KB.

But reimplementing that isn't impressive, because it's not a clean-room implementation if the model that regurgitates the effort was trained on that data.

signatoremo 2 hours ago

> Well, "Reimplement the c4 compiler - C in four functions" is absolutely something older models can do.

Are you sure about that? Do you have some examples? The older Claude models can’t do it according to TFA.

gmueckl 4 hours ago

This comparison is only meaningful between models with comparable parameter counts and context window sizes. Even then, it would mainly test the efficiency and accuracy of the information encoding, which I would argue is the main improvement across model generations.

hn_acc1 4 hours ago

Are you really asking for "all the previous versions were implemented so poorly they couldn't even do this simple, basic LLM task"?

Philpax 3 hours ago

Please look at the source code and tell me how this is a "simple, basic LLM task".

geraneum 6 hours ago

Perhaps 4.5 could also do it? We don’t really know until we try. I don’t trust the marketing material that much. The fact that previous (smaller) versions could or couldn’t do it does not really disprove that claim.