Comment by CuriouslyC

10 days ago

That's pre-training. Post-training with RL can make models arbitrarily good at specific capabilities, and it's usually done via pooled human experts, so it's definitely not statistically mediocre.

The issue is that we're not modelling the problem, but a proxy for the problem. RL doesn't generalize very well as is; when you apply it to a loose proxy measure, you get the abysmal data efficiency we see with LLMs. We might be able to brute-force "AGI", but we'd certainly do better with something more direct that generalizes better.

Maybe I'm misunderstanding your point, but humans have pretty abysmal data efficiency, too. We have to use tools for everything: ledgers, spreadsheets, databases, etc. It'll be the same for an AGI. There won't be any reason for it to remember every little detail; it just needs to be able to use the appropriate tool as needed.