Comment by kortilla

4 days ago

Performance on these scores frequently hits plateaus because the style of technology is just unfit for the task.

We are quite far into the development cycle of LLMs. Literally billions of dollars have been poured into them. The rate of improvements over the last 6-12 months has slowed, not accelerated.

There hasn’t been any hint of an AGI breakthrough, so we’re dealing with tools to help herd stochastic parrots (i.e. agents) for the foreseeable future. And those tools just help with how much LLMs hallucinate; they don’t make them more creative in a way that would improve these scores.

> We are quite far into the development cycle of LLMs.

No, we've barely scratched the surface. Billions of dollars have been poured into the stupidest possible thing that could work + scaling, and we're only now trying more clever things. Fine-tuning on specific tasks will yield considerable productivity benefits in those domains.

Not only am I skeptical of your claim about the "rate of improvements over the last 6-12 months", but that's not even a compelling time horizon from which to infer any kind of trend at this stage.

  • The needle hasn’t meaningfully moved on these types of tasks though. There was a class of problems that LLMs could help with nearly immediately, and that brought us quickly to the current level. But then we stalled, and it hasn’t meaningfully moved with any of the major releases.