Comment by throwaway27448
15 hours ago
Even at orders of magnitude greater speed, we've still hit diminishing returns for quality of output. We simply haven't found anything like superhuman reasoning ability, just superhuman (potentially) reasoning speed.
I disagree with this. Reinforcement learning with verifiable rewards (RLVR) is actually the secret sauce that is leading Claude and GPT toward automating software engineering tasks.
All the easily verifiable domains, such as mathematics, coding, and anything that can be run inside a reasonable simulation, are falling very fast.
By next year, if not sooner, mathematicians will be wildly outpaced by LLMs at reasoning.
Coding is anything but “easily” verifiable.
It's extremely verifiable. The reinforcement finetuning strategy I'm referring to involves an LLM creating coding tasks with an expected output, implementing the code, and then having a compiler (or an interpreter, for languages like Python) succeed or fail to run the code. The output is then compared against the expected output. The verification step (run interpreter + run test) takes seconds. One can generate millions of training examples like this for free, and there is extensive research showing that with the right policy, an agent can learn to reason — first as well as a human, and in many cases better.
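The verification loop described above can be sketched in a few lines. This is a minimal illustration, not anyone's actual training harness: the function name `verify_candidate` and the (stdin, expected stdout) test-case shape are my own assumptions, but the core idea — run the model's code under an interpreter, compare output to the expected answer, and emit a scalar reward — is exactly what the comment describes.

```python
import subprocess
import sys
import tempfile

def verify_candidate(source_code, test_cases, timeout_s=5.0):
    """Reward for a model-generated solution: fraction of
    (stdin, expected_stdout) test cases it passes.
    Hypothetical helper for illustration only."""
    # Write the candidate program to a temp file so the interpreter can run it.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source_code)
        path = f.name
    passed = 0
    for stdin_data, expected in test_cases:
        try:
            result = subprocess.run(
                [sys.executable, path],   # "run the interpreter"
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # non-terminating code earns no reward
        # "compare the output to expected output"
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases)

# A made-up model rollout for the task "read an integer, print its double":
candidate = "n = int(input())\nprint(n * 2)"
reward = verify_candidate(candidate, [("3", "6"), ("10", "20")])
```

A full RLVR setup would feed `reward` back into a policy-gradient update; the point here is only that the reward signal itself is cheap, objective, and fully automated.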
It's not that easy to assess diminishing returns with saturated benchmarks, where asymptoting to 100% is mathematically baked in. I could point to the number of Erdős problems being solved by AI going from zero to many very recently as evidence of acceleration.
That is not evidence of acceleration, just of some measurable improvement compared to a previous model. After all, humans have made these breakthroughs since before recorded history—that never by itself implied accelerating intelligence.
What would be evidence of acceleration? What would be evidence of diminishing returns? Both questions are hard to answer because it's difficult to avoid constructing a metric where the conclusion is already baked in.
Possibly - but we've also seen that spending more tokens on a task can improve the quality of the output (reasoning, CoT, etc).
So it's not impossible for things that seem orthogonal, like generation speed or context length, to have an impact on the quality of the result.