Comment by kelseyfrog
1 day ago
The point is to bootstrap self-improving AI. Once a measurement becomes a goal, model makers target saturating it.
There is a coefficient of intelligence replication, i.e. a model M with intelligence I_m can produce a model N with intelligence I_n. When (I_n / I_m) > 1 we'll have a runaway intelligence explosion. There are of course several elements in the chain - akin to the Drake equation for intelligent machines - and their combined multiplicative effect determines the overall intelligence of the system. If f(paper) -> code is the weakest part of the chain, it makes sense to target that.
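To make the "weakest link" claim concrete, here is a minimal sketch of the multiplicative-chain idea. The stage names and ratios are entirely hypothetical, chosen only to illustrate why the smallest factor dominates the combined coefficient:

```python
from math import prod

# Per-stage replication ratios (I_out / I_in) along a hypothetical chain.
stages = {
    "paper -> idea": 0.99,
    "idea -> experiment": 0.97,
    "paper -> code": 0.70,          # hypothesized weakest link
    "code -> trained model": 0.95,
}

# Combined coefficient: the product of all stages. A value > 1
# would correspond to the runaway regime described above.
overall = prod(stages.values())

# The stage with the smallest ratio caps the whole chain.
weakest = min(stages, key=stages.get)

print(f"overall coefficient: {overall:.3f}")
print(f"weakest stage: {weakest}")
```

Under these made-up numbers the chain sits well below 1, and improving any stage other than `paper -> code` barely moves the product, which is the argument for targeting the weakest stage.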
> If f(paper) -> code is the weakest part of the chain, it makes sense to target that.
My point is that LLMs may already have seen the solutions on GitHub, so you can't use that benchmark as a metric unless there is some explanation.
How does that work with knowledge cutoff?
It could work with a knowledge cutoff if they can reliably guarantee it, and also make sure the LLMs are not searching GitHub under the surface.
2 replies →