Comment by a_bonobo

6 days ago

I find the Konwinski Prize to be very interesting in this context. 1 million dollars to whoever's open-source LLM solves >90% of a set of novel GitHub issues.

https://www.kaggle.com/competitions/konwinski-prize/

Currently, the #1 spot sits at a score of 0.09, not 0.9. A far cry from being useful. I know that open source models are not as good as closed source, but still, we're a long way from LLMs being good for code on their own.

And that supports OP's point - these tools aren't AGI, they produce trash that needs evaluation, but they're still useful.

12 comments

virgildotcodes 6 days ago

Am I misunderstanding or are the models also limited to those that can be run with less than 96 gigs of VRAM?

The models that are both open source and quantized so that they can fit within that much memory are going to be significantly less capable than full-scale frontier closed-source models. I wonder how the latter would perform.

naasking 6 days ago

> Currently, the #1 spot sits at a score of 0.09, not 0.9. A far cry from being useful.

The best intellisense and code completion tools would solve 0.00. Those were the only tools we were using just a couple of years ago. 0.09 is a tremendous jump and the improvements will accelerate!

kortilla 6 days ago
Assuming acceleration or even continued improvement is pretty naive.
- naasking 5 days ago
  
  You mean it's naive to assume technology improves, which it has for centuries, given there is considerable incentive for it to improve? Seems inevitable.
  Do you think humans have achieved peak intelligence? If so, why, and if not, then why shouldn't you expect artificial forms of intelligence to improve up to and even surpass human abilities at some point?
  Edit: to clarify, I'm not necessarily assuming unbounded acceleration, but tools always start out middling, improvements accelerate as we figure out what works and what doesn't, and then they taper off. We're just starting on the acceleration curve for AI.
  

jachee 6 days ago

They’re tab-completion with extra cognitive-load steps.

a_bonobo 6 days ago
I mean, if you can solve 9% of GitHub issues automatically, that's a fairly huge load of work you can automate. Then again, you'd have to manually identify which 9% of issues.
- blibble 6 days ago
  
  "update dependencies"
  that would probably cover it, and you don't need "AI" to do that
  