Comment by anotherpaulg
1 year ago
OpenAI just released GPT-4 Turbo with Vision and it performs worse on aider’s coding benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the GPT-4 Turbo preview models.
Thanks again for running all these benchmarks with each model release. They are really helpful for tracking progress!
Really appreciate the thoroughness you apply to evaluating models for use with Aider. Did you adjust the prompt at all for the newer models?
I've definitely run into this personally. But even when I explicitly tell it not to skip implementation and to generate fully functional code, it says it understands and then goes right back to omitting things.
It was honestly shocking. We're so used to it following our instructions that such blatant disregard made me seriously wonder what kind of laziness layer they added.
I suspect they might be worried it could reproduce copyrighted code in certain circumstances, so their solution was to condition the model to never produce large continuous chunks of code. It was a very noticeable change across the board.
I thought it would be for performance: if it doesn't output all of the code, each reply is shorter and quicker. You can still ask it to generate the rest of the code, but that introduces latency, so the overall load ends up lower.
People hypothesized that OpenAI added laziness in order to save money on token generation, since they are burning through GPU time.
This has been my conclusion too. Given it's a product I'm paying for monthly, it seems super regressive to have to trick it into doing what it used to do just fine.
I'd probably pay triple to go back to the pre-"Dev Day" product at this point
They should offer different models at this point.
This laziness happens over and over, so what's the point of all that omniscience?
The laziness layer seems intended to make it an assistant rather than a replacement that actually does the task.