Comment by Terretta

1 year ago

Appreciate that OpenAI popped in to say the new release is probably better at something else, but it would have been nice to acknowledge that this suggestion...

> “Unfortunately, if you're a developer using an LLM API, the best thing to do is to test all of the models from all the providers to see which works best for your use case.”

...is exactly what is done by the author of these benchmark suites:

"It performs worse on aider’s coding benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the GPT-4 Turbo preview models."

Agreed! Kudos to Paul for creating the evals, running them quickly, and sharing results. My comment (not on behalf of OpenAI, but just me as an individual) was meant as a "yes and" not a "no but".