Comment by lend000

3 days ago

I imagine that after GPT-4 / o1, improvements on benchmarks have increasingly been the result of overfitting. Those breakthrough models already used most of the high-quality training data available on the internet, there haven't been any dramatic architectural changes, we are already melting the world's GPUs, and there simply isn't enough new, high-quality data being generated (we'd need orders of magnitude more than what the older models already trained on) to enable breakthrough improvements.

What I'd really like to see is the model development companies improving their guardrails so that they are less concerned about saying something offensive or controversial and more concerned about conveying their level of confidence in an answer, i.e. saying "I don't know" every once in a while. Once we get a couple of years of relative stagnation in AI models, I suspect this will become a huge selling point, and you will start getting "defense-grade", B2B-type models where accuracy is king.
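
To make that concrete: the crudest version of "saying I don't know" doesn't even need new training. You can gate the answer on the model's own token probabilities. A rough sketch in Python (generate_with_logprobs is a hypothetical stand-in for any API that exposes token log-probs, and the 0.8 threshold is arbitrary):

    import math

    # Hypothetical stand-in for an LLM call that returns the answer
    # plus per-token log-probabilities; real APIs differ in shape.
    def generate_with_logprobs(prompt: str) -> tuple[str, list[float]]:
        return "Paris", [math.log(0.62), math.log(0.95), math.log(0.88)]

    def answer_or_abstain(prompt: str, min_confidence: float = 0.8) -> str:
        answer, logprobs = generate_with_logprobs(prompt)
        # Geometric mean of token probabilities = exp(mean logprob):
        # a crude, length-normalized proxy for model confidence.
        confidence = math.exp(sum(logprobs) / len(logprobs))
        return answer if confidence >= min_confidence else "I don't know"

    print(answer_or_abstain("What is the capital of France?"))  # -> "Paris"

Token-level probability is a weak proxy for factual correctness, of course, but even something that blunt would beat the current default of confidently answering everything.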