Comment by bobxmax
6 days ago
My suspicion is it's the personalization. Most people have things like 'memory' on, and as the models increasingly personalize towards you, that personalization is hurting quality rather than helping it.
Which is why the base model wouldn't necessarily show differences when you benchmarked it.