Comment by bobxmax

8 months ago

My suspicion is it's the personalization. Most people have things like 'memory' on, and as the models increasingly personalize towards you, that personalization is hurting quality rather than helping it.

Which is why the base models wouldn't necessarily show any difference when you benchmark them.