Comment by binary0010

1 day ago

Maybe try making a simple randomize script to swap the three latest models. And see if you can tell which ones are meaningfully different without knowing which ones are flipped on or off?

I find the quality ebbs and flows even on the same model. My guess it is something to do with GPU availability but only guessing.

  • Unless you're systematically repeating the exact same task, the most parsimonious explanation is that you're seeing natural variation based on different tasks, random sampling of tokens, etc.

    • I don't think this explains the phenomenon as is more temporal in nature - not prompt to prompt. I'm sure the AI labs gracefully degrade to simpler models when resources are low - why wouldn't they?