Comment by comeonbro

5 months ago

> I've tested all of ours on each of the main models

Could you list them? I've noticed even quite techy people seem to be critically behind on what has happened in the last few months.


comeonbro

Sure, as of today, I test on:

GPT: 4o, o1 pro mode, o3-mini-high

Gemini: 2.0 Flash, 2.0 Pro Experimental

Claude 3.5 Sonnet

Grok 3

DeepSeek-V3

Mistral: codestral 25.01, mistral-large 24.11

Qwen2.5-Max

---

If there are others I should try, I'm definitely open to suggestions.

And ruin the benchmark? Come on, bro.