Comment by comeonbro
5 months ago
> I've tested all of ours on each of the main models
Could you list them? I've noticed even quite techy people seem to be critically behind on what has happened in the last few months.
Sure, as of today, I test on:
GPT: 4o, o1 pro mode, o3-mini-high
Gemini: 2.0 Flash, 2.0 Pro Experimental
Claude 3.5 Sonnet
Grok 3
DeepSeek-V3
Mistral: codestral 25.01, mistral-large 24.11
Qwen2.5-Max
---
If there are others I should try, I'm definitely open to suggestions.
And ruin the benchmark? Come on, bro.