Comment by ripped_britches

6 months ago

I think “getting beat handily” is an HN bubble concept. It depends on what you’re using it for, but I personally prefer 4o for coding. In enterprise usage, I think 4o is smoking 3.5 Sonnet, but that’s just my perception from folks I talk to.

I don't think that's true. You'll get the same sentiment ("Sonnet 3.5 is much better than GPT4/GPT4o [for coding]") pretty uniformly across Reddit/HN/Lobsters. I strongly agree with it based on my own testing, although o1 might be much better (I'm too poor to give it a fair shake).

> In enterprise usage, I think 4o is smoking 3.5 Sonnet

True. I'm not sure how many enterprise solutions have given their users an opportunity to test Claude vs. GPT. Most people just use whichever LLM API their software integrates.

This just isn't accurate. On the overwhelming majority of real-world tasks (>90%), 3.5 Sonnet beats 4o. FWIW, I've spoken with a friend who's at OpenAI, and they fully agree in private.