Comment by bugglebeetle

6 months ago

I used to think this, but using o1 quite a bit lately has convinced me otherwise. It's been 1-shotting the fairly non-trivial coding problems I throw at it and is good about outputting large, complete code blocks. By contrast, Claude starts nagging you about hitting usage limits after a few back-and-forths and seems to have some kind of hack in place to start abbreviating code when conversations get too long, even when explicitly instructed otherwise. I would imagine that Anthropic can produce a good test-time compute model as well, but until they have something publicly available, OpenAI has stolen back the lead.

"Their model" here is referring to 4o as o1 is unviable for many production usecases due to latency.