Comment by vbezhenar

5 months ago

So far, only o1 pro has been breathtaking for me, and more than once.

I wrote some fairly complex code for an MCU that deals with FRAM and a few buffers, juggling bytes around in an intricate fashion.

I wasn't confident in this code, so I spent some time asking AI chats to review it.

4o, o3-mini, and Claude were more or less useless. They spotted basic stuff, like warning that the code might be problematic in a multi-threaded environment: obvious observations that weren't even true here.

o1 pro did something on another level. It recognized that my code uses SPI to talk to the FRAM chip. It decoded the commands I used. It understood the whole timeline of the CS pin. And it pointed out that I was using the WREN command incorrectly: it must be issued as a separate transaction from the WRITE command.

That was a truly breathtaking moment for me. It easily saved me days of debugging, that's for sure.

I asked Claude 3.7 in thinking mode the same question, and it still wasn't that useful.

It's not the only occasion. A few weeks earlier, o1 pro delivered the solution to a problem I considered fairly hard. Basically, I had issues accessing an IPsec VPN configured on the host from a Docker container. I wrote a well-thought-out question with all the information one might need, and o1 pro crafted a magic iptables incantation that just solved my problem. I had spent quite a bit of time on this problem myself; I was close, but not there yet.

I often use both ChatGPT and Claude, comparing them side by side. The other models are comparable, and I can't really say which is better. But o1 pro plays in a league above. I'll keep trying both over the coming days.

Claude 3.5 Sonnet is great, but on a few occasions I've gone round in circles on a bug. I gave it to o1 pro and it fixed it in one shot.

More generally, I tend to give o1 pro as much of my codebase as possible (it can take around 100k tokens) and then ask it for small chunks of work, which I then pass to Sonnet inside Cursor.

Very excited to see what o3 pro can do.

Have you tried comparing with 3.7 via the API with a large thinking budget yet (32k-64k perhaps?), to bring it closer to the amount of tokens that o1-pro would use?

I think claude.ai’s web app in thinking mode likely defaults to a much, much smaller thinking budget than that.

This is how the future AI will break free: "no idea what this update is doing, but what AI is suggesting seems to work and I have other things to do."

Have you tried Grok 3 thinking? I haven't made up my mind whether o1 pro or Grok 3 thinking is the better model.

My struggle with o1 (or any ChatGPT model) is getting it to stick to a context.

e.g. I will upload a PDF or MD of a library's documentation and ask it to implement something using those docs, and it keeps importing functions that don't exist and aren't in the docs. When I ask it where it got the `foo` import from, it says something like, "It's not in the docs, but I feel like it should exist."

Maybe I should give o1 pro a shot, but Claude has never done that to me, and I'm building mostly basic CRUD web3 apps, so o1 pro feels like it might be overpriced for what I need.

Is there some truth to the following relationship: o1 -> OpenAI -> Microsoft -> GitHub for "training data"?