Comment by kadushka
2 days ago
Yesterday I asked 2.5 Pro, Opus 4, and o3 to convert my PyTorch script from pipeline parallel to regular DDP (i.e., convert one form of multi-GPU execution to another). None of the three produced fully correct code. Even when I collected the three different versions they produced and gave them to each model again to analyze the differences, they still couldn't get it fully working.
I don't know if o3 Pro would solve my task, but I feel we're still pretty far from the state where I'd struggle to give it a challenging enough problem.
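For reference, "regular DDP" here means the standard one-process-per-GPU setup, roughly like this minimal sketch (placeholder model and data, not my actual script):

```python
# Hypothetical minimal DDP skeleton, launched with torchrun. The model,
# dataset, and hyperparameters are placeholders, not the real script.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")   # torchrun sets the env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])        # grad sync hooks

    dataset = TensorDataset(torch.randn(1024, 128),
                            torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)     # shards the data per rank
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)              # reshuffle across ranks
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()   # DDP all-reduces gradients
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # run with: torchrun --nproc_per_node=NUM_GPUS this_script.py
```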
That's not how you do it. First ask it to create exhaustive tests around the first version. Tell it what to test for. Then ask it to change specific things, one at a time, re-run the tests between steps, and ask it to fix what breaks. Rinse, repeat, review. It's faster than doing it by hand, but you still need to be calling the shots.
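For example, before touching anything, pin down the current behavior with something like this (a hypothetical sketch; the tiny model and batch stand in for whatever the real script defines):

```python
# Hypothetical characterization test: freeze the numerics of the current
# version before refactoring, so the DDP rewrite can be checked against it.
import torch

def make_model():
    torch.manual_seed(0)                      # deterministic init
    return torch.nn.Linear(16, 4)

def one_train_step(model):
    torch.manual_seed(1)                      # deterministic batch
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    return loss.item(), [p.detach().clone() for p in model.parameters()]

def test_refactor_matches_baseline():
    # Here both sides call the same step function, so the test trivially
    # passes; in practice one side would be the original pipeline-parallel
    # step and the other the new DDP step.
    loss_a, params_a = one_train_step(make_model())
    loss_b, params_b = one_train_step(make_model())
    assert abs(loss_a - loss_b) < 1e-6
    for pa, pb in zip(params_a, params_b):
        assert torch.allclose(pa, pb, atol=1e-6)
```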
I just did it myself in the end.
Good for you. OpenAI would rather you rely on it to solve your problems than your own intelligence.
I'm curious how you're prompting. I've done this sort of dramatic update both one-shot (Gemini 2.5/o3) and Leader/Agent style: ask 2.5/o3 for a detailed roadmap, then provide that to Claude to execute as an agent.
I find the key is being able to submit your entire codebase to the API as the context. I've only had one situation where the input tokens were beyond o3's limit. In most projects I work with, a given module and all relevant modules clock in around 50-100k tokens.
Calling via the API also means you'll want to provide the full documentation for the task if it involves a new API, etc. This is where the recent o3 price decrease is a godsend.
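A sketch of what "whole codebase as context" looks like in practice: concatenate the source files into one prompt and send a single API call. This assumes the official openai Python client and the "o3" model name; adjust for whatever you actually have access to.

```python
# Hypothetical sketch: stuff an entire codebase into one chat request.
# Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
from pathlib import Path
from openai import OpenAI

def gather_sources(root: str, exts=(".py",)) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in exts:
            parts.append(f"### {path}\n{path.read_text()}")
    return "\n\n".join(parts)

client = OpenAI()
codebase = gather_sources("src/")  # assumed project layout

resp = client.chat.completions.create(
    model="o3",  # assumed model name
    messages=[
        {"role": "user",
         "content": "Convert this training script from pipeline parallel "
                    "to plain DDP. Full codebase follows:\n\n" + codebase},
    ],
)
print(resp.choices[0].message.content)
```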
>I find the key is being able to submit your entire codebase to the API as the context
Am I the only person who works on proprietary codebases? This would get me fired.
You tried to one-shot it? Because context and access to troubleshooting tools are of the utmost importance for good results.