Comment by kadushka
2 days ago
Yesterday I asked 2.5 Pro, Opus 4, and o3 to convert my PyTorch script from pipeline parallel to regular DDP (i.e., convert one form of multi-GPU execution to another). None of the three produced fully correct code. Even when I collected the three different versions they produced and gave them to each model again to analyze the differences, they still couldn't get it fully working.
I don't know if o3 Pro would solve my task, but I feel we're still pretty far from the state where I'd struggle to give it a challenging enough problem.
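For reference, "regular DDP" here means the standard one-process-per-GPU setup, roughly like this minimal sketch (placeholder model and data, not my actual script):

```python
# Hypothetical minimal DDP skeleton, launched with torchrun. The model,
# dataset, and hyperparameters are placeholders, not the real script.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")   # torchrun sets the env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(128, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])        # grad sync hooks

    dataset = TensorDataset(torch.randn(1024, 128),
                            torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)     # shards the data per rank
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)              # reshuffle across ranks
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()   # DDP all-reduces gradients
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # run with: torchrun --nproc_per_node=NUM_GPUS this_script.py
```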
That's not how you do it. First ask it to create exhaustive tests around the first version. Tell it what to test for. Then ask it to change specific things, one at a time, re-run the tests between steps, and ask it to fix what breaks. Rinse, repeat, review. It's faster than doing it by hand, but you still need to be calling the shots.
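For example, before touching anything, pin down the current behavior with something like this (a hypothetical sketch; the tiny model and batch stand in for whatever the real script defines):

```python
# Hypothetical characterization test: freeze the numerics of the current
# version before refactoring, so the DDP rewrite can be checked against it.
import torch

def make_model():
    torch.manual_seed(0)                      # deterministic init
    return torch.nn.Linear(16, 4)

def one_train_step(model):
    torch.manual_seed(1)                      # deterministic batch
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    return loss.item(), [p.detach().clone() for p in model.parameters()]

def test_refactor_matches_baseline():
    # Here both sides call the same step function, so the test trivially
    # passes; in practice one side would be the original pipeline-parallel
    # step and the other the new DDP step.
    loss_a, params_a = one_train_step(make_model())
    loss_b, params_b = one_train_step(make_model())
    assert abs(loss_a - loss_b) < 1e-6
    for pa, pb in zip(params_a, params_b):
        assert torch.allclose(pa, pb, atol=1e-6)
```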
I just did it myself in the end.
Good for you. OpenAI would rather you rely on it to solve your problems than your own intelligence.
I'm curious how you're prompting. I've done this sort of dramatic update both one-shot (Gemini 2.5/o3) and Leader/Agent style: ask 2.5/o3 for a detailed roadmap, then provide that to Claude to execute as an agent.
I find the key is being able to submit your entire codebase to the API as the context. I've only had one situation where the input tokens were beyond o3's limit. In most projects I work with, a given module and all relevant modules clock in around 50-100k tokens.
Calling via the API also means you'll want to provide the full documentation for the task if it involves a new API, etc. This is where the recent o3 price decrease is a godsend.
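A sketch of what "whole codebase as context" looks like in practice: concatenate the source files into one prompt and send a single API call. This assumes the official openai Python client and the "o3" model name; adjust for whatever you actually have access to.

```python
# Hypothetical sketch: stuff an entire codebase into one chat request.
# Assumes openai>=1.0 and OPENAI_API_KEY set in the environment.
from pathlib import Path
from openai import OpenAI

def gather_sources(root: str, exts=(".py",)) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix in exts:
            parts.append(f"### {path}\n{path.read_text()}")
    return "\n\n".join(parts)

client = OpenAI()
codebase = gather_sources("src/")  # assumed project layout

resp = client.chat.completions.create(
    model="o3",  # assumed model name
    messages=[
        {"role": "user",
         "content": "Convert this training script from pipeline parallel "
                    "to plain DDP. Full codebase follows:\n\n" + codebase},
    ],
)
print(resp.choices[0].message.content)
```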
>I find the key is being able to submit your entire codebase to the API as the context
Am I the only person who works on proprietary codebases? This would get me fired.
You tried to one-shot it? Because context and access to troubleshooting tools are of the utmost importance for good results.