Comment by emsi

1 year ago

OK, this is wild. I just watched o3-mini (regular) precisely simulate (calculate?) the output of some quite complicated computations. Well, complicated for a human at least… and no, it didn't use the code interpreter.

How do you know it didn't use a code interpreter if they don't share the chain-of-thought?

  • When Code Interpreter is used in ChatGPT, OpenAI makes it very clear through UI hints that it is being used.

    I really hope they don't ever change that UI pattern, this stuff is hard enough to understand already.

    If you really want to test this, you can take advantage of the fact that Code Interpreter runs in a persistent sandbox VM. In your prompt, tell o3-mini to save a file, then switch to GPT-4o (which can genuinely use Code Interpreter) and have it run Python code to check whether that file exists.
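A minimal sketch of the check GPT-4o could be asked to run in the second step. It assumes the sandbox's working directory is `/mnt/data` and uses a hypothetical marker filename that o3-mini was told to create; adjust both to whatever your prompt actually used.

```python
import os

# Assumption: the Code Interpreter sandbox keeps files under /mnt/data.
# The filename is a hypothetical marker that o3-mini was asked to save.
MARKER = "/mnt/data/o3_mini_marker.txt"


def marker_exists(path: str = MARKER) -> bool:
    """Return True only if a regular file exists at `path` in this sandbox."""
    return os.path.isfile(path)


print("marker exists:", marker_exists())
```

If this prints `False` even though o3-mini claimed to have saved the file, that suggests it never touched the sandbox.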

I was trying to solve this simple beam deflection problem and have been getting inconsistent results between runs in various models (o1-mini and Gemini 2.0 Flash Thinking Experimental). Do you get consistent deflection numbers?

> A 6061-T6 aluminum alloy hollow round beam, 2 in in diameter with a 0.125 in wall thickness and a length of 120 in, is simply supported at each end. A point load of 100 lb is applied at the middle. What is the deflection at the middle and 12 in from the ends?
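For a deterministic baseline to compare model answers against, here is a sketch using standard Euler-Bernoulli beam formulas for a simply supported beam with a central point load. It assumes E = 10.0e6 psi for 6061-T6 (published values range roughly from 9.9e6 to 10.1e6 psi, which alone can shift answers slightly), ignores self-weight, and assumes linear-elastic, small-deflection behavior. The function names are illustrative.

```python
import math

# Assumption: Young's modulus of 6061-T6 aluminum taken as 10.0e6 psi.
E_PSI = 10.0e6


def tube_second_moment(outer_d: float, wall: float) -> float:
    """Area moment of inertia (in^4) of a hollow round section: pi/64 * (D^4 - d^4)."""
    inner_d = outer_d - 2 * wall
    return math.pi / 64 * (outer_d**4 - inner_d**4)


def center_load_deflection(p: float, length: float, e: float, i: float, x: float) -> float:
    """Deflection (in) at distance x from a support of a simply supported beam
    with point load p at midspan: p*x*(3L^2 - 4x^2) / (48*E*I) for x <= L/2."""
    if not 0 <= x <= length:
        raise ValueError("x must lie within the span")
    x = min(x, length - x)  # deflected shape is symmetric about midspan
    return p * x * (3 * length**2 - 4 * x**2) / (48 * e * i)


i_tube = tube_second_moment(outer_d=2.0, wall=0.125)
L = 120.0  # span, in
P = 100.0  # central point load, lb
mid = center_load_deflection(P, L, E_PSI, i_tube, L / 2)  # reduces to P*L^3/(48*E*I)
at_12 = center_load_deflection(P, L, E_PSI, i_tube, 12.0)

print(f"I = {i_tube:.4f} in^4")
print(f"midspan deflection = {mid:.3f} in")
print(f"deflection 12 in from a support = {at_12:.3f} in")
```

With these assumptions, I ≈ 0.325 in^4, giving roughly 1.11 in of deflection at midspan and about 0.33 in at 12 in from each support.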