Comment by simonw

2 days ago

Have you tried debugging using the new "reasoning" models yet?

I have been extremely impressed with o1, o3, o4-mini and Gemini 2.5 as debugging aids. The combination of long-context input and their chain-of-thought reasoning means they can frequently help me figure out bugs that span several different layers of code.

I wrote about an early experiment with that here: https://simonwillison.net/2024/Sep/25/o1-preview-llm/

Here's a Gemini 2.5 Pro transcript from this afternoon where I'm trying to figure out a very tricky bug: https://gist.github.com/simonw/4e208ab9edb5e6a814d3d23d7570d...

In my experience they're not great with mathy code, for example. I had a function that subdivided certain splines and got some of the coefficients wrong. When I pasted the function into these reasoning models and asked "does this look right?", they all produced a bunch of math formulas in their reasoning and concluded it was correct (which it wasn't).
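
To make the scenario concrete, here's a rough sketch of the kind of function I mean (not my actual code, just an assumed stand-in): de Casteljau subdivision of a cubic Bezier at t = 0.5. The repeated 0.5 averaging constants are exactly the kind of coefficient that, if you get one wrong, these models tend to wave through as correct.

    # Hypothetical sketch: split a cubic Bezier into two halves at t = 0.5
    # using de Casteljau's algorithm. Change any 0.5 below to something like
    # 1/3 and the output still looks plausible, but the halves no longer
    # trace the original curve -- the sort of bug the models missed.

    def midpoint(a, b):
        """Average two 2D points component-wise."""
        return ((a[0] + b[0]) * 0.5, (a[1] + b[1]) * 0.5)

    def subdivide_cubic(p0, p1, p2, p3):
        """Return (left, right) control points for the two halves."""
        q0, q1, q2 = midpoint(p0, p1), midpoint(p1, p2), midpoint(p2, p3)
        r0, r1 = midpoint(q0, q1), midpoint(q1, q2)
        s = midpoint(r0, r1)  # point on the curve at t = 0.5
        return (p0, q0, r0, s), (s, r1, q2, p3)

    if __name__ == "__main__":
        left, right = subdivide_cubic((0, 0), (1, 2), (3, 2), (4, 0))
        print(left)   # ((0, 0), (0.5, 1.0), (1.25, 1.5), (2.0, 1.5))
        print(right)  # ((2.0, 1.5), (2.75, 1.5), (3.5, 1.0), (4, 0))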