Comment by sponnath

1 day ago

I think it's easy to ignore all the times the models get things hilariously wrong when there are a few instances where the output really surprises you.

That said, I don't really agree with the GP comment. Humans would be the bottleneck if we knew these models got things right 100% of the time, but with a model like o3-pro it's very possible it'll just spend 20 minutes chasing down the wrong rabbit hole. I've often found that prompting o4-mini gave me results that were pretty good most of the time while being much faster, whereas with base o3 I usually have to wait 2-3 minutes and hope it got things right and didn't make any incorrect assumptions.