Comment by dataviz1000

6 hours ago

I know why.

Several people had problems with Sonnet burning through all their credits grinding on a problem it can't solve. Opus fixes this — it has a confidence threshold below which it exits the task instead of grinding.

"I spent ~$100 last week testing both against multiplication. Sonnet at 37-digit × 37-digit (~10³⁷) never quits — 15+ minutes, 211KB of output, still actively decomposing numbers when I stopped it. Opus will genuinely attempt up to ~50 digits (112K tokens on a real try), starts doubting around 55 digits, and by 80-digit × 80-digit surrenders in 330 tokens / 9 seconds with an empty answer." -- Opus, helping me with the data
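If you want to rerun this kind of probe yourself, here is a minimal sketch of the harness side, assuming you already have some way to send a prompt to each model. `make_problem` and `grade` are hypothetical helper names, not part of any model API. It only generates n-digit × n-digit problems with an exact reference product and sorts a reply into correct, wrong, or surrendered (empty answer). Python's arbitrary-precision ints keep the reference product exact even at 80 digits.

```python
import random


def make_problem(n_digits: int, rng: random.Random) -> tuple[int, int, int]:
    """Return (a, b, a*b) for two random n-digit operands.

    Python ints are arbitrary precision, so the reference product is exact
    even for 80-digit x 80-digit cases.
    """
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return a, b, a * b


def grade(response: str, expected: int) -> str:
    """Classify a model's final answer as 'surrendered', 'correct', or 'wrong'."""
    # Strip whitespace and thousands separators before parsing.
    cleaned = response.strip().replace(",", "").replace(" ", "")
    if not cleaned:
        return "surrendered"  # empty answer counts as giving up
    try:
        return "correct" if int(cleaned) == expected else "wrong"
    except ValueError:
        return "wrong"  # non-numeric reply


# Usage: build a prompt, send it with whatever client you use, then grade the reply.
rng = random.Random(0)
a, b, product = make_problem(37, rng)
prompt = f"What is {a} * {b}? Reply with only the number."
```

Token counts and wall-clock time would come from your client's response metadata; they are deliberately left out here.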

The "I don't think this is worth attempting" heuristic is the difference. Sonnet doesn't have it, or has it set much higher. To get Opus and some other models to work on harder problems they assume aren't worth attempting, you have to raise their confidence.
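A toy way to picture the heuristic: the model estimates its chance of success, attempts the task only if that estimate clears a threshold, and encouragement in the prompt acts like a boost to the estimate. This is a hypothetical model, not how Opus or Sonnet work internally. The logistic curve, the 55-digit midpoint (picked only to echo the quote above, not fitted to the data), the slope, and the additive `boost` are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Gate:
    """Toy attempt-or-quit gate. All numbers here are illustrative assumptions."""

    threshold: float        # quit when estimated confidence falls below this
    midpoint: float = 55.0  # hypothetical digit count where confidence is 0.5
    steepness: float = 0.2  # hypothetical slope of the logistic confidence curve

    def confidence(self, n_digits: int, boost: float = 0.0) -> float:
        """Logistic guess at P(success); `boost` stands in for prompt encouragement."""
        c = 1.0 / (1.0 + math.exp(self.steepness * (n_digits - self.midpoint)))
        return min(1.0, c + boost)

    def attempts(self, n_digits: int, boost: float = 0.0) -> bool:
        """True if the (boosted) confidence clears the threshold."""
        return self.confidence(n_digits, boost) >= self.threshold


opus_like = Gate(threshold=0.5)    # gives up once confidence drops below 0.5
sonnet_like = Gate(threshold=0.0)  # threshold of 0 means it never gives up
```

With these made-up parameters, `opus_like` attempts 50 digits but quits at 80, and a large enough `boost` flips the 80-digit case back to an attempt, which is the "raise its confidence" effect in miniature.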

I'll finish writing this up this week. Right now I'm making flashy data-visualization animations to drive the point home.


bonesss 2 hours ago

So we have a bunch of impostor-syndrome techies who would 5x with just a hint of encouragement, and now we're trying to 2x them with LLMs, but to get there those same techies will have to gas up and inspire the LLM with the leadership and vision they themselves are hungry for?

The Universe seems against free lunches… if AGI is possible, finding a manager good enough to get the AGI to update its timesheet will not be (in practice).