I think it's easy to ignore all the times the models get things hilariously wrong when there are a few instances where their output really surprises you.
That said, I don't really agree with the GP comment. Humans would be the bottleneck if we knew these models got things right 100% of the time, but with a model like o3-pro it's very possible it'll just spend 20 minutes going down the wrong rabbit hole. I've often found that prompting o4-mini gave me results that were pretty good most of the time while being much faster, whereas with base o3 I usually have to wait 2-3 minutes and hope that it got things right and didn't make any incorrect assumptions.
Same.
I find LLMs to be useful, but my day-to-day usage of them doesn't fit the narrative of people who suggest they are creating massive, complex projects with ease.
And if they are, where's the actual proof in the output? Why don't we see obvious evidence of some massive AI-powered renaissance, instead of a never-ending stream of anecdotes that read like astroturf marketing for AI companies?
Speaking of which, astroturfing seems like the kind of task LLMs should excel at…