Comment by soulofmischief
7 hours ago
Interesting, thanks for sharing!
As for the web stuff, these tools are great in the hands of thoughtful, attentive, experienced engineers who have developed the muscles for knowing how to slap these models into shape. For anyone else, I agree that right now they can be more of a headache than they are worth.
I get a lot of velocity out of Opus 4.5 and spend 8-20 hours a day coding with it nearly every day, but I am constantly, multiple times an hour, screaming and yelling at these things, getting frustrated and bewildered by their output, etc. It is absolutely a tradeoff, but thankfully the tradeoff for me is frustration and mental energy, instead of correctness or performance. But left alone, these models drive in circles and tear everything up along the way.
I totally believe you about these models having difficulty with realtime programming. It's a more niche field with less example training material. Out of pure curiosity, I do wish I were able to see exactly where the failure modes arise. I wonder how things will be at the end of 2026, because 2025 was a game changer for many domains.