Comment by underdeserver

7 hours ago

People are saying Codex 5.2 full-solved the crypto challenges at the 39C3 CTF last weekend.

Three months ago I would have agreed with you, but anecdotal evidence says Codex 5.2 and Opus 4.5 are finally there.

You'll get a vastly different experience the more you use these tools and learn their limitations and how you can structure things effectively to let them do their job better. But lots of people, understandably, don't take the time to actually sit down and learn it. They spend 30 seconds on some prompt not even a human would understand, and expect the tooling to automatically spend 5 hours trying its hardest at implementing it, then they look at the results and conclude "How could anyone ever be productive with this?!".

People say a lot of things, and there is a lot of context behind what they're saying that is missing, so then we end up with conversations that basically boil down to one person arguing "I don't understand how anyone cannot see the value in this" with another person thinking "I don't understand how anyone can get any sort of value out of this", both missing the other's perspective.

  • Prompt engineering is just good transfer notes and ticket writing, which is something a majority of the devs I've worked with don't enjoy or excel at

I've been using Codex and Claude Sonnet for many months now for personal (Codex) and work (Sonnet) projects, and I agree. Three months ago these tools were highly usable; now, with Codex 5.2 and Sonnet 4.5, I think we're at the point where you can confidently rely on them to analyze your repo, solve at least small-scoped problems, and apply any required refactor throughout the codebase.

6-12+ months ago the results I was getting with these tools were highly questionable, but in the last six months the changes have been pretty astounding.

  • Sonnet is dumb as a bag of bricks compared to Opus; perhaps you meant Opus? I never use Sonnet for anything anymore: it's either too verbose or just can't handle tasks that Opus one-shots.

    • I use the Copilot extension in VS Code, which links back to my enterprise GitHub account, where I have Claude Sonnet 4.5 available amongst other things. I'm not familiar with Opus. I just open the Copilot Chat window in my VS Code, configure it to use Sonnet 4.5, tell it what I need and it writes the responses and code for me. I'm not using it for large tasks. Most of my usage is "examine this codebase and tell me how to fix xyz problem" or "look at this source code file and show me the code to implement some feature, make sure to examine the entire codebase for insight into how it should be integrated with the rest of the project"

      There are other, more advanced AI coding tools, but this has met almost all of my needs so far.

    • These anecdotes feel so worthless. I notice almost no difference between the two and get generally high-quality results from either. This is also a worthless anecdote. I'm guessing the kind of codebase you're working in matters a lot, as do the tasks you're giving it.