Comment by wild_egg

2 months ago

I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies.

20 comments

wild_egg

jaggederest 2 months ago

Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Mac Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.

jonstewart 2 months ago
It can do this at the level of a function, and that's -useful-, but like the parent reply to top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but, it's not today.
- rtfeldman 2 months ago
  
  Anecdotally, we use Opus 4.5 constantly on Zed's code base, which is almost a million lines of Rust code and has over 150K active users, and we use it for basically every task you can think of - new features, bug fixes, refactors, prototypes, you name it. The code base is a complex native GUI with no Web tech anywhere in it.
  I'm not talking about "write this function" but rather like implementing the whole feature by writing only English to the agent, over the course of numerous back-and-forth interactions and exhausting multiple 200K-token context windows.
  For me personally, definitely at least 99% all of the Rust code I've committed at work since Opus 4.5 came out has been from an agent running that model. I'm reading lots of Rust code (that Opus generated) but I'm essentially no longer writing any of it. If dot-autocomplete (and LLM autocomplete) disappeared from IDE existence, I would not notice.
  
  14 replies →
- jaggederest 2 months ago
  
  I don't know if you've tried Chatgpt-5.2 but I find codex much better for Rust mostly due to the underlying model. You have to do planning and provide context, but 80%+ of the time it's a oneshot for small-to-medium size features in an existing codebase that's fairly complex. I honestly have to say that it's a better programmer than I am, it's just not anywhere near as good a software developer for all of the higher and lower level concerns that are the other 50% of the job.
  If you have any opensource examples of your codebase, prompt, and/or output, I would happily learn from it / give advice. I think we're all still figuring it out.
  Also this SIMD translation wasn't just a single function - it was multiple functions across a whole region of the codebase dealing with video and frame capture, so pretty substantial.
  
  1 reply →
- andai 2 months ago
  
  Is that a context issue? I wonder if LSP would help there. Though Claude Code should grep the codebase for all necessary context and LSP should in theory only save time, I think there would be a real improvement to outcomes as well.
  The bigger a project gets the more context you generally need to understand any particular part. And by default Claude Code doesn't inject context, you need to use 3rd party integrations for that.