Comment by OldGreenYodaGPT
3 days ago
Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code.
Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.
For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.
We also had Claude Code create a bunch of ESLint automation, including custom ESLint rules and lint checks that catch and auto-handle a lot of stuff before it even hits review.
Then we take it further: we have a deep code review agent Claude Code runs after changes are made. And when a PR goes up, we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it.
On top of that, we’ve got like five other Claude Code GitHub workflow agents that run on a schedule. One of them reads all commits from the last month and makes sure docs are still aligned. Another checks for gaps in end-to-end coverage. Stuff like that. A ton of maintenance and quality work is just… automated. It runs ridiculously smoothly.
We even use Claude Code for ticket triage. It reads the ticket, digs into the codebase, and leaves a comment with what it thinks should be done. So when an engineer picks it up, they’re basically starting halfway through already.
There is so much low-hanging fruit here that it honestly blows my mind people aren’t all over it. 2026 is going to be a wake-up call.
(used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)
Edit: made an example repo for ya
I made a similar comment on a different thread, but I think it also fits here: I think the disconnect between engineers is due to their own context. If you work with frontend applications, especially React/React Native/HTML/Mobile, your experience with LLMs is completely different than the experience of someone working with OpenGL, io_uring, libev and other lower level stuff. Sure, Opus 4.5 can one shot Windows utilities and full stack apps, but can't implement a simple shadowing algorithm from a 2003 paper in C++, GLFW, GLAD: https://www.cse.chalmers.se/~uffe/soft_gfxhw2003.pdf
Codex and Claude Code are terrible with C++. They also can't do Rust really well, once you get to the meat of it. Not sure why that is, but they just spit out nonsense that creates more work than it helps me. They also can't one shot anything complete, even when I feed them the entire paper that explains what the algorithm is supposed to do.
Try to do some OpenGL or Vulkan with it, without using WebGPU or three.js. Try it with real code that all of us have to deal with every day. SDL, Vulkan RHI, NVRHI. Very frustrating.
Try it with Boost, or CMake, or Taskflow. It loses itself constantly, hallucinates which version it's working with, and ignores you when you provide actual pointers to documentation in the repo.
I've also recently tried to get Opus 4.5 to move the Job system from Doom 3 BFG to the original codebase. Clean clone of dhewm3, pointed Opus to the BFG Job system codebase, and explained how it works. I have also fed it the Fabien Sanglard code review of the job system: https://fabiensanglard.net/doom3_bfg/threading.php
We are not sleeping on it, we are actually waiting for it to get actually useful. Sure, it can generate a full stack admin control panel in JS for my PostgreSQL tables, but is that really "not normal"? That's basic.
We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside of grunt work like minor refactors across many files. It doesn't seem to understand proxying and how it works on both a protocol level and business logic level.
With some entirely novel work we're doing, it's actually a hindrance as it consistently tells us the approach isn't valid/won't work (it will) and then enters "absolutely right" loops when corrected.
I still believe those who rave about it are not writing anything I would consider "engineering". Or perhaps it's a skill issue and I'm using it wrong, but I haven't yet met someone I respect who tells me it's the future in the way those running AI-based companies tell me.
> We have an in-house, Rust-based proxy server. Claude is unable to contribute to it meaningfully outside
I have a great time using Claude Code in Rust projects, so I know it's not about the language exactly.
My working model is that since LLMs are basically inference/correlation based, the more you deviate from the mainstream corpus of training data, the more confused the LLM gets, because the LLM doesn't "understand" anything. But if it was trained on a lot of things kind of like the problem, it can match the patterns just fine, and it can generalize over a lot of layers, including programming languages.
Also I've noticed that it can get confused about stupid stuff. E.g. I had two different things named almost the same in two parts of the codebase, and it would constantly conflate them. Changing the name in the codebase immediately improved it.
So yeah, we've got another potentially powerful tool that requires understanding how it works under the hood to be useful. Kind of like git.
2 replies →
This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. But, I helped them get Claude code setup, analyze the codebase and document the architecture into markdown (edit as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc.
For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.
10 replies →
Are you still talking about Opus 4.5? I've been working in Rust, Kotlin and C++ and it's been doing well. Incredible at C++, like the number of mistakes it doesn't make
> I still believe those who rave about it are not writing anything I would consider "engineering".
Correct. In fact, this is the entire reason for the disconnect, where it seems like half the people here think LLMs are the best thing ever and the other half are confused about where the value is in these slop generators.
The key difference is (despite everyone calling themselves an SWE nowadays) there's a difference between a "programmer" and an "engineer". Looking at OP, exactly zero of his screenshotted apps are what I would consider "engineering". Literally everything in there has been done over and over to death. Engineering is... novel, for lack of a better word.
See also: https://www.seangoedecke.com/pure-and-impure-engineering/
11 replies →
I've had Opus 4.5 hand rolling CUDA kernels and writing a custom event loop on io_uring lately and both were done really well. Need to set up the right feedback loops so it can test its work thoroughly but then it flies.
Yeah I've handed it a naive scalar implementation and said "Make this use SIMD for Apple Silicon / NEON" and it just spits out a working implementation that's 3-6x faster and passes the tests, which are binary exact specifications.
19 replies →
> It also can't do rust really well
I have not had this experience at all. It often doesn't get it right on the first pass, yes, but the advantage with Rust vibecoding is that if you give it a rule to "Always run cargo check before you think you're finished" then it will go back and fix whatever it missed on the first pass. What I find particularly valuable is that the compiler forces it to handle all cases like match arms or errors. I find that it often misses edge cases when writing typescript, and I believe that the relative leniency of the typescript compiler is why.
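To make the point concrete, here's a tiny sketch (all names invented for illustration) of the exhaustiveness the Rust compiler enforces, which is exactly what a "run cargo check before you're done" rule feeds back to the agent:

```rust
// Sketch of the "compiler as guardrail" point: rustc rejects a
// non-exhaustive match outright, so an agent running `cargo check` is
// forced to handle every case instead of leaving silent gaps.
#[derive(Debug)]
enum JobState {
    Queued,
    Running,
    Done(i32),
}

fn status_line(s: &JobState) -> String {
    // Omitting any arm here is a hard compile error, not a runtime surprise.
    match s {
        JobState::Queued => "waiting".to_string(),
        JobState::Running => "in progress".to_string(),
        JobState::Done(code) => format!("exited with {code}"),
    }
}

fn main() {
    assert_eq!(status_line(&JobState::Done(0)), "exited with 0");
    println!("{}", status_line(&JobState::Queued));
}
```

A TypeScript compiler in loose mode would let the equivalent switch fall through silently; here the agent's loop can't terminate until every arm exists.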
In a similar vein, it is quite good at writing macros (or at least, quite good given how difficult this otherwise is). You often have to cajole it into not hardcoding features into the macro, but since macros resolve at compile time they're quite well-suited for an LLM workflow as most potential bugs will be apparent before the user needs to test. I also think that the biggest hurdle of writing macros to humans is the cryptic compiler errors, but I can imagine that since LLMs have a lot of information about compilers and syntax parsing in their training corpus, they have an easier time with this than the median programmer. I'm sure an actual compiler engineer would be far better than the LLM, but I am not that guy (nor can I afford one) so I'm quite happy to use LLMs here.
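For what it's worth, here's a sketch of the kind of small declarative macro I mean; the types and names are made up, but it shows why bugs surface at expansion/compile time rather than at runtime:

```rust
// Hypothetical declarative macro: generate newtype wrappers plus their
// From impls in one shot. A mistake in the expansion fails `cargo check`
// immediately, which is what makes macros LLM-loop friendly.
macro_rules! impl_wrapper {
    ($($name:ident => $inner:ty),* $(,)?) => {
        $(
            #[derive(Debug, PartialEq)]
            struct $name($inner);

            impl From<$inner> for $name {
                fn from(v: $inner) -> Self { $name(v) }
            }
        )*
    };
}

impl_wrapper!(UserId => u64, Email => String);

fn main() {
    let id = UserId::from(42);
    assert_eq!(id, UserId(42));
    let mail: Email = "a@b.c".to_string().into();
    println!("{:?} {:?}", id, mail);
}
```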
For context, I am purely a webdev. I can't speak for how well LLMs fare at anything other than writing SQL, hooking up to REST APIs, React frontend, and macros. With the exception of macros, these are all problems that have been solved a million times thus are more boilerplate than novelty, so I think it is entirely plausible that they're very poor for different domains of programming despite my experiences with them.
i've also been using opus 4.5 with lots of heavy rust development. i don't "vibe code", but lead it with a relatively firm hand- and it produces pretty good results in surprisingly complicated tasks.
for example, one of our public repos works with rust compiler artifacts and cache restoration (https://github.com/attunehq/hurry); if you look at the history you can see it do some pretty surprisingly complex (and well made, for an LLM) changes. its code isn't necessarily what i would always write, or the best way to solve the problem, but it's usually perfectly serviceable if you give it enough context and guidance.
I built an open source "game engine" entirely in Lua many years ago, relying on many third-party libraries that I would bind to with FFI.
I thought I'd revive it, but this time with Vulkan and no third-party dependencies (except for Vulkan)
Sonnet 4.5, Opus and Gemini 3.5 Flash have helped me write image decoders for DDS, PNG, JPG and EXR, a Wayland window implementation, a macOS window implementation, etc.
I find that Gemini 3.5 Flash is really good at understanding 3D in general while Sonnet might be lacking a little.
All these sota models seem to understand my bespoke Lua framework and the right level of abstraction. For example at the low level you have the generated Vulkan bindings, then after that you have objects around Vulkan types, then finally a high level pipeline builder and whatnot which does not mention Vulkan anywhere.
However with a larger C# codebase at work, they really struggle. My theory is that there are too many files and abstractions so that they cannot understand where to begin looking.
I'm a quite senior frontend using React and even I see Sonnet 4.5 struggle with basic things. Today it wrote my Zod validation incorrectly, mixing up versions, then just decided it wasn't working and attempted to replace the entire thing with a different library.
There’s little reason to use Sonnet anymore. Haiku for summaries, Opus for anything else. Sonnet isn’t a good model by today’s standards.
1 reply →
Why do we all of a sudden hold these agents to some unrealistically high bar? Engineers write bugs all the time and write incorrect validations. But we iterate. We read the stacktrace in Sentry, realise what the hell we were thinking when we wrote that, and we fix things. If you're going to benefit from these agents, you need to be a bit more patient and point them correctly at your codebase.
My rule of thumb is that if you can clearly describe exactly what you want to another engineer, then you can instruct the agent to do it too.
4 replies →
I'll second this. I'm making a fairly basic iOS/Swift app with an accompanying React-based site. I was able to vibe-code the React site (it isn't pretty, but it works and the code is fairly decent). But I've struggled to get the Swift code to be reliable.
Which makes sense. I'm sure there's lots of training data for React/HTML/CSS/etc. but much less with Swift, especially the newer versions.
I had surprising success vibe coding a swift iOS app a while back. Just for fun, since I have a bluetooth OBD2 dongle and an electric truck, I told Claude to make me an app that could connect to the truck using the dongle, read me the VIN, odometer, and state of charge. This was middle of 2025, so before Opus 4.5. It took Claude a few attempts and some feedback on what was failing, but it did eventually make a working app after a couple hours.
Now, was the code quality any good? Beats me, I am not a swift developer. I did it partly as an experiment to see what Claude was currently capable of and partly because I wanted to test the feasibility of setting up a simple passive data logger for my truck.
I'm tempted to take another swing with Opus 4.5 for the science.
I hate "vibe code" as a verb. May I suggest "prompt" instead? "I was able to prompt the React site…."
1 reply →
Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.
Yes, December 2025 and January 2026.
I've had pretty good luck with LLM agents coding C. In this case a C compiler that supports a subset of C and targets a customizable microcoded state machine/processor. Then I had Gemini code up a simulator/debugger for the target machine in C++ and it did it in short order and quite successfully - lets you single step through the microcode and examine inputs (and set inputs), outputs & current state - did that in an afternoon and the resulting C++ code looks pretty decent.
That's remarkably similar to something I've just started on - I want to create a self-compiling C compiler targeting (and to run on) an 8-bit micro via a custom VM. This is basically a retro-computing hobby project.
I've worked with Gemini Fast on the web to help design the VM ISA, then next steps will be to have some AI (maybe Gemini CLI - currently free) write an assembler, disassembler and interpreter for the ISA, and then the recursive descent compiler (written in C) too.
I already had Gemini 3.0 Fast write me a precedence climbing expression parser as a more efficient drop-in replacement for a recursive descent one, although I had it do that in C++ as a proof-of-concept since I don't know yet what C libraries I want to build and use (arena allocator, etc). This involved a lot of copy-paste between Gemini output and an online C++ dev environment (OnlineGDB), but that was not too bad, although Gemini CLI would have avoided that. Too bad that Gemini web only has "code interpreter" support for Python, not C and/or C++.
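For anyone unfamiliar, the core of precedence climbing is only a few lines. Here's an illustrative sketch (in Rust rather than the C++ proof-of-concept mentioned above, and evaluating directly instead of building an AST) just to show the shape of the algorithm:

```rust
// Sketch of precedence climbing over a token stream of integers and
// binary operators; illustrative only, not the actual generated code.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tok { Num(i64), Op(char) }

fn prec(op: char) -> u8 {
    match op { '+' | '-' => 1, '*' | '/' => 2, _ => 0 }
}

fn apply(op: char, l: i64, r: i64) -> i64 {
    match op { '+' => l + r, '-' => l - r, '*' => l * r, '/' => l / r, _ => unreachable!() }
}

// parse_expr(toks, pos, min_prec): consume operators with precedence
// >= min_prec, recursing with a higher floor for right-hand operands.
fn parse_expr(toks: &[Tok], pos: &mut usize, min_prec: u8) -> i64 {
    let mut lhs = match toks[*pos] {
        Tok::Num(n) => { *pos += 1; n }
        _ => panic!("expected number"),
    };
    while *pos < toks.len() {
        let op = match toks[*pos] {
            Tok::Op(c) if prec(c) >= min_prec => c,
            _ => break,
        };
        *pos += 1;
        // Left-associative: the right side binds at prec(op) + 1.
        let rhs = parse_expr(toks, pos, prec(op) + 1);
        lhs = apply(op, lhs, rhs);
    }
    lhs
}

fn main() {
    // 2 + 3 * 4 - 5  =>  9
    let toks = [Tok::Num(2), Tok::Op('+'), Tok::Num(3), Tok::Op('*'),
                Tok::Num(4), Tok::Op('-'), Tok::Num(5)];
    let mut pos = 0;
    assert_eq!(parse_expr(&toks, &mut pos, 1), 9);
    println!("ok");
}
```

The win over plain recursive descent is that one function handles every precedence level instead of one function per level.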
Using Gemini to help define the ISA was an interesting process. It had useful input in a "pair-design" process, working on various parts of the ISA, but then failed to bring all the ideas together into a single ISA document, repeatedly missing parts of what had been previously discussed until I gave up and did that manually. The default persona of Gemini seems not very well suited to this type of work flow where you want to direct what to do next, since it seems they've RL'd the heck out of it to want to suggest next step and ask questions rather than do what is asked and wait for further instruction. I eventually had to keep asking it to "please answer then stop", and interestingly quality of the "conversation" seemed to fall apart after that (perhaps because Gemini was now predicting/generating a more adversarial conversation than a collaborative one?).
I'm wondering/hoping that Gemini CLI might be better at working on documentation than Gemini web, since then the doc can be an actual file it is editing, and it can use its edit tool for that, as opposed to hoping that Gemini web can assemble chunks of context (various parts of the ISA discussion) into a single document.
3 replies →
I've found it to be pretty hit-or-miss with C++ in general, but it's really, REALLY bad at 3D graphics code. I've tried to use it to port an OpenGL project to SDL3_GPU, and it really struggled. It would confidently insist that the code it wrote worked, when all you had to do was run it and look at the output to see a blank screen.
I hope I’m not committing a faux pas by saying this—and please feel free to tell me that I’m wrong—but I imagine a human who has been blind since birth would also struggle to build 3D graphics code.
The Claude models are technically multi-modal, but IME the vision side of the equation is really lacking. As a result, Claude is quite good at reasoning about logic, and it can build e.g. simpler web pages where the underlying html structure is enough to work with, but it’s much worse at tasks that inherently require seeing.
2 replies →
> It also can't do Rust really well, once you get to the meat of it. Not sure why that is
Because types are proofs and require global correctness, you can't just iterate, fix things locally, and wait until it breaks somewhere else that you also have to fix locally.
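A tiny illustration of what "global correctness" means in practice (names invented): suppose a function that used to return `User` is changed to return `Option<User>`. The compiler then rejects every caller until each one handles `None`, so a "fix it locally and move on" loop doesn't terminate the way it does in looser languages:

```rust
// Sketch: types act as proofs that must hold across the whole program.
#[derive(Debug, Clone, PartialEq)]
struct User { name: String }

// Imagine this previously returned `User` directly; widening it to
// `Option<User>` breaks every call site at compile time, not at runtime.
fn lookup(id: u32) -> Option<User> {
    if id == 1 { Some(User { name: "ada".into() }) } else { None }
}

// Each caller is forced to prove it handled the missing case.
fn greeting(id: u32) -> String {
    match lookup(id) {
        Some(u) => format!("hello, {}", u.name),
        None => "who are you?".to_string(),
    }
}

fn main() {
    assert_eq!(greeting(1), "hello, ada");
    assert_eq!(greeting(9), "who are you?");
    println!("ok");
}
```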
I have not tried C++, but Codex did a good job with low-level C code, shaders as well as porting 32 bit to 64 bit assembly drawing routines. I have also tried it with retro-computing programming with relative success.
> Mobile
From what I've seen, CC has trouble with the latest Swift too, partially because it's the latest and partially because it's so convoluted nowadays.
But it's übercharged™ for C#
I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.
And I get it. Coding with Claude Code really was prompting something, getting errors, and asking it to fix them. Which was still useful, but I could see why a skilled coder adding a feature to a complex codebase would just give up.
Opus 4.5 really is at a new tier, however. It just... works. The errors are far fewer and often very minor, "careless" errors, not fundamental issues (like forgetting to add "use client" to a Next.js client component).
This was me. I was a huge AI coding detractor on here for a while (you can check my comment history). But, in order to stay informed and not just be that grouchy curmudgeon all the time, I kept up with the models and regularly tried them out. Opus 4.5 is so much better than anything I've tried before, I'm ready to change my mind about AI assistance.
I even gave -True Vibe Coding- a whirl. Yesterday, from a blank directory and text file list of requirements, I had Opus 4.5 build an Android TV video player that could read a directory over NFS, show a grid view of movie poster thumbnails, and play the selected video file on the TV. The result wasn't exactly full-featured Kodi, but it works in the emulator and actual device, it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything. It was pretty astounding.
Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.
I have a few Go projects now and I speak Go as well as you speak Kotlin. I predict that we'll see some languages really pull ahead of others in the next few years based on their advantages for AI-powered development.
For instance, I always respected types, but I'm too lazy to go spend hours working on types when I can just do ruby-style duck typing and get a long ways before the inevitable problems rear their head. Now, I can use a strongly typed language and get the advantages for "free".
3 replies →
Oh, wow, that's impressive, thanks for sharing!
Going to one-up you though -- here's a literal one-liner that gets me a polished media center with beautiful interface and powerful skinning engine. It supports Android, BSD, Linux, macOS, iOS, tvOS and Windows.
`git clone https://github.com/xbmc/xbmc.git`
4 replies →
How do you know “it has no memory leaks, crashes, ANRs, no performance problems, no network latency bugs or anything” if you built it just yesterday? Isn’t it a bit too early for claims like this? I get it’s easy to bring ideas to life but aren’t we overly optimistic?
2 replies →
I decided to vibe code something myself last week at work. I've been wanting to create a PoC that involves a coding agent creating custom Bokeh plots that a user can interact with and ask follow-up questions about. All this had to be served using the HoloViews Panel library.
At work I only have access to Claude using the GitHub Copilot integration, so this could be the cause of my problems. Claude was able to get the first iteration up pretty quick. At that stage the app could create a plot and you could interact with it and ask follow up questions.
Then I asked it to extend the app so that it could generate multiple plots and the user could interact with all of them one at a time. It made a bunch of changes but the feature was never implemented. I asked it to do it again but got the same outcome. I completely accept that it could all be because I am using VS Code Copilot or my prompting skills are not good, but the LLM got 70% of the way there and then completely failed
3 replies →
I recently replaced my monitor with one that could be vertically oriented, because I'm just using Claude Code in the terminal and not looking at file trees at all
but I do want a better way to glance and keep up with what its doing in longer conversations, for my own mental context window
2 replies →
Thanks for posting this. It's a nice reminder that despite all the noise from hype-mongers and skeptics in the past few years, most of us here are just trying to figure this all out with an open mind and are ready to change our opinions when the facts change. And a lot of people in the industry that I respect on HN or elsewhere have changed their minds about this stuff in the last year, having previously been quite justifiably skeptical. We're not in 2023 anymore.
If you were someone saying at the start of 2025 "this is a flash in the pan and a bunch of hype, it's not going to fundamentally change how we write code", that was still a reasonable belief to hold back then. At the start of 2026 that position is basically untenable: it's just burying your head in the sand and wishing for AI to go away. If you're someone who still holds it you really really need to download Claude Code and set it to Opus and start trying it with an open mind: I don't know what else to tell you. So now the question has shifted from whether this is going to transform our profession (it is), to how exactly it's going to play out. I personally don't think we will be replacing human engineers anytime soon ("coders", maybe!), but I'm prepared to change my mind on that too if the facts change. We'll see.
I was a fellow mind-changer, although it was back around the first half of last year when Claude Code was good enough to do things for me in a mature codebase under supervision. It clearly still had a long way to go but it was at that tipping point from "not really useful" to "useful". But Opus 4.5 is something different - I don't feel I have to keep pulling it back on track in quite the way I used to with Sonnet 3.7, 4, even Sonnet 4.5.
For the record, I still think we're in a bubble. AI companies are overvalued. But that's a separate question from whether this is going to change the software development profession.
6 replies →
> Oh, and I did this all without ever opening a single source file or even looking at the proposed code changes while Opus was doing its thing. I don't even know Kotlin and still don't know it.
... says it all.
1 reply →
[dead]
> "asking it to fix it."
This is what people are still doing wrong. Tools in a loop, people, tools in a loop.
The agent has to have the tools to detect whether whatever it just created is producing errors during linting/testing/running. When it can do that, it can loop: fix the error, then use the tools again to see whether the fix worked.
I _still_ encounter people who think "AI programming" is pasting stuff into ChatGPT on the browser and they complain it hallucinates functions and produces invalid code.
Well, d'oh.
Last weekend I was debugging some blocking issue on a microcontroller with embassy-rs, where the whole microcontroller would lock up as soon as I started trying to connect to an MQTT server.
I was having Opus investigate it and I kept building and deploying the firmware for testing.. then I just figured I'd explain how it could do the same and pull the logs.
Off it went, for the next ~15 minutes it would flash the firmware multiple times until it figured out the issue and fixed it.
There was something so interesting about seeing a microcontroller on the desk being flashed by Claude Code, with LEDs blinking indicating failure states. There's something about it not being just code on your laptop that felt so interesting to me.
But I agree, absolutely, red/green test or have a way of validating (linting, testing, whatever it is) and explain the end-to-end loop, then the agent is able to work much faster without being blocked by you multiple times along the way.
This is kind of why I'm not really scared of losing my job.
While Claude is amazing at writing code, it still requires human operators. And even experienced human operators are bad at operating this machinery.
Tell your average joe - the one who thinks they can create software without engineers - what "tools-in-a-loop" means, and they'll make the same face they made when you tried explaining iterators to them, before LLMs.
Explain to them how a type system, E2E or integration tests help the agent, and suddenly they have to learn all the things they would be required to learn to write it on their own.
Jules is slow incompetent shit and that uses tools in a loop, so no...
I have been out of the loop for a couple of months (vacation). I tried Claude Opus 4.5 at the end of November 2025 with the corporate Github Copilot subscription in Agent mode and it was awful: basically ignoring code and hallucinating.
My team is using it with Claude Code and say it works brilliantly, so I'll be giving it another go.
How much of the value comes from Opus 4.5, how much comes from Claude Code, and how much comes from the combination?
As someone coming from GitHub copilot in vscode and recently trying Claude Code plugin for vscode I don't get the fuss about Claude.
Copilot has by far the best and most intuitive agent UI. Just make sure you're in agent mode and choose Sonnet or Opus models.
I've just cancelled my Claude sub and gone back and will upgrade to the GH Pro+ to get more sonnet/opus.
3 replies →
I suspect that's the other thing at play here; many people have only tried Copilot because it's cheap with all the other Microsoft subscriptions many companies have. Copilot frankly is garbage compared to Cursor/Claude, even with the same exact models.
my issue for a long time now hasn't been that the code they write works or doesn't work. My issues all stem from the fact that it works, but does the wrong thing
> My issues all stem from that it works, but does the wrong thing
It's an opportunity, not a problem. Because it means there's a gap in your specifications and then your tests.
I use Aider not Claude but I run it with Anthropic models. And what I found is that comprehensively writing up the documentation for a feature spec style before starting eliminates a huge amount of what you're referring to. It serves a triple purpose (a) you get the documentation, (b) you guide the AI and (c) it's surprising how often this helps to refine the feature itself. Sometimes I invoke the AI to help me write the spec as well, asking it to prompt for areas where clarification is needed etc.
2 replies →
If it does the wrong thing you tell it what the right thing is and have it try again.
With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try.
30 replies →
I think it's worth understanding why. Because that's not everyone's experience and there's a chance you could make a change such that you find it extremely useful.
There's a lesser chance that you're working on a code base that Claude Code just isn't capable of helping with.
Correct it then, and next time craft a more explicit plan.
4 replies →
This was me. I have done a full 180 over the last 12 months or so, from "they're an interesting idea, and technically impressive, but not practically useful" to "holy shit I can have entire days/weeks where I don't write a single line of code".
> I really think a lot of people tried AI coding earlier, got frustrated at the errors and gave up. That's where the rejection of all these doomer predictions comes from.
It's not just the deficiencies of earlier versions, but the mismatch between the praise from AI enthusiasts and the reality.
I mean maybe it is really different now and I should definitely try uploading all of my employer's IP to Claude's cloud and see how well it works. But so many people were as hyped by GPT-4 as they are now, despite GPT-4 actually being underwhelming.
Too much hype for disappointing results leads to skepticism later on, even when the product has improved.
I feel similar, I'm not against the idea that maybe LLMs have gotten so much better... but I've been told this probably 10 times in the last few years working with AI daily.
The funny part about rapidly changing industries is that, despite the fomo, there's honestly not any reward to keeping up unless you want to be a consultant. Otherwise, wait and see what sticks. If this summer people are still citing the Opus 4.5 was a game changing moment and have solid, repeatable workflows, then I'll happily change up my workflow.
Someone could walk into the LLM space today and wouldn't be significantly at a loss for not having paid attention to anything that had happened in the last 4 years other than learning what has stuck since then.
5 replies →
You enter some text and a computer spits out complex answers generated on the spot
Right or wrong - doesn’t matter. You typed in a line of text and now your computer is making 3000 word stories, images, even videos based on it
How are you NOT astounded by that? We used to have NONE of this even 4 years ago!
5 replies →
> Opus 4.5 really is at a new tier however. It just...works.
Literally tried it yesterday. I didn't see a single difference from whatever model Claude Code was using two months ago. Same crippled context window. Same "I'll read 10 irrelevant lines from a file", same random changes, etc.
The context window isn't "crippled".
Create a markdown document of your task (or use CLAUDE.md), put it in "plan mode" which allows Claude to use tool calls to ask questions before it generates the plan.
When it finishes one part of the plan, have it create another markdown document, "progress.md" or whatever, with the whole plan and what is completed at that point.
Type /clear (no more context window), tell Claude to read the two documents.
Repeat until even a massive project is complete - with those 2 markdown documents and no context window issues.
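A hypothetical shape for that progress document (not an official Claude Code format, just one way to lay it out so a fresh session can pick up where the last one stopped):

```markdown
# Task: add rate limiting to the API gateway

## Plan
1. [x] Add token-bucket module with unit tests
2. [x] Wire limiter into request middleware
3. [ ] Expose limits via config file
4. [ ] Update docs and examples

## Notes for next session
- Limiter lives in src/limit.rs; middleware hook is in src/server.rs
- Config schema still undecided: TOML vs env vars
```

After /clear, "read CLAUDE.md and progress.md, then continue with step 3" is usually all the context the next session needs.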
13 replies →
200k+ tokens is a pretty big context window if you are feeding it the right context. Editors like Cursor are really good at indexing and curating context for you; perhaps it'd be worth trying something that does that better than Claude CLI does?
20 replies →
That's because Opus has been out for almost 5 months now lol. It's the same model, so I think people have been vibe coding with a heavy dose of wine this holiday and are now convinced it's the future.
3 replies →
I'm not familiar with any form of intelligence that does not suffer from a bloated context. If you want to try and improve your workflow, a good place to start is using sub-agents so individual task implementations do not fill up your top level agents context. I used to regularly have to compact and clear, but since using sub-agents for most direct tasks, I hardly do anymore.
1 reply →
I use Sonnet and Opus all the time and the differences are almost negligible
Opus 4.5 is fucking up just like Sonnet really. I don't know how your use is that much different than mine.
[flagged]
Actually, I've been saying that even models from 2+ years ago were extremely good, but you needed to "hold them right" to get good results, else you might cut yourself on the sharp edges of the "jagged frontier" (https://www.hbs.edu/faculty/Pages/item.aspx?num=64700). Unfortunately, this often required you to adapt yourself to the tool, which is a big change -- unfeasible for most people and companies.
I would say the underlying principle was ensuring a tight, highly relevant context (e.g. choose the "right" task size and load only the relevant files or even code snippets, not the whole codebase; more manual work upfront, but almost guaranteed one-shot results.)
With newer models the sharper edges have largely disappeared, so you can hold them pretty much any which way and still get very good results. I'm not sure how much of this is from the improvements in the model itself vs the additional context it gets from the agentic scaffolding.
I still maintain that we need to adapt ourselves to this new paradigm to fully leverage AI-assisted coding, and the future of coding will be pretty strange compared to what we're used to. As an example, see Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
2 replies →
Each failed prediction should lower our confidence in the next "it's finally useful!" claim. But this inductive reasoning breaks down at genuine inflection points.
I agree with your framing that measuring should NOT be separated from political issues, but each can be made clear separately (framing it as "training the tools of the oppressor" seems to conflate measuring tool usefulness with politics).
8 replies →
It's a little weird how defensive people are about these tools. Did everyone really think being able to import a few npm packages, string together a few APIs, and run npx create-react-app was something a large number of people could do forever?
The vast majority of coders in employment barely write anything more complex than basic CRUD apps. These jobs were always going to be automated or abstracted away sooner or later.
Every profession changes. Saying that these new tools are useless or won't impact you/xyz devs is just ignoring a repeated historical pattern
10 replies →
You're free to not open these threads, you know!
Democratizing coding so regular people can get the most out of computers is the opposite of oppression. You are mistaking your interests for society's interests.
It's the same with artists who are now pissed that regular people can manifest their artistic ideas without needing to go through an artist or spend years studying the craft. The artists are calling the AI companies oppressors because they are breaking the artist's stranglehold on the market.
It's incredibly ironic how socializing what was a privatized ability has otherwise "socialist" people completely losing their shit. Just the mask of pure virtue slipping...
13 replies →
> If I am unable to convince you to stop meticulously training the tools of the oppressor (for a fee!) then I just ask you do so quietly.
I'm kind of fascinated by how AI has become such a culture war topic with hyperbole like "tools of the oppressor"
It's equally fascinating how little these comments understand about how LLMs work. Using an LLM for inference (what you do when you use Claude Code) does not train the LLM. It does not learn from your code and integrate it into the model while you use it for inference. I know that breaks the "training the tools of the oppressor" narrative which is probably why it's always ignored. If not ignored, the next step is to decry that the LLM companies are lying and are stealing everyone's code despite saying they don't.
26 replies →
Frankly, in this comment thread you appear to be the oppressor.
5 replies →
I know someone who is using a vibe coded or at least heavily assisted text editor, praising it daily, while also saying llms will never be productive. There is a lot of dissonance right now.
I teach at a university, and spend plenty of time programming for research and for fun. Like many others, I spent some time on the holidays trying to push the current generation of Cursor, Claude Code, and Codex as far as I could. (They're all very good.)
I had an idea for something that I wanted, and in five scattered hours, I got it good enough to use. I'm thinking about it in a few different ways:
1. I estimate I could have done it without AI with 2 weeks full-time effort. (Full-time defined as >> 40 hours / week.)
2. I have too many other things to do that are purportedly more important than programming. I really can't dedicate two weeks full-time to a "nice to have" project. So, without AI, I wouldn't have done it at all.
3. I could hire someone to do it for me. At the university, those are students. From experience with lots of advising, a top-tier undergraduate student could have achieved the same thing, had they worked full tilt for a semester (before LLMs). This of course assumes that I'm meeting them every week.
This is where LLM coding shines, in my opinion. There's a list of things they do very well:
- single scripts. Anything which can be reduced to a single script.
- starting greenfield projects from scratch
- code maintenance (package upgrades, old code...)
- tasks which have a very clear and single definition. This isn't linked to complexity, some tasks can be both very complex but with a single definition.
If your work falls into this list they will do some amazing work (and yours clearly fits that), if it doesn't though, prepare yourself because it will be painful.
I'm trying to determine what programming tasks are not in this list. :) I think it is trying to exclude adding new features and fixing bugs in existing code. I've done enough of that with LLMs, though not in large codebases.
I should say I'm hardly ever vibe-coding, unlike the original article. If I think I want code that will last, I'll steer the models in ways that lean on years of non-LLM experience. E.g., I'll reject results that might work if they violate my taste in code.
It also helps that I can read code very fast. I estimate I can read code 100x faster than most students. I'm not sure there is any way to teach that other than the old-fashioned way, which involves reading (and writing) a lot of code.
1 reply →
How do you compare Claude Code to Cursor? I'm a Cursor user quietly watching the CC parade with curiosity. Personally, I haven't been able to give up the IDE experience.
I'm so sold on the CLI tools that I think IDEs are basically dead to me. I only have an IDE open so I can read the code, but most often I'm just changing configs (like switching a bool, or bumping up a limit or something like that).
Seriously, I have 3+ claude code windows open at a time. Most days I don't even look at the IDE. It's still there running in the background, but I don't need to touch it.
I use CC for so much more than just writing code that I cannot imagine being constrained within an IDE. Why would I want to launch an IDE to have CC update the *arr stack on my NAS to the latest versions for example? Last week I pointed CC at some media files that weren't playing correctly on my Apple TV. It detected what the problem formats were and updated my *arr download rules to prefer other releases and then configured tdarr to re-encode problem files in my existing library.
When I'm using Claude Code, I usually have a text editor open as well. The CC plugin works well enough to achieve most of what Cursor was doing for me in showing real-time diffs, but in my experience, the output is better and faster. YMMV
I was here a few weeks ago, but I'm now on the CC train. The challenge is that the terminal is quite counterintuitive at first. But if you put on the Linux-terminal lens from a few years ago and start using it, it begins to make sense. The form factor of the terminal isn't intuitive for programming, but it's the ultimate one.
FYI, I still use cursor for small edits and reviews.
I don't think I can scientifically compare the agents. As it is, you can use Opus / Codex in Cursor. The speed of Cursor composer-1 is phenomenal -- you can use it interactively for many tasks. There are also tasks that are not easier to describe in English, but you can tab through them.
Just FYI, these days cc has 'ide integration' too, it's not just a cli. Grab the vscode extension.
What did you build? I think people talk past each other when they don't share what exactly they were trying to do and achieving success/failure on.
Referring to this: https://github.com/arjunguha/slopcoder
I then proceeded to use it to hack on its own codebase, and close a bunch of issues in a repository that I maintain (https://github.com/nuprl/MultiPL-E/commits/main/).
> Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code.
Nobody is sleeping. I'm using LLMs daily to help me in simple coding tasks.
But really where is the hurry? At this point not a few weeks go by without the next best thing since sliced bread to come out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out?
> 2026 is going to be a wake-up call
Do you honestly think a developer not using AI won't be able to adapt to a LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly?
There is literally no hurry.
You're using the equivalent of the first portable CD-player in the 80s: it was huge, clunky, had hiccups, had a huge battery attached to it. It was shiny though, for those who find new things shiny. Others are waiting for a portable CD player that is slim, that buffers, that works fine. And you're saying that people won't be able to learn how to put a CD in a slim CD player because they didn't use a clunky one first.
I think getting proficient at using coding agents effectively takes a few months of practice.
It's also a skill that compounds over time, so if you have two years of experience with them you'll be able to use them more effectively than someone with two months of experience.
In that respect, they're just normal technology. A Python programmer with two years of Python experience will be more effective than a programmer with two months of Python.
> Nobody is sleeping. I'm using LLMs daily to help me in simple coding tasks.
That is sleeping.
> But really where is the hurry? At this point not a few weeks go by without the next best thing since sliced bread to come out. Why would I bother "learning" (and there's really nothing to learn here) some tool/workflow that is already outdated by the time it comes out?
You're jumping to conclusions that haven't been justified by any of the development in this space. The learning compounds.
> Do you honestly think a developer not using AI won't be able to adapt to a LLM workflow in, say, 2028 or 2029? It has to be 2026 or... What exactly?
They will, but they'll be competing against people with 2-3 more years of experience in understanding how to leverage these tools.
"But really where is the hurry?" It just depends on why you're programming. For many of us not learning and using up to date products leads to a disadvantage relative to our competition. I personally would very much rather go back to a world without AI, but we're forced to adapt. I didn't like when pagers/cell phones came out either, but it became clear very quickly not having one put me at a disadvantage at work.
The crazy part is, once you have it setup and adapted your workflow, you start to notice all sorts of other "small" things:
claude can call ssh and do system admin tasks. It works amazingly well. I have 3 VMs, which depend on each other (Proxmox with OpenWrt, AdGuard, Unbound), and Claude can prove to me that my DNS chain works perfectly, my firewalls are correct, etc., because Claude can ssh into each. Setting up services, diagnosing issues, auditing configs... you name it. Just awesome.
claude can call other sh scripts on the machine, so over time you can create a bunch of scripts that let Claude one-shot certain tasks that would normally eat tokens. It works great. One script per intention - don't have a script do more than one thing.
claude can call the compiler, run the debug executable and read the debug logs... in real time. So Claude can read my Android app's debug stream via adb, or my C# debug console, because Claude calls the compiler, not me. Just ask it to do it and it will diagnose stuff really quickly.
It can also analyze your db tables (give it readonly sql access), look at the application code and queries, and diagnose performance issues.
The opportunities are endless here. People need to wake up to this.
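For the read-only database access mentioned above, one common approach - sketched here in Postgres syntax, with placeholder names - is a dedicated role that can only SELECT:

```sql
-- Hypothetical read-only role for the agent (Postgres syntax; names are placeholders).
CREATE ROLE claude_readonly WITH LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE myapp TO claude_readonly;
GRANT USAGE ON SCHEMA public TO claude_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO claude_readonly;
-- Cover tables created later, too.
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO claude_readonly;
```

Then the agent gets only the `claude_readonly` credentials, so even a bad query can't mutate anything.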
> claude can call ssh and do system admin tasks
Claude set up a Raspberry Pi with a display and conference audio device for me to use as an Alexa replacement tied to Home Assistant.
I gave it an ssh key and gave it root.
Then I told it what I wanted, and it did. It asked for me to confirm certain things, like what I could see on screen, whether I could hear the TTS etc. (it was a bit of a surprise when it was suddenly talking to me while I was minding my own business).
It configured everything, while keeping a meticulous log that I can point it at if I want to set up another device, and eventually turn into a runbook if I need to.
I have a /fix-ci-build slash command that instructs Claude how to use `gh` to get the latest build from that specific project's Github Actions and get the logs for the build
In addition there are instructions on how and where to push the possible fixes and how to check the results.
I've yet to encounter a build failure it couldn't fix automatically.
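For reference, a Claude Code slash command is just a markdown file under `.claude/commands/`. A minimal sketch of what such a command might contain (the filename, steps, and repo details here are hypothetical, not the parent commenter's actual file):

```markdown
<!-- .claude/commands/fix-ci-build.md (hypothetical sketch) -->
Find the most recent failed GitHub Actions run:
`gh run list --limit 5`, then `gh run view <run-id> --log-failed`

Then:
1. Diagnose the failure from the logs.
2. Make the smallest fix that addresses the root cause.
3. Push to the current branch and confirm the new run with `gh run watch`.
```

Typing `/fix-ci-build` in a session injects this file as the prompt, so the whole fetch-diagnose-fix loop is one command.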
Why do all these AI-generated READMEs have a directory structure section? It's so redundant - I could just run tree.
It makes me so exhausted trying to read them... my brain can tell immediately when there's so much redundant information that it just starts shutting itself off.
Comments? Also, it's for reading into an agent, so the agent doesn't have to tool-call / bash out to get the structure.
I think we're entering a world where programmers as such won't really exist (except perhaps in certain niches). Being able to program (and read code, in particular) will probably remain useful, though diminished in value. What will matter more is your ability to actually create things, using whatever tools are necessary and available, and have them actually be useful. Which, in a way, is the same as it ever was. There's just less indirection involved now.
We've been living in that world since the invention of the compiler ("automatic programming"). Few people write machine code any more. If you think of LLMs as a new variety of compiler, a lot of their shortcomings are easier to describe.
My compiler runs on my computer and produces the same machine code given the same input. Neither of these are true with AI.
1 reply →
The last thing I want is a non-deterministic compiler; I do not vibe with this analogy at all…
Finally we've invented a compiler that we can yell at when it gives bullshit errors. I really missed that with gcc.
Isn't there more indirection as long as LLMs use "human" programming languages?
If you think of the training data, e.g. SO, GitHub, etc., then you have a human asking about or describing a problem, then the code as the solution. So I suspect current-gen LLMs are still following this model, which means for the foreseeable future a human-language prompt will still be the best.
Until such time, of course, when LLMs are eating their own dogfood, in which case they - as has already happened - create their own language, evolve dramatically, and cue skynet.
More indirection in the sense that there's a layer between you and the code, sure. Less in that the code doesn't really matter as such and you're not having to think hard about the minutiae of programming in order to make something you want. It's very possible that "AI-oriented" programming languages will become the standard eventually (at least for new projects).
1 reply →
It’s not clear how affordances of programming languages really differ between humans and LLMs.
You intrigue me.
> have it learn your conventions, pull in best practices
What do you mean by "have it learn your conventions"? Is there a way to somehow automatically extract your conventions and store it within CLAUDE.md?
> For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.
Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?
> What do you mean by "have it learn your conventions"?
I'll give you an example: I use ruff to format my python code, which has an opinionated way of formatting certain things. After an initial formatting, Opus 4.5, without prompting, will write code in this same style so that the ruff formatter almost never has anything to do on new commits. Sonnet 4.5 is actually pretty good at this too.
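To make that concrete, here's a hedged illustration of the style in question - double quotes, trailing commas, one argument per line - which ruff's Black-compatible formatter enforces and which the model now tends to emit on its own (the function itself is invented for illustration):

```python
# Hypothetical example: code written in the style `ruff format` would
# produce, so running the formatter on it is a no-op.
def build_user(
    name: str,
    email: str,
    is_admin: bool = False,
) -> dict:
    return {
        "name": name,
        "email": email,
        "is_admin": is_admin,
    }


print(build_user("ada", "ada@example.com"))
```

When the model emits this style unprompted, the formatter diff on new commits is empty, which is the behavior described above.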
Isn't this a meaningless example? Formatters already exist. Generating code that doesn't need to be formatted is exactly the same as generating code and then formatting it.
I care about the norms in my codebase that can't be automatically enforced by machine. How is state managed? How are end-to-end tests written to minimize change detectors? When is it appropriate to log something?
5 replies →
Since starting to use Opus 4.5, I've been reducing the instructions in CLAUDE.md and just asking Claude to look at the codebase to understand the patterns already in use. Going from prompts/docs to having the code be the "truth". Show, don't tell. I've found this pattern has made a huge leap with Opus 4.5.
The Ash framework takes the approach you describe.
From the docs (https://hexdocs.pm/ash/what-is-ash.html):
"Model your application's behavior first, as data, and derive everything else automatically. Ash resources center around actions that represent domain logic."
I feel like I've been doing this since Sonnet 3.5 or Sonnet 4. I'll clone projects/modules/whatever into the working directory and tell claude to check it out. Voila, now it knows your standards and conventions.
When I ask Claude to do something, it independently, without me even asking or instructing it to, searches the codebase to understand what the convention is.
I’ve even found it searching node_modules to find the API of non-public libraries.
This sounds like it would take a huge amount of tokens. I've never used agents so could you disclose how much you pay for it?
5 replies →
Just ask it to.
/init in Claude Code already automatically extracts a bunch, but for something more comprehensive, just tell it which additional types of things you want it to look for and document.
> Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?
I don't know about the person above, but I tell Claude to write all my skills and agents for me. With some caveats, you can do this iteratively in a single session ("update the X agent, then re-run it. Repeat until it reliably does Y")
"Claude, clone this repo https://github.com/repo, review the coding conventions, check out any markdown or readme files. This is an example of coding conventions we want to use on this project"
> Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.
I agree with this, but I haven't needed to use any advanced features to get good results. I think the simple approach gets you most of the benefits. Broadly, I just have markdown files in the repo written for a human dev audience that the agent can also use.
Basically:
- README.md with a quick start section for devs, descriptions of all build targets and tests, etc. Normal stuff.
- AGENTS.md (the only file not written for people specifically) that just describes the overall directory structure and has a short set of instructions for the agent: (1) Always read the readme before you start. (2) Always read the relevant design docs before you start. (3) Always run the linter, a build, and tests whenever you make code changes.
- docs/*.md that contain design docs, architecture docs, and user stories, just text. It's important to have these resources anyway, agent or no.
As with human devs, the better the docs/requirements the better the results.
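The AGENTS.md described above can be very short. A sketch, where the paths and commands are placeholders for whatever a given repo actually uses:

```markdown
<!-- AGENTS.md (hypothetical sketch) -->
## Directory structure
- `src/` - application code
- `tests/` - unit and integration tests
- `docs/` - design docs, architecture notes, user stories

## Instructions
1. Read README.md before you start.
2. Read the relevant docs/*.md before you start.
3. After any code change, run the linter, a build, and the tests.
```

Because it's plain markdown at the repo root, it works for any agent that reads AGENTS.md, not just one vendor's tool.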
I'd really encourage you to try using agents for tasks that are repeatable and/or wordy but where most of the words are not relevant for ongoing understanding.
It's a tiny step further, and sub-agents provide a massive benefit the moment you're ready to trust the model even a little bit (relax permissions to not have it prompt you for every little thing; review before committing rather than on every file edit) because they limit what goes into the top level context, and can let the model work unassisted for far longer. I now regularly have it run for hours at a time without stopping.
Running and acting on output from the linter is absolutely an example of that which matters even for much shorter runs.
There's no reason to have all the lint output "polluting" the top level context, nor to have the steps the agent needs to take to fix linter issues that can't be auto-fixed by the linter itself. The top level agent should only need to care about whether the linter run passed or failed (and should know it needs to re-run and possibly investigate if it fails).
Just type /agents, select "Create new agent" and describe a task you often do, and then forget about it (or ask Claude to make changes to it for you)
That's a great point. There are a lot of things you can do to optimise things, and your suggestion is one of the lower hanging fruits.
I was trying to get across the point that today you can get a lot of benefit from minimal setup, even one that's vendor-agnostic. (The steps I outlined work for Codex out of the box, too.)
You're right to point out that the more you refine things, the more you'll get out of the tools. It used to be that you had to do a lot of refinements to start getting good results at all. Now, you can get a lot out of even a basic setup like I outlined, which is great for people who are new users -- or people who tried it before and weren't that impressed but are now giving it another try.
Oh! An ad!
The most effective kind of marketing is viral word of mouth from users who love your product. And Claude Code is benefiting from that dynamic.
lol, it does sound like an ad, but it's true. Also, I forgot about hooks - use hooks too! I just used voice-to-text, then had Claude reword it. Still my real-world ideas.
Exactly what an ad would say.
All of these things work very well IMO in a professional context.
Especially if you're in a place where a lot of time was previously spent revising PRs for best practices, etc., even for human-submitted code, having the LLM do that for you saves a bunch of time. Most humans are bad at following those super-well.
There's a lot of stuff where I'm pretty sure I'm up to at least 2x speed now. And for things like making CLI tools or bash scripts, 10x-20x. But in terms of "the overall output of my day job in total", probably more like 1.5x.
But I think we will need a couple of major leaps in tooling - probably deterministic tooling, not LLM tooling - before anyone could responsibly ship code nobody has ever read in situations with millions of dollars on the line. (That's different from vibe-coding something that ends up making millions - that's a low-risk-high-reward situation, where big bets on doing things fast make sense. If you're already making millions, dramatic changes like that can become high-risk-low-reward very quickly. In those companies, "obvious" intuition like "I know that only touching these files is 99.99% likely to be completely safe for security-critical functionality" makes up for the inability to exhaustively test software in a practical way, even with fuzzers and the like, and "I didn't even look at the code" concedes responsibility to a dangerous degree.)
I still struggle with these things being _too_ good at generating code. They have a tendency to add abstractions, classes, wrappers, factories, builders to things that didn't really need all that. I find they spit out 6 files worth of code for something that really only needed 2-3 and I'm spending time going back through simplifying.
There are times those extra layers are worth it but it seems LLMs have a bias to add them prematurely and overcomplicate things. You then end up with extra complexity you didn't need.
> (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)
Reword? But why not just voice to text alone...
Oh but we all read the partially synthetic ad by this point. Psyche.
I'm curious: With that much Claude Code usage, does that put your monthly Anthropic bill above $1000/mo?
Mind sharing the bill for all that?
My company pays for the team Claude code plan which is like $200 a month for each dev. The workflows cost like 10 - 50 cents a PR
It will have to quintuple or more to make business sense for Anthropic. Sure, still cheaper than a full-time developer, but don't expect it to stay at $200 for long. And then, when you explain to your boss how amazing it is and how it can do all this work so easily and quickly, that's when your boss starts asking the real question: what am I paying you for?
6 replies →
It's $150, not a huge difference, but worth noting that it's not the same as the 20x Max plan.
Cheaper than hiring another developer, probably. My experience: for a few dollars I was able to extensively refactor a Python codebase in half a day. This otherwise would have taken multiple days of very tedious work.
And that's what the C-suite wants to know. Prepare yourself to be replaced in the not so distant future. Hope you have a good "nest" to support yourself when you're inevitably fired.
24 replies →
i've never hit a limit with my $200 a month plan
They are sleeping on it because there is absolutely no incentive to use it.
When needed, it can be picked up in a day. Otherwise, they are not paid based on tickets solved, etc. If the incentives were properly aligned, everyone would already use it.
Use Claude Code... to do what? There are multiple layers of people involved in the decision process and they only come up with a few ideas every now and then. Nothing I can't handle. AI helps but it doesn't have to be an agent.
I'm not saying there aren't use cases for agents, just that it's normal that most software engineers are sleeping on it.
Came across the official Anthropic repo for GitHub Actions, very relevant to what you mentioned. Your idea of scheduled doc updates using an LLM is brilliant; I'm stealing it. https://github.com/anthropics/claude-code-action
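A rough sketch of what a scheduled doc-drift check could look like with that action (input names may differ between action versions - check the repo's README before copying; the cron, prompt, and secret name are assumptions):

```yaml
# Hypothetical scheduled workflow using anthropics/claude-code-action.
name: doc-drift-check
on:
  schedule:
    - cron: "0 6 1 * *"  # once a month
jobs:
  check-docs:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history so last month's commits are visible
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Read all commits from the last month, compare them against the
            docs/ directory, and open an issue listing any docs that are
            now out of date.
```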
Agreed and skills are a huge unlock.
codex cli even has a skill to create skills; it's super easy to get up to speed with them
https://github.com/openai/skills/blob/main/skills/.system/sk...
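For anyone new to the format: a skill is essentially a directory with a SKILL.md in it - YAML frontmatter describing what it's for and when to use it, then free-form instructions. A hedged sketch, where the library, package names, and rules are all invented for illustration:

```markdown
---
name: custom-ui-library
description: How to use our in-house UI component library. Use when writing or editing frontend components.
---

# Using the UI library

- Import components from `@acme/ui` (hypothetical package); don't build from raw HTML elements.
- Every new component gets a Storybook story next to it.
- Use the spacing tokens in `@acme/ui/tokens`; no hard-coded pixel values.
```

The agent reads the description to decide when the skill applies, then loads the body only when needed, which keeps it out of the context window the rest of the time.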
Thanks for the example! There's a lot (of boilerplate?) here that I don't understand. Does anyone have good references for catching up to speed what's the purpose of all of these files in the demo?
Also, the new Haiku. Not as smart, but lightning fast. I have it review the impact of code changes, or if I need a wide but shallow change done, I have it scan the files and create a change plan. Saves a lot of time waiting for Claude or Codex to get their bearings.
(If you know) how does that compare to CodeRabbit? I'm seriously looking for something better rn...
Never tried CodeRabbit, just because this is already good enough with Claude Code. It helped us catch dozens of important issues we wouldn't have caught otherwise. We gave some instructions in the CLAUDE.md doc in the repository - including a nice personalized roast of the engineer who did the review in the intro and conclusion, to make it fun! :) Basically, when you do a "create PR" from Claude Code, it will help you get your Linear ticket (or create one if missing), ask you some important questions (like: what tests have you done?), create the PR on GitHub, request the reviewers, and post an "Auto Review" message with your credentials. It's not an actual review per se, but this is enough for our small team.
Thanks for the reply. Yeah, we have a CLAUDE.md file, but CodeRabbit doesn't seem to pick it up, or it ignores it... hmm, wish we could try out Claude Code.
1 reply →
I was expecting a showcase of what you've actually done with it, not just another person's attempt at instructing an AI to follow instructions.
If anyone is excited about, and has experience with this kind of stuff, please DM. I have a role open for setting up these kinds of tools and workflows.
Is Claude "Code" anything special, or is it mostly the LLM, and do other CLIs (e.g. Copilot) work just as well?
I've tried most of the CLI coding tools with the Claude models and I keep coming back to Claude Code. It hits a sweet spot of simple and capable, and right now I'd say it's the best from an "it just works" perspective.
In my experience the CLI tool is part of the secret sauce. I haven't tried switching models per each CLI tool though. I use claude exclusively at work and for personal projects I use claude, codex, gemini.
It’s mostly the model, Copilot, Claude Code, OpenCode, snake oil like Oh My OpenCode, it’s not huge differences.
Claude Code seems to package a relatively smart prompt as well, as it seems to work better even with one-line prompts than alternatives that just invoke the API.
Key word: seems. It's impossible to do a proper qualitative analysis.
Why do you call Oh My OpenCode snake oil?
[dead]
> (used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)
take my downvote as hard as you can. this sort of thing is awfully off-putting.
I'm at the point where I say fuck it, let them sleep.
The tech industry just went through an insane hiring craze and is now thinning out. This will help to separate the chaff from the wheat.
I don't know why any company would want to hire "tech" people who are terrified of tech and completely obstinate when it comes to utilizing it. All the people I see downplaying it take a half-assed approach at using it then disparage it when it's not completely perfect.
I started tinkering with LLMs in 2022. First use case, speak in natural english to the llm, give it a json structure, have it decipher the natural language and fill in that json structure (vacation planning app, so you talk to it about where/how you want to vacation and it creates the structured data in the app). Sometimes I'd use it for minor coding fixes (copy and paste a block into chatgpt, fix errors or maybe just ideation). This was all personal project stuff.
At my job we got LLM access in mid/late 2023. Not crazy useful, but still was helpful. We got claude code in 2024. These days I only have an IDE open so I can make quick changes (like bumping up a config parameter, changing a config bool, etc.). I almost write ZERO code now. I usually have 3+ claude code sessions open.
On my personal projects I'm using Gemini + codex primarily (since I have a google account and chatgpt $20/month account). When I get throttled on those I go to claude and pay per token. I'll often rip through new features, projects, ideas with one agent, then I have another agent come through and clean things up, look for code smells, etc. I don't allow the agents to have full unfettered control, but I'd say 70%+ of the time I just blindly accept their changes. If there are problems I can catch them on the MR/PR.
I agree about the low hanging fruit and I'm constantly shocked at the sheer amount of FUD around LLMs. I want to generalize, like I feel like it's just the mid/jr level devs that speak poorly about it, but there's definitely senior/staff level people I see (rarely, mind you) that also don't like LLMs.
I do feel like the online sentiment is slowly starting to change though. One thing I've noticed a lot of is that when it's an anonymous post it's more likely to downplay LLMs. But if I go on linkedin and look at actual good engineers I see them praising LLMs. Someone speaking about how powerful the LLMs are - working on sophisticated projects at startups or FAANG. Someone with FUD when it comes to LLM - web dev out of Alabama.
I could go on and on but I'm just ranting/venting a little. I guess I can end this by saying that in my professional/personal life 9/10 of the top level best engineers I know are jumping on LLMs any chance they get. Only 1/10 talks about AI slop or bullshit like that.
Not entirely disagreeing with your point, but I think they've mostly been forced to pivot recently for their own sakes; they will never say it, though. As eager as they may seem, the most public people tend to be better at outside communication and knowing what to say in public to enjoy more opportunities, remain employed, or, for the top engineers, to still seem relevant in the communities they are a part of. It's less about money and more about respect there, I think.
The "sudden switch" since Opus 4.5 when many were saying just a few months ago "I enjoy actual coding" but now are praising LLM's isn't a one off occurrence. I do think underneath it is somewhat motivated by fear; not for the job however but for relevance. i.e. its in being relevant to discussions, tech talks, new opportunities, etc.
OK, I am gonna be the guy and put my skin in the game here. I kind of get the hype, but my experience with e.g. Claude Code (or GitHub Copilot and others previously) has so far been pretty unreliable.
I have a Django project with 50 kLOC and it is pretty capable of understanding the architecture, coding style, naming of variables, functions, etc. Sometimes it excels on tasks like "replicate this non-trivial functionality for this other model and update the UI appropriately" and leaves me stunned. Sometimes it handles tedious and laborious work for me, like "replace this markdown editor with something modern, allowing fullscreen edits of content", but makes an annoying mistake that only a visual check reveals, and is not capable of fixing it after 5 prompts. I feel like I am becoming a tester more than a developer, and I do not like the shift. Especially since I do not like telling someone they made an obvious mistake and should fix it; it seems I do not care whether it is human or AI, I just do not like incompetence, I guess.
Yesterday I had to add some parameters to a very simple Falcon project and found out it had not been updated for several months and wouldn't build due to some pip issues with pymssql. OK, this is a really marginal sub-project, so I said: let's migrate it to uv, let's not get our hands dirty, and let Claude do it. It did splendidly, but in the Dockerfile it missed updating "COPY server.py /data/" when I asked it to change the path... The build failed, I updated the path myself and moved on.
And then you listen to very smart guys like Karpathy who rave about Tab, Tab, Tab, while not understanding the language or anything about the code they write. Am I getting this wrong?
I am really far far away from letting agents touch my infrastructure via SSH, access managed databases with full access privileges etc. and dread the day one of my silly customers asks me to give their agent permission to managed services. One might say the liability should then be shifted, but at the end of the day, humans will have to deal with the damage done.
My customer, who uses the whole codebase I am mentioning here, asked me if there is a way to provide "some AI" with item GTINs and let it generate photos, descriptions, etc., including the metadata they handcrafted and extracted for years from various sources. While it looks like a nice idea, and for them a possibility of decreasing staff count, I got the feeling they do not care about data quality anymore, or do not understand the problems they are bringing upon themselves due to errors nobody will catch until it is too late.
TL;DR: I am using Opus 4.5, it helps a lot, but I have to remain (very) cautious. Wake-up call in 2026? More like waking up from a hallucination.
Why don't I see any streams of people building apps as quickly as they say? Just hype.
Didn't feel like reading all this so I shortened it! sorry!
I shortened it for anyone else that might need it
----
Software engineers are sleeping on Claude Code agents. By teaching it your conventions, you can automate your entire workflow:
Custom Skills: Generates code matching your UI library and API patterns.
Quality Ops: Automates ESLint, doc syncing, and E2E coverage audits.
Agentic Reviews: Performs deep PR checks against custom checklists.
Smart Triage: Pre-analyzes tickets to give devs a head start.
Check out the showcase repo to see these patterns in action.
you are part of the problem
Everybody says how good Claude is, and then I go to my codebase and I can't get it to correctly update one XAML file for me. It is quicker to make changes myself than to explain exactly what I need or learn how to do "prompt engineering".
Disclaimer: I don't have access to Claude Code. My employer has only granted me Claude Teams. Supposedly, they don't use my poopy code to train their models if I use my work email Claude so I am supposed to use that. If I'm not pasting code (asking general questions) into Claude, I believe I'm allowed to use whatever.
What's even the point of this comment if you self-admittedly don't have access to the flagship tool that everyone has been using to make these big bold coding claims?
isn't Claude Teams powerful? does it not have access to Opus?
pardon my ignorance.
I use GitHub Copilot, which has access to LLMs like Gemini 3, Sonnet/Opus 4.5 and GPT 5.2.
Because the same claims of "AI tool does everything" are made over and over again.
3 replies →