Comment by xnorswap
2 months ago
Claude is really good at specific analysis, but really terrible at open-ended problems.
"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.
It enthusiastically suggested a library for <work domain> and was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.
There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving. It can do structured problems very well, and it can transform unstructured data very well, but it can't deal with unstructured problems very well.
That may well change, so I don't want to embed that thought too deeply into my own priors, because the LLM space seems to evolve rapidly. I wouldn't want to find myself blind to the progress because I write it off from a class of problems.
But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself, and just leverage it to do the grunt-work that you'd find boring.
That's why you treat it like a junior dev. You do the fun stuff of supervising the product, overseeing design and implementation, breaking up the work, and reviewing the outputs. It does the boring stuff of actually writing the code.
I am phenomenally productive this way, I am happier at my job, and its quality of work is extremely high as long as I occasionally have it stop and self-review its progress against the style principles articulated in its AGENTS.md file. (As it tends to forget a lot of rules like DRY)
I think we have different opinions on what's fun and what's boring!
You've really hit the crux of the problem and why so many people have differing opinions about AI coding. I also find coding more fun with AI. The reason is that my main goal is to solve a problem, or someone else's problem, in a way that is satisfying. I don't much care about the code itself anymore. I care about the thing that it does when it's done.
Having said that, I used to be deep into coding, and back then I'm quite sure I would have hated AI coding for me. I think for me it comes down to this: when I was learning about coding and stretching my personal knowledge in the area, the coding part was the fun part because I was learning. Now that I am past that part I really just want to solve problems, and coding is the means to that end. AI is now freeing because where I would have been reluctant to start a project, I am now more likely to give it a go.
I think it is similar to when I used to play games a lot. When I would play a game where you would discover new items regularly, I would go at it hard and heavy up until the point where I determined there were either no new items to be found or it was just "more of the same". When I got to that point it was like a switch would flip and I would lose interest in the game almost immediately.
Some people are into designing software, others like to put the design into implementation, others like cleaning up implementations, and yet others like making functional software faster.
There is enough work for all of us to be handsomely paid while having fun doing it :) Just find what you like, and work with others who like other stuff, and you'll get through even the worst of problems.
For me the fun comes not from the action of typing stuff with my sausage fingers and seeing characters end up on the screen, but basically everything before that and after that. So if I can make "translate what's in my head into source on disk something can run" faster, that's a win in my book, but not if the quality degrades too much, so I want tight control over it while still not having to use my fingers to actually type.
He's a real straight shooter with upper management written all over him.
I really enjoy writing some of the code. But some is a pain. Never have fun when the HQ team asks for API changes for the 5th time this month. Or for that matter writing the 2000 lines of input and output data validation in the first place. Or refactoring that ugly dictionary passed all over the place to be a proper class/dataclass. Handling config changes. Lots of that piping job.
Some tasks I do enjoy coding. Once in the flow it can be quite relaxing.
But mostly I enjoy the problem solving part: coming up with the right algorithm, a nice architecture, the proper set of metrics to analyze, etc.
Yeah at this point I basically have to dictate all implementation details: do this, but do it this specific way, handle xyz edge cases by doing that, plug the thing in here using that API. Basically that expands 10 lines into 100-200 lines of code.
However if I just say “I have this goal, implement a solution”, chances are that unless it is a very common task, it will come up with a subpar/incomplete implementation.
What's funny to me is that complexity has inverted for some tasks: it can ace a 1000-line ML model for a general task I give it, yet will completely fail to come up with a proper solution for a 2D geometric problem that mostly involves high-school-level maths and can be solved in 100 lines.
Maybe I'm weird but I enjoy "actually writing the code."
I sometimes think of it as a sculptor analogy.
Some famous sculptors had an atelier full of students that helped them with mundane tasks, like carving out a basic shape from a block of stone.
When the basic shape was done, the master came and did the rest. You may want the physical exercise of doing the work yourself, but some of us sometimes like to do the fine work and leave the crude work to the AI.
In my case, it really depends what. I enjoy designing systems and domain-specific languages or writing libraries that work the way I think they should work.
On the other hand, if e.g. I need a web interface to do something, the only way I can enjoy myself is by designing my own web framework, which is pretty time-consuming, and then I still need to figure out how to make collapsible sections in CSS and blerghhh. Claude can do that in a few seconds. It's a delightful moment of "oh, thank god, I don't have to do this crap anymore."
There are many coding tasks that are just tedium, including 99% of frontend development and over half of backend development. I think it's fine to throw that stuff to AI. It still leaves a lot of fun on the table.
In my case, I enjoy writing code too, but it's helpful to have an assistant I can ask to handle small tasks so I can focus on a specific part that requires attention to detail.
You really get enjoyment writing a full CRUD HTTP API five times, one for each endpoint?
I don't :) Before, I had IDE templates and IntelliSense. Now I can just get any agentic AI to do it for me in 60 seconds and I can get to the actual work.
“I want my AI to do laundry and dishes so I can code, not for my AI to code so I can do laundry and dishes”
Me writing code is me spending 3/4 of my time wading through documentation and google searches. It's absolutely hell on my ADD. My ability to memorize is absolutely garbage. Throughout my career I've worked in like 10 different languages, and in any given project I'm usually working in at least 3 or 4. There's a lot of "now what is a map operation in this stupid fucking language called again?!"
Claude writing code gets the same output if not better in about 1/10 of the time.
That's where you realize that the writing code bits are just one small part of the overall picture. One that I realize I could do without.
> its quality of work is extremely high ...
It may seem decent until you look closer. Just like with a junior dev, you should always review the code very carefully; you can absolutely not trust it. It's not bad at trivial stuff, but it almost always fails when things get more complex, and unlike a junior dev, it does not tell you when things get too complex for it.
Cool cool cool. So if you use LLMs as junior devs, let me ask you: where will future awesome senior devs like you come from? From WHAT job experience? From what coding struggle?
What would you like individual contributors to do about it, exactly? Refuse to use it, even though this person said they're happier and more fulfilled at work?
I'm asking because I legitimately have not figured out an answer to this problem.
Why is that a developer's problem? If anything, they are incentivized to avoid creating future competition in the job market.
At my last job there was effectively a gun held to the back of my head, ordering me to use this stuff. And this started about a year ago, when the tooling for agentic dev was absolutely atrocious, because we had a CTO who had the biggest, most raging boner for anything that offered even a whiff of "AI".
Unfortunately the bar is being raised on us. If you can't hang with the new order you are out of a job. I promise I was one of the holdouts who resisted this the most. It's probably why I got laid off last spring.
Thankfully, as of this last summer, agentic dev started to really get good, and my opinion made a complete 180. I used the off time to knock out a personal project in a month or two's worth of time, that would have taken me a year+ the old way. I leveraged that experience to get me where I am now.
How do you get junior devs if your concept of the LLM is that it's "a principal engineer" that "do[es] not ask [you] any questions"?
Also, I'm pretty sure junior devs can learn from their mistakes faster by directing an LLM themselves. Let them play. Soon enough they're going to be better than all of us anyway. The same way widespread access to strong chess computers raised the bar at chess clubs.
There's that long-term thinking the tech industry, and really every other publicly traded company, is known for.
> That's why you treat it like a junior dev. You do the fun stuff of supervising the product, overseeing design and implementation, breaking up the work, and reviewing the outputs. It does the boring stuff of actually writing the code.
I am so tired of this analogy. Have the people who say this never worked with a junior dev before? If you treat your junior devs as brainless code monkeys who only exist to type out your brilliant senior developer designs and architectures instead of, you know, human beings capable of solving problems, 1) you're wasting your time, because a less experienced dev is still capable of solving problems independently, 2) the juniors working under you will hate it because they get no autonomy, and 3) the juniors working under you will stay junior because they have no opportunity to learn--which means you've failed at one of your most important tasks as a senior developer, which is mentorship.
I have mentored and worked with a junior dev. And the only way to get her to do anything useful and productive was to spell things out. Otherwise she got wrapped around the axle trying to figure out the complex things and was constantly asking for my help with basic design-level tasks. Doing the grunt work is how you learn the higher-level stuff.
When I was a junior, that's how it was for me. The senior gave me something that was structured and architected and asked me to handle smaller tasks that were beneath them.
Giving juniors full autonomy is a great way to end up with an unmaintainable mess that is a nightmare to work with without substantial refactoring. I know this because I have made a career out of fixing exactly this mistake.
In my experience, juniors were more capable at solving engineering problems than some staff engineers, but that’s just an artifact of a broken ladder system.
I don't follow.
In the same breath (same paragraph) you state two polar opposites about working with AI:
I don't see how you can claim to be "phenomenally productive" when working with a tool you have to babysit because it forgets your instructions the whole time.
If it was the "junior dev" you also mention, I suspect you would very quickly invite the "junior dev" to find a job elsewhere.
You don't have to follow. I'm still punching way above my weight. Not sure why both things can't be true at once.
A few weeks ago I'd have disagreed with you, but recently I've been struggling with concentration and motivation, and now I kind of try to embrace coding with AI. I guide it pretty strictly, try to stick with pure functions, and always read the output thoroughly. In a couple of places requiring care, I coded things in executable pseudocode (Python) and made the AI translate it to the more boilerplate-y target language.
I don't know if I'm any faster than I would be if I was motivated, but I'm A LOT more productive in my current state. I still hope for the next AI winter though.
I really hope you don't actually treat junior devs this way...
I enjoy finding the problem and then telling Claude to fix it: specifying the function and the problem, then going to get a coffee from the breakroom and seeing it finished when I return. A junior dev would have questions when I did that. Claude just fixes it.
I wonder if DRY is still a principle worth holding onto in the AI coding era. I mean it probably is, but this feels like enough of a shift in coding design that re-evaluating principles designed for human-only coding might be worth the effort
> rules like DRY
Principles like DRY
TBH I think its ability to structure unstructured data is what makes it a powerhouse tool, and there is so much juice to squeeze there that we can make process improvements for years even if it doesn't get any better at general intelligence.
If I had a PDF printout of a table, the workflow I used to have to use to get that back into a table data structure for automation was hard (annoying): dedicated OCR tools with limitations on inputs, multiple models in that tool for the different ways the paper the table was on might be formatted. It took hours for a new input format.
Now I can take a photo of something with my phone and get a data table in like 30 seconds.
People seem so desperate to outsource their thinking to these models and to operate at the limits of their capability, but I have been having a blast using it to cut through so much tedium: things that weren't unsolved problems but required enough specialized tooling and custom config to be left alone unless you really had to.
This fits into what you're saying about using it to do the grunt work I find boring, I suppose, but it feels like a bit more than that: it has opened a lot of doors to spaces that had grunt work that wasn't worth doing for the end result previously, but now it is.
> There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving. It can do structured problems very well, and it can transform unstructured data very well, but it can't deal with unstructured problems very well.
While this is true in my experience, the opposite is not: LLMs are very good at helping me go through a structured process of thinking about architectural and structural design and then helping me build a corresponding specification.
More specifically the "idea honing" part of this proposed process works REALLY well: https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/
This: "Each question should build on my previous answers, and our end goal is to have a detailed specification I can hand off to a developer. Let's do this iteratively and dig into every relevant detail. Remember, only one question at a time."
I've checked the linked page and there's nothing about even learning the domain or learning the tech platform you're going to use. It's all blind faith, just a small step above copying stuff from GitHub or StackOverflow and pushing it to prod.
You completely missed the point of my comment...
This is it. It doesn't replace the higher level knowledge part very well.
I asked Claude to fix a pet peeve of mine: spawning a second process inside an existing Wine session (pretty hard if you use umu, since it runs in a user namespace). I asked Claude to write me a Python server to spawn another process to pass through a file handler "in Proton", and it went into a long loop of trying to find a way to launch into an existing Wine session from Linux with tons of environment variables that didn't exist.
Then I specified "server to run in Wine using Windows Python" and it got more things right. Except it tried to use named pipes for IPC. Which, surprise surprise, doesn't work to talk to the Linux piece. Only after I specified "local TCP socket" it started to go right. Had I written all those technical constraints and made the design decisions in the first message it'd have been a one-hit success.
This is a key part of the AI love/hate flame war.
Very easy to write it off when it spins out on the open-ended problems, without seeing just how effective it can be once you zoom in.
Of course, zooming in that far gives back some of the promised gains.
Edit: typo
> without seeing just how effective it can be once you zoom in.
The love/hate flame war continues because the LLM companies aren't selling you on this. The hype is all about "this tech will enable non-experts to do things they couldn't do before" not "this tech will help already existing experts with their specific niche," hence the disconnect between the sales hype and reality.
If OpenAI, Anthropic, Google, etc. were all honest and tempered their own hype and misleading marketing, I doubt there would even be a flame war. The marketing hype is "this will replace employees" without the required fine print of "this tool still needs to be operated by an expert in the field and not your average non technical manager."
The number of GUIs I've vibe-coded works against your claim.
As we speak, my macOS menubar has an iStat Menus replacement, a Wispr Flow replacement (global hotkey for speech-to-text), and a logs visualizer for the `blocky` dns filtering program -- all of which I built without reading code aside from where I was curious.
It was so vibe-coded that there was no reason to use SwiftUI nor set them up in Xcode -- just AppKit Swift files compiled into macOS apps when I nix rebuild.
The only effort it required was the energy to QA the LLM's progress and tell it where to improve, maybe click and drag a screenshot into claude code chat if I'm feeling excessive.
Where do my 20 years of software dev experience fit into this, beyond imparting my aesthetic preferences?
In fact, insisting that you write code yourself is becoming a liability in an interesting way: you're going to make trade-offs for DX that the LLM doesn't have to make, like when you use Python or Electron when the LLM can bypass those abstractions that only exist for human brains.
Go one level up:
Exactly, if you visualize software as a bunch of separate "states" (UI state, app state, DB state), then our job is to mutate states and synchronize those mutations across the system. LLMs are good at mutating a specific state in a specific way. They are trash at designing what data shape a state should be, and they are bad at figuring out how/why to propagate mutations across a system.
The structured vs open-ended distinction here applies to code review too. When you ask an LLM to "find issues in this code", it'll happily find something to say, even if the code is fine. And when there are actual security vulnerabilities, it often gets distracted by style nitpicks and misses the real issues.
Static analysis has the opposite problem - very structured and deterministic, but limited to predefined patterns, and it overwhelms you with false positives.
The sweet spot seems to be to give structure to what the LLM should look for, rather than letting it roam free on an open-ended "review this" prompt.
We built Autofix Bot[1] around this idea.
[1] https://autofix.bot (disclosure: founder)
It works great in C# (where you have strong typing + strict compiler).
Try this:
Have a look at xyz.cs. Do a full audit of the file and look for any database operations in loops that can be pre-filtered.
Or:
Have a look at folder /folderX/ and add .AsNoTracking() to all read-only database queries. When you are done, run the compiler and fix the errors. Only modify files in /folderX/ and do not go deeper in the call hierarchy. Once you are done, do a full audit of each file and make sure you did not accidentally add .AsNoTracking() to tracked entities. Do not create any new files or make backups, I already created a git branch for you. Do not make any git commits.
Or:
Have a look at the /Controllers/ folder. Do a full audit of each controller file and make sure there are no hard-coded credentials, usernames, passwords or tokens.
Or: Have a look at folder /folderX/. Find any repeated hard-coded values, magic values and literals that will make good candidates to extract to Constants.cs. Make sure to add XML comments to the Constants.cs file to document what the value is for. You may create classes within Constants.cs to better group certain values, like AccountingConstants or SystemConstants etc.
These kinds of tasks work amazingly well in Claude Code and can often be one-shotted. Make sure you check your git diffs - you cannot and should not blame AI for shitty code - it's your name next to the commit, make sure it is correct. You can even ask Claude to review the file with you afterwards. I've used this kind of approach to greatly increase our overall code quality & performance tuning - I really don't understand all the negative comments, as this approach has chopped days' worth of refactorings down to a couple of minutes and hours.
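For anyone outside the .NET world wondering what that .AsNoTracking() prompt actually produces, here's a minimal before/after sketch in EF Core. The ShopContext and Order entity are made up purely for illustration; the point is that the rewrite is mechanical and easy to verify in a diff:

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    // Hypothetical entity and context, just to keep the sketch self-contained.
    public class Order { public int Id { get; set; } public int CustomerId { get; set; } }
    public class ShopContext : DbContext { public DbSet<Order> Orders => Set<Order>(); }

    public static class ReadOnlyQueries
    {
        // Before: EF Core tracks every returned entity in the change tracker,
        // even though this query is read-only.
        public static Task<List<Order>> Tracked(ShopContext db, int customerId) =>
            db.Orders.Where(o => o.CustomerId == customerId).ToListAsync();

        // After: AsNoTracking() skips change tracking for read-only results,
        // which is the mechanical, per-query edit the prompt above delegates.
        public static Task<List<Order>> ReadOnly(ShopContext db, int customerId) =>
            db.Orders.AsNoTracking().Where(o => o.CustomerId == customerId).ToListAsync();
    }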
In places where you see your coding assistant being slow or making mistakes, or going line by line where you know a simple regex find/replace would work instantly, ask it to help you create a shell script as a tool for itself to call that does task xyz. I've made a couple of scripts with this approach that Claude can call locally to fix certain code patterns in 5 seconds that would've taken it (and me checking it) 30 mins at least, and it won't eat up context or tokens.
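As a rough sketch of what such a single-purpose tool can look like (written here as a small C# program rather than a shell script, with a made-up DateTime.Now to DateTime.UtcNow rewrite as the example pattern):

    using System;
    using System.IO;
    using System.Text.RegularExpressions;

    // One well-defined rewrite applied to every .cs file under a folder, so the
    // agent can run a single command instead of editing files line by line.
    class FixPattern
    {
        static void Main(string[] args)
        {
            var root = args.Length > 0 ? args[0] : ".";
            // Example pattern only; swap in whatever rewrite your codebase needs.
            var pattern = new Regex(@"\bDateTime\.Now\b");

            foreach (var file in Directory.EnumerateFiles(root, "*.cs", SearchOption.AllDirectories))
            {
                var text = File.ReadAllText(file);
                var replaced = pattern.Replace(text, "DateTime.UtcNow");
                if (replaced != text)
                {
                    File.WriteAllText(file, replaced);
                    Console.WriteLine($"Rewrote {file}");
                }
            }
        }
    }

Run it on a throwaway git branch and review the diff, exactly as you would with the agent's own edits.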
I think slash commands are great for helping Claude with this. I have many, like /code:dry and /code:clean-code, that have a semi-long prompt and references to longer docs for reviewing code from a specific perspective. I think it at least improves Claude a bit in this area. Like processes or templates for thinking in broader ways. But yes, I agree it struggles a lot in this area.
Somewhat tangential but interestingly I'd hate for Claude to make any changes with the intent of sticking to "DRY" or "Clean Code".
Neither of those are things I follow, and either way design is better informed by the specific problems that need to be solved rather than by such general, prescriptive principles.
I agree, so obviously I direct it with more info and point it to code that I believe needs more of specific principles. But generally I would like Claude to produce more DRY code, it is great at reimplementing the same thing in five places instead of making a shared utility module.
I'm not sure how to interpret someone saying they don't follow DRY. Do you mean taking it to the zealous extreme, or do you abhor helper functions? Is this a "No True Scotsman" thing?
>> But right now, the best way to help an LLM is have a deep understanding of the problem domain yourself, and just leverage it to do the grunt-work that you'd find boring.
This is exactly how I use it. I prefer Gemini 3 personally.
I try to learn as much as I can about different architectures, usually by reading books or other implementations and coding first principles to build a mental model. I apply the architecture to the problem and the AI fills in the gaps. I try my best to focus on and cover those gaps.
The reason I think it is inconsistent in nailing a variety of tasks is the recipe for training LLMs, which is pre-training + RL. The RL environment sends a training signal to update all the weights in its trajectory for the successful response. Karpathy calls it “sucking supervision through a straw”. This breaks other parts of the model.
I remember a problem I had while quickly testing notcurses. I tried ChatGPT, which produced a lot of weird but kinda believable statements about the fact that I had to include wchar and define a specific preprocessor macro, AND that I had to place the includes for notcurses, other includes and macros in a specific order.
My sentiment was "that's obviously a weird non-intended hack" but I wanted to test quickly, and well ... it worked. Later, reading the man-pages I aknowledged the fact that I needed to declare specific flags for gcc in place of the gpt advised solution.
I think these kinds of value-based judgements are hard for LLMs to emulate; it's hard for them to identify a single source as the most authoritative in a sea of less authoritative (but more numerous) sources.
It's fundamentally because of verifier's law [0].
Current AI, and RL-based AI in particular, has already achieved or will soon achieve superhuman performance on problems that can be quickly verified and measured.
So maths, algorithms, and well-defined bugs fall into that category.
However, architectural decisions, design, and long-term planning, where there is little data, no model allowing synthetic data generation, and long iteration cycles, are not as amenable to it.
[0] https://www.jasonwei.net/blog/asymmetry-of-verification-and-...
Using the plan mode in cursor (or asking claude to first come up with a plan) makes it pretty good at generic "how can I improve" prompts. It can spend more effort exploring the codebase and thinking before implementing.
> "Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
This is true, as for "Open Ended" I use Beads with Claude code, I ask it to identify things based on criteria (even if its open ended) then I ask it to make tasks, then when its done I ask it to research and ask clarifying questions for those tasks. This works really well.
> There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving
I'd hesitate to call this a blind spot. LLMs have a lot of actual blind spots: things the people developing them overlook or deprioritize. This strikes me more as something they're acutely aware of and failing at, despite significant efforts to solve it.
> There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving.
That's called job security!
I've had reasonable success having it ultrathink through every possible X (exhaustively) and their trade-offs and then give me a ranked list and rationale for its top recommendations. I almost always choose the top one, but just reading the list and then giving it next steps has worked really well for me.
Not at all my experience. I’ve often tried things like telling Claude this SIMD code I wrote performed poorly and I needed some ideas to make it go faster. Claude usually does a good job rewriting the SIMD to use different and faster operations.
That sounds like a pretty "structured" problem to me.
That's one of the problems with AI: as it can accomplish more tasks, people will overestimate its ability.
What the person you replied to had Claude do is relatively simple and structured, but to that person what Claude did is "automagic".
People already vastly overestimate AI's capabilities. This contributes to that.
Performance optimization isn’t structured at all. I find it amazing that without access to profilers or anything Claude is able to respond to “anything I can do to improve the speed” with acceptable results.
I'm not a C++ programmer, but wouldn't your example be a fairly structured problem? You wanted to improve performance of a specific part of your code base.
Codex is better for the latter style. It takes its time, mulls about and investigates and sometimes finds a nugget of gold.
Claude is for getting shit done, it's not at its best at long research tasks.
The current paradigm is we sorta-kinda got AGI by putting dodgy AI in a loop:
until works { try again }
The stuff is getting so cheap and so fast... a sufficient increment in quantity can produce a phase change in quality.
My experience with Claude has been that having it "review" my code has produced some helpful feedback and refactoring suggestions in some areas, but it falls short in others.
This tells me that we need to build 1000 more linters of all kinds
Unironically I agree.
One under-discussed lever that senior / principal engineers can pull is the ability to write linters & analyzers that will stop junior engineers ( or LLMs ) from doing something stupid that's specific to your domain.
Let's say you don't want people to make async calls while owning a particular global resource, it only takes a few minutes to write an analyzer that will prevent anyone from doing so.
Avoid hours of back-and-forth over code review by encoding your preferences and taste into your build pipeline and stop it at source.
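To give a sense of how small such an analyzer can be, here's a rough Roslyn-style sketch for that example. The GlobalResource name and the purely syntactic check are invented for illustration; a real analyzer for your domain would use the semantic model to match the actual resource type:

    using System.Collections.Immutable;
    using System.Linq;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp;
    using Microsoft.CodeAnalysis.CSharp.Syntax;
    using Microsoft.CodeAnalysis.Diagnostics;

    [DiagnosticAnalyzer(LanguageNames.CSharp)]
    public class NoAwaitWhileHoldingGlobalResource : DiagnosticAnalyzer
    {
        private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
            id: "TEAM0001",
            title: "Await while holding the global resource",
            messageFormat: "Do not await while holding the global resource",
            category: "Concurrency",
            defaultSeverity: DiagnosticSeverity.Error,
            isEnabledByDefault: true);

        public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics =>
            ImmutableArray.Create(Rule);

        public override void Initialize(AnalysisContext context)
        {
            context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
            context.EnableConcurrentExecution();
            context.RegisterSyntaxNodeAction(AnalyzeAwait, SyntaxKind.AwaitExpression);
        }

        private static void AnalyzeAwait(SyntaxNodeAnalysisContext context)
        {
            var awaitExpr = (AwaitExpressionSyntax)context.Node;
            // Purely syntactic check: flag awaits lexically inside a using statement
            // whose declaration or expression mentions the (hypothetical) GlobalResource.
            var insideLease = awaitExpr.Ancestors()
                .OfType<UsingStatementSyntax>()
                .Any(u => (u.Declaration?.ToString() ?? u.Expression?.ToString() ?? "")
                    .Contains("GlobalResource"));
            if (insideLease)
                context.ReportDiagnostic(Diagnostic.Create(Rule, awaitExpr.GetLocation()));
        }
    }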
And for more complex linters, I find it can be easy to get the LLM to write most of the linter itself!!!
>> "Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
Back in the day, we would just do this with a search engine.
It's not the same. Recently Opus 4.5 diagnosed and fixed a bug in the F# compiler for me, for example (https://github.com/dotnet/fsharp/pull/19123). The root cause is pretty subtle and very non-obvious, and of course the critical snippet of the stack trace `at FSharp.Compiler.Symbols.FSharpExprConvert.GetWitnessArgs` has no hits on Google other than my own bug report. I would have been completely lost fixing it.
I am basically rawdogging Claude these days, I don’t use MCPs or anything else, I just lay down all of the requirements and the suggestions and the hints, and let it go to work.
When I see my colleagues use an LLM they are treating it like a mind reader and their prompts are, frankly, dogshit.
It shows that articulating a problem is an important skill.