
Comment by fsloth

1 day ago

I strongly disagree agents are for extroverts.

I do agree it's definitely a tool category with a unique set of features, and I am not surprised it's off-putting to some. But its appeal is definitely clear to me as an introvert.

For me, LLMs are just a computer interface you can program using natural language.

I think I’m slightly ADD. I love coding _interesting_ things but boring tasks cause extreme discomfort.

Now - I can offload the most boring task to an LLM and spend my mental energy on the interesting stuff!

It’s a great time to be a software engineer!

> For me, LLMs are just a computer interface you can program using natural language.

I wish they were, but they're not that yet, because LLMs aren't very good at logical reasoning. So it's more like an attempt to program using natural language. Sometimes it does what you ask, sometimes not.

I think "programming" implies that the machine will always do what you tell it, whatever the language, or reliably fail and say it can't be done because the "program" is contradictory, lacks sufficient detail, or doesn't have the necessary permissions/technical capabilities. If it only sometimes does what you ask, then it's not quite programming yet.

> Now - I can offload the most boring task to an LLM and spend my mental energy on the interesting stuff!

I wish that, too, were true, and maybe it will be someday soon. But if I need to manually review the agent's output, then it doesn't feel like offloading much aside from the typing. All the same concentration and thought are still required, even for the boring things. If I could at least trust the agent to tell me whether it did a good job or is unsure, that would help, but we're not even there yet.

That's not to say the tools aren't useful, but they're not yet "programming in a natural language," and they're not yet something you can "offload" stuff to.

  • > ... LLMs aren't very good at logical reasoning.

    I'm curious about what experiences led you to that conclusion. IME, LLMs are very good at the type of logical reasoning required for most programming tasks. E.g. I only have to say something like "find the entries with the lowest X and highest Y that have a common Z from these N lists / maps / tables / files / etc." and it spits out mostly correct code instantly. I then review it and for any involved logic, rely on tests (also AI-generated) for correctness, where I find myself reviewing and tweaking the test cases much more than the business logic.

    But then I do all that for all code anyway, including my own. So just starting off with a fully-fleshed out chunk of code, which typically looks like what I'd pictured in my head, is a huge load off my cognitive shoulders.
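    For concreteness, the kind of query described above might come back looking like the sketch below. The field names x, y, z and the lists-of-dicts shape are placeholders for this illustration, not something from the thread or from any model's actual output:

      from collections import defaultdict

      def lowest_x_highest_y_common_z(lists):
          """Among records whose 'z' appears in every input list, return the
          record with the lowest 'x' and the record with the highest 'y'."""
          # Track which input lists each z value appears in.
          z_seen_in = defaultdict(set)
          for idx, records in enumerate(lists):
              for rec in records:
                  z_seen_in[rec["z"]].add(idx)
          common_z = {z for z, idxs in z_seen_in.items() if len(idxs) == len(lists)}

          # Only records sharing a z across all lists are candidates.
          candidates = [rec for records in lists for rec in records if rec["z"] in common_z]
          if not candidates:
              return None, None
          return (min(candidates, key=lambda r: r["x"]),
                  max(candidates, key=lambda r: r["y"]))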

    • The experience was that I once asked an LLM to write a simple function, and it produced something very wrong that nothing with good reasoning abilities should ever do. Of course, a drunk or very tired human could have made the same mistake, but they would at least have told me that they were impaired and unsure of their work.

      I agree that most of the time it does most simple tasks mostly right, but that's not good enough to truly "offload" my mental effort. Again, I'm not saying it's not useful, but rather than working with a junior developer, it's like working with a junior developer who may or may not be drunk or tired and won't tell you.

      But mostly my point is that LLMs seem to do logical reasoning worse than other things they do better, such as generating prose or summarising a document. Of course, even then you can't trust them yet.

      > But then I do all that for all code anyway, including my own

      I don't, at least not constantly. I review other people's code only towards the very end of a project, and in between I trust that they tell me about any pertinent challenge or insight, precisely so that I can focus on other things unless they draw my attention to something I need to think about.

      I still think that working with a coding assistant is interesting and even exciting, but the experience of not being able to trust anything, for me at least, is unlike working with another person or with a tool and doesn't yet allow me to focus on other things. Maybe with more practice I could learn to work with something I can't trust at all.

      9 replies →

    • In your example, you didn't ask the LLM to do any logic. You asked it to translate your logic into code.

      Asking an LLM to do logic would be saying something like: "I have a row of a million light switches. They all start off. I start at the beginning and flip on every fourth one. Then I flip on every eighth one, then sixteen, and all the powers of two until I'm over a million. Now I do the same for the powers of three, then four, then five, and so on. How many light switches are on at the end? Do not use any external coding tools for this; use your own reasoning."

      Note that the prompt itself is intentionally ambiguous -- a human getting this question should say "I don't understand why you started with every fourth instead of every second. Are you skipping the first integer of every power series or just when the exponent is two?"

      When I asked GPT5 to do it, it didn't care about that; instead it complimented me on my "crisp statement of the problem," roughly described a similar problem, and gave a believable but incorrect answer: 270,961.

      I then asked it to write Python code to simulate my question. It got the code correct and said "If you run this, you’ll see it matches the 270,961 result I gave earlier." Except that was a hallucination.

      Running the code actually produced 252,711.

      I guess it went with 270,961 because that was a lexically similar answer to some lexically similar problems in the training data.
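      For concreteness, a simulation of one literal reading of the prompt could look like the sketch below. This is a hypothetical reconstruction, not the script GPT5 generated; a toggle reading, or starting each power series at the base itself, changes the total, which is exactly the ambiguity noted above.

        LIMIT = 1_000_000
        switches = [False] * (LIMIT + 1)   # index 0 unused

        base = 2
        while base * base <= LIMIT:
            power = base * base            # skip the first power of each base
            while power <= LIMIT:
                for i in range(power, LIMIT + 1, power):
                    switches[i] = True     # "flip on": set, never toggle off
                power *= base
            base += 1

        print(sum(switches))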

      4 replies →

    • I'm not who you're replying to, but I had a scenario where I needed to notice that a command had completed (exit code received), keep listening for any output that was still buffered, and only stop processing tokens after it had been quiet for a little bit.

      Trying to get Claude to do this without introducing a deadlock and without exiting too early and leaving valuable output in the pipe was hellish.

      It's very good at some kinds of reasoning and very bad at others. There's not much it's mediocre at.
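      For context, a rough sketch of the "drain until quiet" behavior described above (POSIX-only, hypothetical helper name, not the code Claude produced): run the command, keep reading whatever is still buffered after it exits, and stop only once the pipe hits EOF or has been quiet for a moment.

        import selectors
        import subprocess
        import time

        def run_and_drain(cmd, quiet_period=0.5):
            """Run cmd; after it exits, keep draining stdout until no new
            output has arrived for quiet_period seconds (or EOF)."""
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
            sel = selectors.DefaultSelector()
            sel.register(proc.stdout, selectors.EVENT_READ)

            chunks = []
            last_output = time.monotonic()
            while True:
                if sel.select(timeout=0.1):              # readable (or EOF)?
                    data = proc.stdout.read1(65536)      # take what's buffered, don't block for more
                    if not data:
                        break                            # EOF: pipe fully drained
                    chunks.append(data)
                    last_output = time.monotonic()
                elif proc.poll() is not None and time.monotonic() - last_output > quiet_period:
                    break                                # exited and quiet long enough
            sel.close()
            return b"".join(chunks)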

  • You can view Claude Code as a non-deterministic compiler where you input English and get functioning code on the other end.

    The non-determinism is not as much of a problem because you are reading over the results and validating that what it has created matches what you told it to do.

    I'm not talking about vibe-coding here; I'm grabbing the steering wheel with both hands because this car allows me to go faster than if I were driving myself, but sometimes you have to steer or brake. And the analogy favors Claude Code here because you don't have to react in milliseconds while programming.

    TL;DR: if you do the commit you are responsible for the code it contains.

    • Sure, and that may be valuable, but it's neither "programming" nor "offloading mental effort" (at least not much).

      Some have compared it to working with a very junior programmer. I haven't done that in a long while, but when I did, it didn't really feel like I was "offloading" much, and I could still trust even the most junior programmer to tell me whether the job was done well or not (and of any difficulties they encountered or insight they've learnt) much more than I can an agent, at least today.

      Trust is something we have, for the most part, when we work with either other people or with tools. Working without (or with little) trust is something quite novel. Personally, I don't mind that an agent can't accomplish many tasks; I mind a great deal that I can't trust it to tell me whether it was able to do what I asked or not.

      2 replies →

  • So here's a transcript of a recent Claude Code interaction I had, trying to build an MP3 organizer that relies on heuristics and fuzzy matching to make sense out of all the random MP3s I downloaded in the 90s. This all happened in-between prompts: I gave it a correct example and an incorrect example and asked it why it was giving the incorrect example.

    ● Let me trace through what happens with the "Jordan Rudess" example:

      1. Directory: Jordan Rudess/
      2. file_rel_path would be Path("Jordan Rudess") (after removing MP3 Disk prefix)
      3. If Jordan Rudess matches in path component 0, scope_path would be Path("Jordan Rudess")
      4. _path_is_in_scope checks if "Jordan Rudess".startswith("Jordan Rudess") → True
    
      This looks correct. Let me check the inference logic:
    

    ● Read(mp3_organizer_v2.py) ⎿ Read 15 lines (ctrl+r to expand)

    ● The inference priority looks correct - path_component_match is indeed the highest priority. But I think I see the issue now. Let me look at the actual logic flow in the main organize method:

    ● Read(mp3_organizer_v2.py) ⎿ Read 10 lines (ctrl+r to expand)

    ● The flow looks right. Let me examine the actual problem more carefully. Looking back at the _apply_path_component_matching method, I think there might be an issue with when the matching happens. Let me check the logic:

    ● Update Todos ⎿
      - Check _path_is_in_scope logic for correctness
      - Verify artist inference priority logic works with path_component_match
      - Test and fix any remaining issues with artist propagation

    ● I think I see the issue now. The problem is that the current code only looks for matches in files that already have file_id3 tags, but it needs to propagate those matches to all files in the directory. However, the current implementation is still only applying matches to files that have matching candidates, not ALL files. Let me check the exact logic:

    ● Read(mp3_organizer_v2.py) ⎿ Read 8 lines (ctrl+r to expand)

    ● Actually, looking at this more carefully, the logic should work. The matched_artists dictionary contains the matches found from ANY file that had a file_id3 tag matching a path component, and then it applies that match to ALL files in the file_infos list via the loop at line 737.

      Wait, let me understand the scoping issue better. Let me check what file_rel_path and scope_path would actually be in your example:
    

    ● Read(mp3_organizer_v2.py) ⎿ Read 5 lines (ctrl+r to expand)

    ● Ah! I think I see a potential issue. The file_rel_path is calculated as the directory path, not the individual file path. So for all files in the same directory, file_rel_path will be the same. This should actually work correctly.

      But let me double-check the scope matching logic. If the directory is "Jordan Rudess" and the scope_path is also "Jordan Rudess", then _path_is_in_scope should return
      True.
    
      Let me debug this by adding some debug output to see what's actually happening. But first, let me look at a potential issue: maybe the match isn't being found in the first place?
    

    It's doing a pretty complex chain of step-by-step reasoning here, looking in detail at the code. It ended up by printing out more debug info and having me re-run it again, then paste in the debug info, then add even more debug info and run it again. But it did eventually get the bug, which was non-trivial to identify (it was an aliasing problem where Claude was mutating state on a list and that list was shared with all the other files in the directory).
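    For anyone curious what that class of bug looks like, here is a tiny hypothetical illustration (not the actual mp3_organizer_v2.py code): one candidate list is reused for every file in the directory, so mutating it for one file silently changes it for all of them.

      shared_candidates = []   # built once for the directory
      file_infos = [{"name": f, "candidates": shared_candidates}   # every dict aliases the same list
                    for f in ("01 - Track.mp3", "02 - Track.mp3")]

      file_infos[0]["candidates"].append("Jordan Rudess")
      print(file_infos[1]["candidates"])   # ['Jordan Rudess'] -- leaked to the other file

      # Fix: give each file its own copy of the list.
      file_infos = [{"name": f, "candidates": list(shared_candidates)}
                    for f in ("01 - Track.mp3", "02 - Track.mp3")]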

    • Huh. Alternate explanation: there's a layer of indirection, drawing upon the unthinkable size of the source data, so rather than 'issue forth tokens as if there is a person answering a question', you've got 'issue forth tokens as if there is a person being challenged to talk about their process', something that's also in the training data but in different contexts.

      I'm not sure statements of 'aha, I see it now!' are meaningful in this context. Surely this is just the em-dash of 'issue tokens to have the user react like you're thinking'?

> For me, LLMs are just a computer interface you can program using natural language.

Sort of. You still can't get reliable output for the same input. For example, I was toying with using ChatGPT with some Siri shortcuts on my iPhone. I do photography on the side, and finding a good time for lighting for photoshoots is a use case I hit a lot, so I made a shortcut that sends my location to the API along with a prompt to get the sunset time for today, the total amount of daylight, and golden hour times.

Sometimes it works; sometimes it says "I don't have specific golden hour times, but you can find those on the web" or gives a useless generic "Golden hour is typically 1 hour before sunset but can vary with location and season."

Doesn't feel like programming to me, as I can't get reproducible output.

I could just use the LLM to write an API-calling script for some service that has that data, but then why bother with that middleman step?

I like LLMs, I think they are useful, I use them every day, but what I want is a way to get consistent, reproducible output for any given input/prompt.
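For reference, that API-calling script can be short. Here is a minimal sketch, assuming the free api.sunrise-sunset.org JSON endpoint (field names and formats are that service's, treated as an assumption here) and the usual hour-before-sunset rule of thumb for golden hour; the coordinates are just an example:

    import json
    import urllib.parse
    import urllib.request
    from datetime import datetime, timedelta

    def golden_hour(lat, lng, date="today"):
        """Return sunset, day length, and an approximate golden-hour start (UTC)."""
        qs = urllib.parse.urlencode({"lat": lat, "lng": lng, "date": date, "formatted": 0})
        with urllib.request.urlopen(f"https://api.sunrise-sunset.org/json?{qs}") as resp:
            results = json.load(resp)["results"]
        sunset = datetime.fromisoformat(results["sunset"])        # ISO 8601 when formatted=0
        return {
            "sunset": sunset,
            "day_length": timedelta(seconds=int(results["day_length"])),
            "golden_hour_start": sunset - timedelta(hours=1),     # rule-of-thumb approximation
        }

    print(golden_hour(60.17, 24.94))   # deterministic for a given date and location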

  • For things where I don't want creativity, I tell it to write a script.

    For example, "write a comprehensive spec for a script that takes in the date and a location and computes when golden hour is." | "Implement this spec"

    That variability is nice when you want some creativity, e.g. "write a beautiful, interactive boids simulation as a single file in html, css, and JavaScript."

    Words like "beautiful" and "interactive" are open to interpretation, and I've been happy with the different ways they are interpreted.

> I think I’m slightly ADD. I love coding _interesting_ things but boring tasks cause extreme discomfort.

> Now - I can offload the most boring task to an LLM and spend my mental energy on the interesting stuff!

I agree, and I feel that having LLMs do boilerplate-type stuff is fantastic for ADD people. The dopamine hit you get from making tremendous progress before you get utterly bored is nice. The thing that ADD/ADHD people are the WORST at is finishing projects. LLMs will help them once the thrill of prototyping a green-field project is over.

  • Seconding this. My work has had the same problem - by the time I've got things all hooked up, figured out the complicated stuff - my brain (and body) clock out and I have to drag myself through hell to get to 100%. Even with ADHD stimulant medication. It didn't make it emotionally easier, just _possible_ lol.

    LLMs, particularly Claude 4 and now GPT-5, are fantastic at working through these todo lists of tiny details. Perfectionism + ADHD is not a fun combo, but it's way more bearable. It will only get better.

    We have a huge moat of ever-more interesting tasks in front of us as LLMs race to pick up the pieces. I've never been more excited about the future of tech.

    • Same here, especially for making bash scripts or lots of if-this-if-that code with logging, error handling, etc.

      Oh, and also, from what I know, ADHD and perfectionism are a very common combination. I'm not sure if everyone has that, but I've heard it's the case for many with ADD. Same with "standards" being extremely high for everything.

  • I'm kind of in this cohort. While in the groove, yea, things fly, but inevitably my interest wanes: either something is too tedious, something is too hard (or just a lot of work), or something shinier shows up.

    Bunch of 80% projects with, as you mentioned, the interesting parts finished (sorta -- you see the light at the end of the tunnel, it's bright, you just don't bother finishing the journey).

    However, at the same time, there's conflict.

    Consider (one of) my current projects: I did the whole back end, and I had ChatGPT help me stand up a web front end for it. I am not a "web person". GUIs and whatnot are a REAL struggle for me because, on the one hand, I don't care how things look, but, on the other, "boy, that sure looks better". But getting from "functional" to "looks better" is a bottomless chasm of yak-shaving, bike-shedding improvements. I'm even bad at copying styles.

    My initial UI was time invested in getting it to work, ugly as it was, with guidance from ChatGPT, which means it gave me ways to do things, but mostly I coded up the actual work -- even if that was blindly typing it in rather than raw cut and paste. I understood how things were working, what it was doing, etc.

    But then, I just got tired of it, and "this needs to be Better". So, I grabbed Claude and let it have its way.

    And, it's better! It certainly looks better and has more features. It's head and shoulders better.

    Claude wrote 2,000-3,000 lines of JavaScript. In, like, 45 minutes. It was very fast, very responsive. One thing Claude knows is boilerplate JS web stuff. And the code looks OK to me. Imperfect, but absolutely functional.

    But I have zero investment in the code. No "ownership", certainly no pride. You know that little hit you get when you get Something Right, and it Works? None of that. It's amazing, it's useful, it's just not mine. And that's really weird.

    I've been striving to finish projects, and, yea, for me, that's really hard. There is just SO MUCH necessary to ship. AI may be able to help polish stuff up; we'll see as I move forward. If nothing else, it may help with gathering up lists of stuff I've missed.

  • Ironically, I find greenfield projects the least stimulating and the most rote, aside from thinking about system design.

    I've always much preferred figuring out how to improve or build on existing messy systems and codebases, which is certainly aided by LLMs for big refactoring-type stuff, but being successful at it requires thinking about how some component of a system is already used and the complexity of that. Lots of edge cases and nuances, people problems, relative conservativeness.

  • Looks like the definition of boilerplate will continue to shift up the chain.

I find Claude great at all of the boilerplate needed to get testing in place. It's also pretty good at divining test cases to lock in the current behavior, even if it's buggy. I use Claude as a first pass on tests, then I run through each test case myself to make sure it's meaningful. I've let it loose on the code-coverage loop as well, so it can drill in and get those uncommon lines covered. I still don't have a good process for path coverage, but I'm not sure how easy that is in Go, as I haven't looked into it much yet.

I'm with you 100% on the boring stuff. It's generally good at the boring stuff *because* it's boring and well-trod.

It's interesting that every task in the world is boring to somebody, which means that eventually nothing will be done by the people who find it interesting, because somebody who finds it boring will gladly shotgun it with an AI tool.

> For me, LLMs are just a computer interface you can program using natural language. ... boring tasks cause extreme discomfort ... Now - I can offload the most boring task to an LLM and spend my mental energy on the interesting stuff!

The problem with this perspective is that when you try to offload exactly the same boring task(s) to exactly the same LLM, the results you get back are never even close to being the same. This work you're offloading via natural-language prompting is not programming in any meaningful sense.

Many people don't care about this non-determinism. Some, because they don't have enough knowledge to identify, much less evaluate, the consequent problems. Others, because they're happy to deal with those problems, under the belief that they are a cost that's worth the net benefit provided by the LLM.

And there are also many people who do care about this non-determinism, and aren't willing to accept the consequent problems.

Bluntly, I don't think that anyone in the first group (those who lack the knowledge to even identify the problems) can call themselves a software engineer.