Comment by xnorswap · 1 day ago

I won't say too much, but I recently had an experience where it was clear, while talking with a colleague, that I was getting back ChatGPT output. I felt sick; this just isn't how it should be. I'd rather have been ignored.

It didn't help that the LLM was confidently incorrect.

The smallest things can throw off an LLM, such as a difference in naming between configuration and implementation.

In the human world, legacy stuff can get you into a situation where "everyone knows" that the foo setting is actually the setting for Frob, but with an LLM it'll happily try to configure Frob or worse, try to implement Foo from scratch.
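A minimal hypothetical sketch of that kind of mismatch (the names foo and Frob and the config layout below are invented for illustration, not taken from any real codebase):

    # Hypothetical config/implementation naming mismatch.
    # Tribal knowledge: the "foo_*" keys actually configure Frob, not Foo.
    CONFIG = {
        "foo_enabled": True,   # despite the name, this toggles Frob
        "foo_timeout_s": 30,   # ...and this is really Frob's timeout
    }

    def start_frob(config: dict) -> None:
        """Reads the misleadingly named 'foo_*' keys to configure Frob."""
        if config["foo_enabled"]:
            print(f"starting Frob with timeout={config['foo_timeout_s']}s")

    # A human on the team "just knows" this mapping. An LLM reading only the
    # config will tend to assume there is a Foo subsystem, and either try to
    # configure it or invent it from scratch.
    start_frob(CONFIG)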

I'd always rather deal with bad human code than bad LLM code, because you can get into the mind of the person who wrote the bad human code. You can try to understand their misunderstanding. You can reason through their faulty reasoning.

With bad LLM code, you're dealing with a soul-crushing machine that cannot (yet) and will not (yet) learn from its mistakes, because it does not believe it makes mistakes (no matter how apologetic it gets).

Ugh. I worked with a PM who used AI to generate PRDs. Pretty often, we'd get to a spot where we were like "what do you mean by this" and he'd respond that he didn't know, the AI wrote it. It's like he just stopped trying to actually communicate an idea and replaced it with performative document creation. The effect was to basically push his job of understanding requirements down to me, and I didn't really want to interact with someone who couldn't be bothered to figure out his own thoughts before trying to put me to work implementing them, so I left the team.

  • Well that's when you escalate the concern (tactfully and confidentially) to your resource manager and/or the Product Manager's resource manager. And if they don't take corrective action then it's time to look for a new job.

    • If I was stuck there I probably would have pushed it, but I had better options than setting out on an odyssey to reform a product team.

      It got me thinking that in general, people with options will probably sort themselves out of those situations and into organizations with like-minded people who use AI as a tool to multiply their impact (and I flatter myself to think that it will be high ability people who have those options), leaving those more reliant on AI to operate at the limit of what they get from OpenAI et al.

  • What the heck, the universal job description of a PM is to genuinely understand the requirements of their product. I'm always baffled how such people stay in those roles without getting fired.

    • For consideration, one can pretty objectively determine that a programmer is not qualified. Or a secretary, a CFO, a sysadmin. How would one judge a product manager? That there's no product? That it sucks balls? "We're soliciting feedback and finding product-market fit, iterating, A/B testing, we'll be better next quarter, goto 1."

      I wouldn't want that job, but I also don't currently know how to bring demonstrable evidence that they're incompetent.

      I have roughly the same opinion about UX folks, but they don't jam up my day-to-day nearly as much as PMs do.

>"It didn't help that the LLM was confidently incorrect."

Has anyone else ever dealt with a somewhat charismatic know-it-all who knows just enough to give authoritative answers? LLM output often reminds me of such people.

  • That’s a great question — and one that highlights a subtle misconception about how LLMs actually work.

    At first glance, it’s easy to compare them to a charismatic “know-it-all” who sounds confident while being only half-right. After all, both can produce fluent, authoritative-sounding answers that sometimes miss the mark. But here’s where the comparison falls short — and where LLMs really shine:

    (...ok ok, I can't go on.)

    • Most of the most charismatic, confident know-it-alls I have ever met have been in the tech industry. And not just the usual suspects (founders, managers, thought leaders, architects) but regular rank-and-file engineers. The whole industry is infested with know-it-alls. Hell, HN is infested with know-it-alls. So it's no surprise that one of the biggest products of the decade is an Automated Know-It-All machine.

    • Perfect! You really got to the core of the matter! The only thing I noticed is that your use of the em-dash needs to not be bracketed with spaces on either end. LLMs—as recommended by most common style guides—stick to the integrated style that treats the em-dash as part of the surrounding words.

  • If those people are wrong enough times, they are either removed from the organization or they scare anyone competent away from the organization, which then dies. LLMs seem to be getting a managerial pass (because the cost is subsidized by mountains of VC money and thus very low (for now)) so only the latter outcome is likely.

Agreed on bad human code > bad LLM code.

Bad human code to me is at least more understandable in what it was trying to do. There's a goal you can figure out, and then you can fix it. It generally operates within the context of the larger code to some extent.

Bad LLM code can be broken from start to finish in ways that make zero sense. Even worse when it reinvents the wheel and replaces massive amounts of code. Humans aren't likely to just make up a function or method that doesn't exist and deploy it. That's not the best example, as you'd likely find that out fast, but it's the kind of screw-up that indicates the entire chunk of LLM code you're examining may in fact be fundamentally flawed beyond normal experience. In some cases you almost need to re-learn the entire codebase to truly realize "oh, this is THAT bad and none of this code is of any value".

I had an experience earlier this week that was kind of surreal.

I'm working with a fairly arcane technical spec that I don't really understand all that well, so I ask Claude to evaluate one of our internal proposals against this spec for conformance. It highlights a bunch of mistakes in our internal proposal.

I send those off to someone in our company who's supposed to be an authority on the arcane spec, with the warning that it was LLM-generated so it might be nonsense.

He feeds my message to his LLM and asks it to evaluate the criticisms. He then messages me back with the response from his LLM and asks me what I think.

We are functionally administrative assistants for our AIs.

If this is the future of software development, I don't like it.

  • In your specific case, I think it's likely an intentionally pointed response to your use of an LLM.

    • I'll admit it. I've done this, but only a few times and only when someone sent me truly egregious AI slop—the kind where it's obvious no human that respects my time ever looked at it.

      My reaction is usually, "Oh, we're doing this? Fine." I'll even prompt my LLM with something like, "Make it sound as corporate and AI-generated as possible." Or, if I'm feeling especially petty, "Write this like you're trying to win the 2025 award for Most Corporate Nonsense, and you're a committee at a Fortune 500 company competing to generate the most boilerplate possible." It's petty, sure, but there's something oddly cathartic about responding to slop with slop.

    • I'm certain it wasn't in this particular case, but yeah, that's definitely going to happen as we all become more annoyed by people shoveling AI-generated crap in our faces and asking us to think about it for them.

> ... "because it does not believe it makes mistakes" ...

Because it doesn't actually believe anything at all, because these things don't think or feel or know anything. They just string together statistically likely language tokens one after another with a bit of random "magic" thrown in the mix to simulate "creativity".
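That token-by-token picture can be made concrete with a toy sketch of sampling from a softmax over next-token scores, where a temperature knob supplies the random "magic" (the scores and temperature here are made up; real models do this over huge vocabularies, but the sampling idea is the same):

    import math
    import random

    # Toy next-token scores (logits); the values are invented for illustration.
    logits = {"the": 3.2, "a": 2.9, "banana": 0.4}

    def sample_next_token(logits, temperature=0.8):
        # Softmax with temperature: higher temperature flattens the
        # distribution, i.e. more randomness mixed into the choice.
        scaled = {tok: score / temperature for tok, score in logits.items()}
        total = sum(math.exp(s) for s in scaled.values())
        probs = {tok: math.exp(s) / total for tok, s in scaled.items()}
        # Pick a token in proportion to its probability -- no beliefs involved,
        # just a weighted draw.
        return random.choices(list(probs), weights=list(probs.values()))[0]

    print(sample_next_token(logits))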

> "everyone knows" that the foo setting is actually the setting for Frob, but with an LLM it'll happily try to configure Frob or worse, try to implement Foo from scratch.

Making the implicit explicit is a task for your documentation team, who should also be helping prep inputs for your LLMs. If foo and Frob are the same, have the common decency to tell the LLM...

  • The driver of this output is a uniform lack of comprehension all the way down.

I gave a very junior dev a PPT of 4-5 slides laying out an approach to implementing a business requirement. I wanted to make sure they understood what was going on, so I asked them to review the slides and then explain it back to me as if I were seeing them for the first time. What I got back was the typical overly verbose and articulate review from ChatGPT or some other LLM. I thought it was pretty funny that they thought it would work, let alone be acceptable to do that. When I called them and asked, "now do it for real," I ended up answering a dozen questions but hung up knowing they actually did understand the approach.

  • > What I got back was the typical overly verbose and articulate review from ChatGPT or some other LLM. I thought it was pretty funny that they thought it would work, let alone be acceptable to do that.

    Did that end up working for you?

    I had this same experience recently, and it floored my expectations for that dev; it just felt so wrong.

    I made it abundantly clear that it was substandard work with comically wrong content and phrasings, hoping that he would understand that I trust _him_ to do the work, but I still later saw signs of it all over again.

    I wish there was something other than "move on". I'm just lost, and scarred.

It's annoying when it apologizes for a "misunderstanding" when it was just plain wrong about something. What would be wrong with it just saying, "I was wrong because LLMs are what they are, and sometimes we get things very wrong"?

Kinda funny example: The other day I asked Grok what a "grandparent" comment is on HN. It said it's the "initial comment" in a thread. Not coincidentally, that was the same answer I found in a reddit post that was the first result when I searched for the same thing on DuckDuckGo, but I was pretty sure that was wrong.

So I gave Grok an example: "If A is the initial comment, and B is a reply to A, and C a reply to B, and D a reply to C, and E a reply to D, which is the grandparent of C?" Then it got it right without any trouble. So then I asked: But you just said it's the initial comment, which is A. What's the deal? And then it went into the usual song and dance about how it misunderstood and was super-sorry, and then ran through the whole explanation again of how it's really C and I was very smart for catching that.

I'd rather it just said, "Oops, I got it wrong the first time because I crapped out the first thing that matched in my training data, and that happened to be bad data. That's just how I work; don't take anything for granted."

  • Ummm, are you saying that C is the grandparent of C, or do you have a typo in your example? Sure, the initial comment is not necessarily the grandparent, but in your ABCDE example, A is the grandparent of C, and C is the grandparent of E.

    Maybe I'm just misreading your comment, but it has me confused enough to reset my password, login, and make this child comment.

  • > I'd rather it just said ...

    Yes, but why would it? "Oops, I got it wrong the first time because I crapped out the first thing that matched in my training data" isn't in the training data. Yet.

    So it can't come out of the LLM: There's no actual introspection going on, on any of these rounds. Just using training data.

It's so upsetting to see people take the powerful tool that is an LLM and pretend it's a solution for everything. It's not. LLMs are awesome at a lot of things, but they need a user who has the context and knowledge to know when to apply them and when to steer them in a different direction.

The amount of absolutely shit LLM code I've reviewed at work is so sad, especially because I know the LLM could've written much better code if the prompter had done a better job. The user needs to know whether a solution is viable for an LLM to do or not, and the user will often need to make some manual changes anyway. When we pretend an LLM can do it all, it creates slop.

A few weeks ago I had a coworker produce a simple function that wrapped a DB query (normal so far), but he wrote 250 lines of tests for it. All the code was clearly LLM-generated (the comments explaining the most mundane code were the biggest giveaway). The tests tested nothing. They mocked the ORM and then tested the return value of the mock. We were testing that the mocking framework worked? I told him I didn't think the tests added much value since the function was so simple and that we could remove them. He said he thought they provided value, with no explanation, and merged the code.
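A condensed sketch of what that kind of test looks like (get_user, the session object, and the data are hypothetical stand-ins, since the actual code isn't shown here). The mock is told what to return, and the assertion then checks for exactly that value, so the test can never fail no matter what the real query does:

    import unittest
    from unittest import mock

    def get_user(session, user_id):
        # The function under "test": a thin wrapper around an ORM-style query.
        return session.query("User").filter_by(id=user_id).first()

    class TestGetUser(unittest.TestCase):
        def test_get_user_returns_user(self):
            session = mock.MagicMock()
            fake_user = {"id": 1, "name": "alice"}
            # Tell the mock what to return...
            session.query.return_value.filter_by.return_value.first.return_value = fake_user
            # ...then assert that it returned exactly that. This only proves
            # that MagicMock works; the real query is never exercised.
            self.assertEqual(get_user(session, 1), fake_user)

    if __name__ == "__main__":
        unittest.main()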

Now fast-forward to the other day: I ran into the rest of the code again, and it's sinking in how bad the other LLM code was. Not that it's wrong, but it's poorly designed and full of bloat.

I have no issue with LLMs - they can do some incredible things and they're a powerful tool in the tool belt, but they are to be used in conjunction with a human who knows what they're doing (at least in the context of programming).

Kind of a rant, but I absolutely see a future where some code bases are well maintained and properly built, while others have tacked on years of vibe-coded trash that now only an LLM can even understand. And the thing that will decide which direction a code base goes in will be the engineers involved.

  • > Kind of a rant, but I absolutely see a future where some code bases are well maintained and properly built, while others have tacked on years of vibe-coded trash that now only an LLM can even understand.

    This is offshoring all over again. At first, every dev in the US was going to be out of a job because of how expensive they were compared to offshore devs. Then the results started coming back, and there was some very good work done offshore, but there was tons and tons of stuff that had to be unwound and fixed by onshore teams. Entire companies and careers were dedicated to just fixing stuff coming back from offshore dev teams. In the end, it took a mix of both to realize more value per dev dollar.

  • > I absolutely see a future where some code bases are well maintained and properly built, while others have tacked on years of vibe-coded trash

    Technical debt at a payday loan interest rate.

  • That's why some teams have the rule that the PR author isn't allowed to merge; only one of the approvers can.

  • This is why Windsurf's (formerly Codeium) name is so genius.

    Windsurfing (the real activity) requires multiple understandings:

    1) How to sail in the first place

    2) How to balance on the windsurfer while the wind is blowing on you

    If you can do both of those things, you can go VERY fast and it is VERY fun.

    The analogy to the first thing is "understanding software engineering" (to some extent). The analogy to the second thing is "understanding good prompting while the heat of deadlines is on you". Without both, you are just creating slop (falling in the water repeatedly and NOT going faster than either surfing or sailing alone). Junior devs that are leaning too hard on LLM assistance right off the bat are basically falling in the water repeatedly (and worse, without realizing it).

    I would at minimum have a policy of "if you do not completely understand the code written by an LLM, you will not commit it." (This would be right after "you will not commit code without it being tested and the tests all passing.")

I'm seeing the worst of both worlds, where a human support engineer just blindly copies and pastes whatever the internal LLM spit out.

A Notion comment on a story the other day started with "You're absolutely right," and that's when I had to take a moment outside for myself.

  • I swear that in 3 years, managers are going to realize this constant affirmation… causes staff to lose mental tolerance for anything not clappy-happy. Same with schools.

> I was getting back ChatGPT output

I would ask them for an apple pie recipe and report to HR

  • I get that this is a joke, but the bigger issue is that there's no easy fix for this because other humans are using AI tools in a way that destroys their ability to meaningfully work on a team with competent people.

    There are a lot of people reading replies from more knowledgeable teammates, feeding those replies into LLMs, and pasting the response back to their teammates. It plays out in public on open source issue threads.

    It's a big mess, and it's wasting so much of everyone's time.

    • As with every other problem with no easy fix, if it is important, it should be regulated. It should not be hard for a company to prohibit LLM-assisted communication if management believes it is inherently destructive (e.g. feeding generated messages into message summarizers).

  • > I would ask them for an apple pie recipe and report to HR

    I do this sometimes, except I reply asking them to rephrase their comment in the form of a poem. Then I screenshot the response and add it as an attachment before the actual human deletes the comment.

  • I had a QA inspector asking me a question on Teams about some procedure steps before we ran a test. I answered him, and he replied back with a message absolutely dripping in AI slop. I was expecting "ok thanks I'll tell them" and instead got back "Thank you. I really appreciate your response. I'll let them know and I'm sure they will feel relieved to know your opinion." Like, wtf is that? I had to make sure I was talking to the right guy. This guy definitely doesn't talk like that in person. It's not my opinion, and I highly doubt anyone was worried to the point they'd feel relief at hearing my clarification.

    • The fun part is that it's immediately obvious to everyone who has worked with LLMs. I wonder what future "enhancements" big tech will come up with to make slop speech less robotic and recognizable.

      And it's unfortunate that many people will start assuming long texts are generated by default. Related XKCD: https://xkcd.com/3126/