
Comment by milicat

7 days ago

The more I browse through this, the more I agree. I feel like one could delete almost all comments from that project without losing any information – which means, at least the variable naming is (probably?) sensible. Then again, I don't know the application domain.

Also…

  from pathlib import Path

  def _save_current_date_time(current_date_time_file: str, current_date_time: str) -> None:
    with Path(current_date_time_file).open("w") as f:
      f.write(current_date_time)

there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained.
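
For what it's worth, the missing abstraction is already in the standard library; a minimal sketch of the equivalent (same names as the snippet above, assuming plain-text content):

  from pathlib import Path

  # Path.write_text opens the file in "w" mode, writes, and closes it,
  # so the three-line helper collapses into a single expression:
  Path(current_date_time_file).write_text(current_date_time)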

The scary thing is: I have seen professional human developers write worse code.

> I feel like one could delete almost all comments from that project without losing any information

I'm far from a heavy LLM coder, but I've noticed a massive excess of unnecessary comments in most output. I'm always deleting the obvious ones.

But then I started noticing that the comments seem to help the LLM navigate additional code changes. It’s like a big trail of breadcrumbs for the LLM to parse.

I wouldn’t be surprised if vibe coders get trained to leave the excess comments in place.

  • More tokens -> more compute involved. Attention-based models work by attending every token to every other token, so more tokens means not only more time to "think" but also the ability to think "better". That is also at least part of the reason why o1/o3/R1 can sometimes solve what other LLMs could not. (A back-of-the-envelope sketch of that cost is at the end of this comment.)

    Anyway, I don't think any of the current LLMs are really good for coding. What it's good at is copy-pasting (with some minor changes) from the massive code corpus it has been pre-trained on. For example, give it some Zig code and it's straight-up unable to solve even basic tasks. Same if you give it a really unique task, or if you simply ask for potential improvements to your existing code. Very, very bad results, no signs of out-of-the-box thinking whatsoever.

    BTW: I think what people are missing is that LLMs are really great at language modeling. I had great results, and boosts in productivity, just by being able to prepare the task specification and make quick changes to it really easily. Once I have a good understanding of the problem, I can usually implement everything quickly, and in a much, much better way than any LLM currently can.
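
    To put a rough number on the "more tokens -> more compute" point above, a back-of-the-envelope sketch (d_model=768 is an arbitrary assumption here, not any particular model):

      # Plain scaled dot-product self-attention, ignoring heads and layers:
      # the QK^T score matrix alone is (n_tokens x n_tokens), built from an
      # (n x d) @ (d x n) matmul, i.e. ~n^2 * d multiply-adds.
      def attention_score_ops(n_tokens: int, d_model: int = 768) -> int:
          return n_tokens * n_tokens * d_model

      for n in (1024, 2048, 4096):
          print(n, attention_score_ops(n))
      # 1024 -> ~0.8e9, 2048 -> ~3.2e9, 4096 -> ~12.9e9: doubling the
      # context roughly quadruples this term.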

    • I have tried getting Gemini 2.5 to output "token-efficient" code, i.e. no comments, variables kept to one or two letters, and code as condensed as possible.

      It didn't work out that great. I think all the context its verbose coding style provides actually helps it write better code. Shedding context to free up tokens isn't so straightforward.

  • It doesn't hurt that the model vendors get paid by the token, so there's zero incentive to correct this pattern at the model layer.

    • or the model gets trained on teaching code, which naturally contains lots of comments.

      the dev is just too lazy to include them anymore, whereas the model doesn't really need to be lazy, as it's paid by the token

  • What’s worse, I get a lot of comments left saying what the AI did, not what the code does or why. E.g. “moved this from file xy”, “code deleted because we have abc”, etc. Completely useless stuff that should be communicated in the chat window, not in the code.

  • LLMs are also good at commenting on existing code.

    It’s trivial to ask Claude via Cursor to add comments to illustrate how some code works. I’ve found this helpful with uncommented code I’m trying to follow.

    I haven’t seen it hallucinate an incorrect comment yet, but sometimes it will add a TODO comment saying a section should be made more clear. (Rude… haha)

    • I have seldom seen insightful comments from LLMs. They are usually a bit better than "comment what the line does": useful for getting a hint about undocumented code, but not by much. My experience is limited, but it matches this: as long as you stay on the beaten path it is OK, and insightful comments are not on that path.

> The scary thing is: I have seen professional human developers write worse code.

This is kind of the rub of it all. If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter? I mean, at that point it's kind of like looking at the output of a bytecode compiler and being like "wow, what a mess". And it's not like they can't write code up to your stylistic standards; it's just literally a matter of prompting for that.

  • > If the code works, passes all relevant tests, is reasonably maintainable, and can be fitted into the system correctly with a well defined interface, does it really matter?

    You're not wrong here, but there's a big difference between programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.

    We did this song and dance pretty recently with dynamic typing. Developers thought it was so much more productive to use dynamically typed languages, because it is in the initial phases. Then years went by, those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities, and those developers who hyped up dynamic typing invented Python/PHP type hinting and Flow for JavaScript, later moving to TypeScript entirely. Nowadays nobody seriously recommends building long-lived systems in untyped languages, but they are still very useful for one-off scripting and more interactive/exploratory work where correctness is less important, e.g. Jupyter notebooks.

    I wouldn't be surprised to see the same pattern happen with low-supervision AI code; it's great for popping out the first MVP, but because it generates poor code, the gung-ho junior devs who think they're getting 10x productivity gains will wise up and realize the value of spending an hour thinking about proper levels of abstraction, instead of YOLO'ing the first thing the AI spits out, when they want to build a system that's going to be worked on by multiple developers for multiple years.

    • I think the productivity gains of dynamically typed languages were real, and based on two things: dynamic typing (can) provide certain safety properties trivially, and dynamic typing neatly kills off the utterly inadequate type systems found in mainstream languages when they were launched (the 90s, mostly).

      You'll notice the type systems being bolted onto dynamic languages, or found in serious attempts at new languages, are radically different from the type systems rejected by the likes of JavaScript, Python, Ruby and Perl.

    • > those small, quick-to-make dynamic codebases ended up becoming unmaintainable monstrosities

      In my experience, type checking / type hinting already starts to pay off when more than one person is working on even a small-ish code base. Just because it helps you keep in mind what comes from, and goes to, the other guy's code.
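
      A minimal sketch of what those hints buy you (hypothetical names, Python-style annotations; not from the project under discussion):

        from dataclasses import dataclass

        @dataclass
        class Invoice:
            amount_cents: int  # the shared contract with the other dev's code
            currency: str

        def total_due(invoices: list[Invoice]) -> int:
            return sum(inv.amount_cents for inv in invoices)

        # A checker such as mypy flags total_due(["100", "200"]) at review
        # time, instead of the mistake surfacing in someone else's runtime.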


    • > You're not wrong here, but there's a big difference between programming one-off tooling or prototype MVPs and programming things that need to be maintained for years and years.

      Humans also worry about their jobs, especially in PIP-happy companies; they are very well known for writing intentionally over-complicated code that only they understand, so that they become irreplaceable.


    • I'm certainly extremely happy to have an extensive type system in my daily-driver languages, especially when working with AI coding assistance. It's yet another crucial guard rail that keeps the AI on track and makes a lot of fuckups downright impossible.

  • What are you going to do when something suddenly doesn't work and Cursor endlessly spins without progress, no matter how many "please don't make mistakes" you add? Delete the whole thing and try to one-shot it again?

    • Why do you HAVE TO one-shot? No one says you have to code like those influencers. You are a software engineer; use AI like one: iteratively.


  • Good insight, and indeed quite exactly my state of mind while creating this particular solution.

    In this case, I did put in the guard rails to ensure that I reach my goal in hopefully a straight line and as quickly as possible, but to be honest, I did not give much thought to long-term maintainability or ease of extending it with more and more features, because I needed a very specific solution for a use case that doesn't change much.

    I'm definitely working differently in my brown-field projects where I'm intimately familiar with the tech stack and architecture — I do very thorough code reviews afterwards.

  • I think this code is at least twice the size it needs to be compared to nicer, manually produced Python code: a lot of it is really superfluous.

    People have different definitions of "reasonably maintainable", but if code has extra stuff that provides no value, it always perplexes the reader (what is the point of this? what am I missing?), and increases cognitive load significantly.

    But if AI coding tools were advertised as "get 10x the output of your least capable teammate", would they really go anywhere?

    I love doing code reviews as an opportunity to teach people. Doing this one would suck.

  • Right, and the reason professional developers out there are writing worse code is most likely that they simply don't have the time / aren't paid to care more about it. The LLM is then mildly improving the output in this brand of common real-world scenario.

> there is a lot of obviously useful abstraction being missed, wasting lines of code that will all need to be maintained.

This is a human sentiment because we can fairly easily pick up abstractions during reading. AIs have a much harder time with this - they can do it, but it takes up very limited cognitive resources. In contrast, rewriting the entire software for a change is cheap and easy. So to a point, flat and redundant code is actually beneficial for a LLM.

Remember, the code is written primarily for AIs to read and only incidentally for humans to execute :)

At the very least, if a professional human developer writes garbage code you can confidently blame them and either try to get them to improve or reduce the impact they have on the project.

With AI they can simply blame whatever model they used and continually shovel trash out there instantly.

  • I don't see the difference there. Whether I've written all the code myself or an AI wrote all of it, my name will be on the commit. I'll be the person people turn to when they question why code is the way it is. In a pull request for my commit, I'll be the one discussing it with my colleagues. I can't say "oh, the AI wrote it". I'm responsible for the code. Full stop.

    If you're in a team where somebody can continuously commit trash without any repercussions, this isn't a problem caused by AI.

> The scary thing is: I have seen professional human developers write worse code.

That's not the scary part. It's the honest part. Yes, we all have (vague) ideas of what good code looks like, and we might know it when we see it, but we also know what reality looks like.

I find the standard to which we hold AI in that regard slightly puzzling. If I can get the same meh-ish code for way less money and way less time, that's a stark improvement. If the premise is now "no, it also has to be something that I recognize as really good / excellent", then at least let us recognize that we have moved past the question of whether it can produce useful code.

  • I think there’s a difference in that this is about as good as LLM code is going to get in terms of code quality (as opposed to capability a la agentic functionality). LLM output can only be as good as its training data, and the proliferation of public LLM-generated code will only serve as a further anchor in future training. Humans, on the other hand, will ideally learn and improve with each code review, and if they don’t want to, you can replace them (to put it harshly).

  • I do believe it's amazing what we can build with AI tools today.

    But whenever someone advertises how an expert will benefit from it yet they end up with crap, it's a different discussion.

    As an expert, I want AI to help me produce code of similar quality faster. Anyone can find a cheaper engineer (maybe five of them?) that can produce 5-10x the code I need at much worse quality.

    I will sometimes produce crappy code when I lack the time to produce higher quality code: can AI step in and make me always produce high quality code?

    That's a marked improvement I would sign up for, and one that some seem to tout, yet I have never seen it play out.

    In a sense, the world is already full of crappy code used to build crappy products: I never felt we were lacking in that department.

    And I can't really rejoice if we end up with even more of it :)