Comment by grey-area

6 days ago

I’d love to see the authors of effusive praise of generative AI like this provide the proof of the unlimited powers of their tools in code. If GAI (or agents, or whatever comes next …) is so effective it should be quite simple to prove that by creating an AI only company and in short order producing huge amounts of serviceable code to do useful things. So far I’ve seen no sign of this, and the best use case seems to be generating text or artwork which fools humans into thinking it has coherent meaning as our minds love to fill gaps and spot patterns even where there are none. It’s also pretty good at reproducing things it has seen with variations - that can be useful.

So far in my experience watching small to medium sized companies try to use it for real work, it has been occasionally useful for exploring APIs, odd bits of knowledge, etc., but overall it has wasted more time than it has saved. I see very few signs of progress.

The time has come for llm users to put up or shut up - if it’s so great, stop telling us and show and use the code it generated on its own.

I think we're talking past each other. There's always been a threshold: above it, code changes are worth the effort; below it, they sit in backlog purgatory. AI tools so far seem to lower implementation costs, moving the threshold down so more backlog items become viable. The "5x productivity" crowd is excited about this expanded scope, while skeptics correctly note the highest value work hasn't fundamentally changed.

I think what's happening is two groups using "productivity" to mean completely different things: "I can implement 5x more code changes" vs "I generate 5x more business value." Both experiences are real, but they're not the same thing.

https://peoplesgrocers.com/en/writing/ai-productivity-parado...

  • My friends at companies where AI tools are either mandated or heavily encouraged report that they're seeing a significant rise in low-quality PRs that need to be carefully read and rejected.

    A big part of my skepticism is this offloading of responsibility: you can use an AI tool to write large quantities of shitty code and make yourself look superficially productive at the cost of the reviewer. I don't want to review 13 PRs, all of which are secretly AI but pretend to be junior dev output, none of which solve any of the most pressing business problems because they're just pointless noise from the bowels of our backlog, and have that be my day's work.

    Such gatekeeping is a distraction from my actual job, which is to turn vague problem descriptions into an actionable spec by wrangling with the business and doing research, and then fix them. The wrangling sees a 0% boost from AI, the research is only sped up slightly, and yeah, maybe the "fixing problems" part of the job will be faster! That's only a fraction of the average day for me, though. If an LLM makes the code I need to review worse, or if it makes people spend time on the kind of busywork that ended up 500 items down in our backlog instead of looking for more impactful tasks, then it's a net negative.

    I think what you're missing is the risk, real or imagined, of AI generating 5x more code changes that have overall negative business value. Code's a liability. Changes to it are a risk.

  • This is exactly what I’ve experienced. For the top-end high-complexity work I’m responsible for, it often takes a lot more effort and research to write a granular, comprehensive product spec for the LLM than it does to just jump in and do it myself.

    On the flip side, it has allowed me to accomplish many lower-complexity backlog projects that I just wouldn’t have even attempted before. It expands productivity on the low end.

    I’ve also used it many times to take on quality-of-life tasks that just would have been skipped before (like wrapping utility scripts in a helpful, documented command-line tool).

    • This also accounts for the author of TFA's sense that the smartest people they know are skeptics. Assuming they're being used well, those people spend far more of their time in the high complexity work than they do in the low complexity stuff, so LLMs seem to be more flashy toys than serious tools to them.

      2 replies →

    • > On the flip side, it has allowed me to accomplish many lower-complexity backlog projects that I just wouldn’t have even attempted before

      This has been my experience as well - AI coding tools are like a very persistent junior -- one that loves reading specs and documentation. The problem for AI companies is that "automated burndown of your low-complexity backlog items" isn't a moneymaker, even though that's what we have. So they have to sell a dream that may be realized, or may not.

      The benchmark project in the article is the perfect candidate for AI: well-defined requirements with precise technical terms (RFCs), little room for undefined behavior, and tons of reference implementations. This is an atypical project. I am confident an AI agent can write an HTTP/2 server, but it will also repeatedly fail to write sensible tests for human/business processes that a junior would excel at.

      1 reply →

  • I think this is actually a really good point. I was just recently thinking that LLMs are (amongst other things) great for streamlining these boring energy-draining items that "I just want done" and aren't particularly interesting, but at the same time they do very little to help us juggle more complex codebases right now.

    Sure, they might help you onboard into a complex codebase, but that's about it.

    They help in breadth, not depth, really. And to be clear, to me that's extremely helpful, cause working on "depth" is fun and invigorating, while working on "breadth" is more often than not a slog, which I'm happy to have Claude Code write up a draft for in 15 minutes, review, do a bunch of tweaks, and be done with.

    • +1 to this breadth vs depth framing. I notice this in aider itself: What right does that project have to support all those command line options, covering every little detail, all optionally via env variables too, and/or a YAML file, with .md docs of them all kept up to date? Answer: aider itself was clearly used to write all that breadth of features.

  • You seem to think generating 5x more code results in better code, in the left column. I highly doubt this.

    • Yes there are huge unstated downsides to this approach if this is production code (which prototypes often become).

    • It depends?

      There's certainly a lot of code that needs to be written in companies that is simple and straightforward and where LLMs are absolutely capable of generating code as good as your average junior/intermediate developer would have written.

      And of course there are higher complexity tasks where the LLM will completely face plant.

      So the smart company chooses carefully where to apply the LLM and possibly does get 5x more code that is "better" in the sense that there's 5x more straightforward tickets closed/shipped, which is better than if they had fewer tickets closed/shipped.

      2 replies →

    • I'm attempting to vibe code something for the first time. It seems to work, but the amount of cruft being generated is astounding. It's an interesting learning experience, anyways.

  • > The "5x productivity" crowd is excited about this expanded scope, while skeptics correctly note the highest value work hasn't fundamentally changed.

    This is true, LLMs can speed up development (some asterisks are required here, but that is generally true).

    That said, I've seen, mainly here on HN, so many people hyping it up way beyond this. I've got into arguments here with people claiming it codes at "junior level". Which is an absurd level of bullshit.

    • Exactly. Juniors may have vastly less breadth of knowledge than an LLM, but they can learn and explore and test software in ways that LLMs cannot.

      However, the expansion in scope that senior developers can tackle now will take away work that would ordinarily be given to juniors.

      1 reply →

Approximately speaking, what do you want to see put up?

I ask this because it reads like you have a specific challenge in mind when it comes to generative AI and it sounds like anything short of "proof of the unlimited powers" will fall short of being deemed "useful".

Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea. They're not even suggesting it should be used without supervision. It's useful when used by people who understand its limitations and leverage its strengths.

If you want to see how it's been used by someone who was happy with the results, and is willing to share their results, you can scroll down a few stories on the front-page and check the commit history of this project:

https://github.com/cloudflare/workers-oauth-provider/commits...

Now here's the deal: These people aren't trying to prove anything to you. They're just sharing the results of an experiment where a very talented developer used these tools to build something useful.

So let me ask you this: Can we at least agree that these tools can be of some use to talented developers?

  • Yes sure I’ve checked in code generated by AI myself. I’ve not experienced the excitement this article exudes though and it seems very limited in usefulness due to the by now well-documented downsides. Frankly I haven’t bothered using it much recently, it’s just not there yet IME and I’m not sure LLMs ever will be.

    What I’m interested in really is just case studies with prompts and code - that’s a lot more interesting for hackers IMO than hype.

  • It's useful, but the promise of every AI company is very explicitly that they will burn the seed corn and choke off the pipeline that created those "very talented" developers who reviewed it!

    • I’m less worried about this as the best way to learn to code is to read as well as write it IMO.

      If capabilities don’t improve it’s not replacing anyone, if they do improve and it can write good code, people can learn from reading that.

      I don’t see a pathway to improvement though given how these models work.

  • > Here's the deal: Reasonable people aren't claiming this stuff is a silver bullet or a panacea

    This article and its vocal supporters are not being reasonable at all; they make a not-so-between-the-lines separation between skeptics (who are nuts) and supporters ("My smartest friends are blowing it off," in a smug "I'm smarter than my smartest friends" tone).

    I mean, come on.

    • You are absolutely correct. This article and vocal supporters are often not reasonable and I should have made that point.

      I honestly found the article to be an insufferably glib and swaggering piece that was written to maximize engagement rather than to engage the subject seriously.

      The author clearly values maximizing perceived value with the least amount of effort.

      Frankly, I’m tired of reading articles by people who can’t be bothered to present the arguments of the people they’re disagreeing with honestly and I just gave up halfway reading it because it was so grating.

  • > Reasonable people aren't claiming this stuff is a silver bullet or a panacea.

    Are you saying the CEO of Anthropic isn't reasonable? or Klarna?

    • The CEO of Anthropic is the least reasonable person in this discussion.

      Surely you can see how insanely biased all of their statements would be. They are literally selling the shovels in this gold rush.

      Anything they say will be in service of promoting AI, even the bad/cautionary stuff because they know there's an audience who will take it the other way (or will choose to jump in to not be left behind), and also news is news, it keeps people talking about AI.

      1 reply →

    • Of course not. CEOs are Chief Narrative Officers; one of their main functions is to craft and push a message (which is different from collating and reporting facts). Reason doesn’t factor in.

  • I think that experiment was very cool, but I will say that the OAuth2.0/OIDC protocol is very well documented and there are tons of tools already built around it in multiple languages.

    I implemented the OAuth 2.0 protocol in 3 different languages without a 3rd party library - entire spec implemented by hand. This was around 2015, when many of the libraries that exist today didn't exist yet. I did this as a junior developer for multiple enterprise applications. At the end of the day it's not really that impressive.

    • Three weeks ago I did basically the same thing as the author of the Cloudflare story, but I did it with my own open source tool. I went into the experiment treating Claude Code as a junior engineer and guiding it on a feature I wanted implemented.

      In a single Saturday the LLM delivered the feature to my spec, passing my initial test cases, adding more tests, etc…

      I went to bed that night feeling viscerally in my bones I was pairing with and guiding a senior engineer not a junior. The feature was delivered in one day and would have taken me a week to do myself.

      I think stories like the Cloudflare story are happening all over right now. Staff level engineers are testing hypotheses and being surprised at the results.

      OAuth 2.0 doesn’t really matter. If you can guide the model and clearly express requirements, boundaries, and context, then it’s likely to be very useful and valuable in its current form.

    • This is a great example of how no example provided is ever good enough. There’s always an argument that it doesn’t really count. Yet you just said the computer is doing what you did as a junior developer.

    • It's not supposed to be impressive. It's a faster way to do the unimpressive stuff. Which is the bulk of real-world software work at most companies.

      Maybe you just have that dream job where you only have to think hard thoughts. But that's just not the norm, even at a bleeding edge startup.

    • Exactly how long did it take you? And now how much actual time was spent in the comparison prompting and code review by Cloudflare?

Yeah exactly.

What's nuts is watching all these people shill for something that we all have used to mediocre results. Obviously Fly.io benefits if people start hosting tons of slopped-together AI projects on their platform.

It's kinda sad to watch what I thought was a good company shill for AI. Even if they are not directly getting money from some PR contract.

We must not be prompting hard enough....

  • Saying "this tool is genuinely useful to me and it's baffling how many people refuse to acknowledge that could possibly be true" is not a sign that someone is being paid to "shill for AI".

    (If it is then damn, I've been leaving a ton of money on the table.)

    • Some days it seems like the most vulgar Marxist would be more rigorous about doing a who-benefits analysis before using that to dismiss an argument than the median HN commenter. Fly.io benefits from AI hype because... more slop code gets written and then run on their servers? tptacek would burn his HN cred for that? Really?

      3 replies →

    > What's nuts is watching all these people shill for something that we all have used to mediocre results.

    this sort of post is the start of next phase in the battle for mindshare

    the tools are at the very best mediocre replacements for google, and the people with a vested interest in promoting them know this, so they switch to attacking critics of the approach

    > It's kinda sad to watch what I thought was a good company shill for AI.

    yeah, I was sad too, then I scrolled up and saw the author. double sadness.

    • If you really think that feel free to continue with business as usual. I just hope you're not at a stack ranking company, or you are politically savvy though, because otherwise you're going to be in for a real shock in the next few years as your peers build their AI skills, tooling matures and models improve. A skilled dev with a well tuned agentic workflow can already finish non-trivial 5k LoC projects in a day, complete with copious tests and documentation, just imagine when the ecosystem has matured and the majority of your coworkers are hip to the game.

      4 replies →

> … if it’s so great, stop telling us and show …

If you’re selling shovels to gold miners, you don’t need to demonstrate the shovel - you just need decent marketing to convince people there’s gold in them thar hills.

  • This is actually a great metaphor and phrasing and I'm filing it away for later btw.

    • It's a bit more than a metaphor :) during the California gold rush there was this guy named Sam Brannan who sold shovels and other tools to miners, and made a fortune from it (he is often referred to as California's first millionaire). He also had a newspaper at the time, the California Star, which as you can imagine was used to promote the gold rush:

      > The excitement and enthusiasm of Gold Washing still continues—increases. (1848)

      https://sfmuseum.org/hist6/star.html

      https://en.wikipedia.org/wiki/Samuel_Brannan

    • Note that it's a pretty common cliche, usually phrased something like "in a gold rush, the only people guaranteed to make money are the guys selling the shovels".

      1 reply →

Many, many people are in fact “using the code it generated on its own”. I’ve been putting LLM-assisted PRs into production for months.

With no disrespect meant, if you’re unable to find utility in these tools, then you aren’t using them correctly.

  • > unable to find utility.. aren’t using them correctly

    TFA makes this argument too, then later says:

    > All this is to say: I write some Rust. I like it fine. If LLMs and Rust aren’t working for you, I feel you. But if that’s your whole thing, we’re not having the same argument

    So reasonable people admit that the utility depends on the use case.. then at the same time say you must be an idiot if you aren’t using the tools. But.. this isn’t actually a reasonable position.

    Part of the issue here may be that so many programmers have no idea what programmers do outside of their niche, and how diverse programming actually is.

    The typical rebuttals of how “not everyone is doing cliche CRUD web dev” is just the beginning. Author mentions kernel dev, but then probably extrapolated to C dev in general. But that would be insane, just think about the training sets for Linux kernel dev vs everything else..

    It’s dumb to have everyone double down on polarizing simplistic pro/con camps, and it’s rare to see people even asking “what kind of work are you trying to do” before the same old pro/con arguments start flying again.

    • Yeah, very much agreed that in the spirit of good discussion I should’ve at least asked about their experiences and use case before jumping to finger wagging.

      But that said, let me reiterate a couple important points from my post:

      > With no disrespect meant

      I’m not calling anybody an idiot because they aren’t using an LLM. I’m sharing my honest opinion that they’re not using it correctly, but that’s very different than calling them an idiot.

      > if you’re unable to find utility in these tools

      This is a bit lawyerly, but note my carefully generic wording here: “find utility”. If you’re a Rust developer who doesn’t like the Rust output from your LLM, sure - but that’s not 100% of the job.

      You’ll also touch bash scripts, make files, YAML, JSON or TOML config, write bug reports/feature requests, discuss architectural ideas and coding patterns, look through stack traces/dumps/error logs, or whatever else.

      My point is that it is exceedingly unlikely that there is nothing an LLM can do to help your work, even if it’s not good at writing code in your domain.

      Hence the statement that if you cannot find utility, you’re not using it correctly. It takes time to learn how to use these tools effectively, even in domains they excel in.

    • It's such an insane argument

      It's like I can't just switch our whole 1-million-line codebase on a dime

      These articles act like everyone is just cranking out shitty new webapps, as if every software job is the same as the author's

  • Okay, how am I supposed to use them "correctly"? Because me explaining step by step, more so than a junior developer, how to do a small task in an existing codebase for it to get it wrong not once, not twice, not three times, but more is not a productivity boost.

    And here's the difference between someone like me and an LLM: I can learn and retain information. If you don't understand this, you don't have a correct understanding of LLMs.

    • It is entirely true that current LLMs do not learn from their mistakes, and that is a difference between eg an LLM and a human intern.

      It is us, the users of the LLMs, that need to learn from those mistakes.

      If you prompt an LLM and it makes a mistake, you have to learn not to prompt it in the same way in the future.

      It takes a lot of time and experimentation to find the prompting patterns that work.

      My current favorite tactic is to dump sizable amounts of example code into the models every time I use them. I find this works extremely well. I will take code that I wrote previously that accomplishes a similar task, drop that in and describe what I want it to build next.
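
      A minimal sketch of what that tactic can look like when driving a model through an API rather than a chat window - the file name, model string, and task below are invented for illustration, and this assumes the Anthropic Python SDK:

      ```python
      import anthropic

      # Hypothetical prior code that already does something similar, pasted in
      # verbatim so the model can copy its structure and conventions.
      example_code = open("csv_report_exporter.py").read()

      client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
      response = client.messages.create(
          model="claude-3-5-sonnet-20241022",  # illustrative model name
          max_tokens=4000,
          messages=[{
              "role": "user",
              "content": (
                  "Here is an existing module that exports reports as CSV:\n\n"
                  + example_code
                  + "\n\nFollowing the same structure, error handling, and naming "
                    "conventions, write a module that exports the same reports as JSON."
              ),
          }],
      )
      print(response.content[0].text)
      ```

      The point is less the specific API than the shape of the prompt: real, known-good code first, then a narrow description of the next thing to build.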

      1 reply →

  • Yep, I've used Devon and now Google Jules. For the big stuff it has lots of wrong code, but it still ends up giving me a much better start than starting from scratch, certainly. When it all comes together it gives me a 6x boost. But fixing all the wrong code and thoroughly testing it is definitely the time-consuming part.

  • > LLM-assisted PRs

    This does not counter what GP said. Using LLM as a code assistant is not the same as "I don't need to hire developers because LLMs code in their place"

The first paragraph of this blog calls senior skeptical developers "unserious" but the person who wrote it has a cartoon gopher as his avatar.

The next few paragraphs basically say "the tool runs arbitrary programs on your machine, pulls in arbitrary files, and uses that to run more arbitrary commands" and then blame you for thinking that is a bad sequence of events.

The best possible light in which I (an AI-neutral reader) can paint this rant on a hosting-company blog (why publish this?) is that 1) allowing random textbots to execute programs on your work computer is good (disagree), and 2) those chatbots do, in fact, occasionally say enough correct-ish things that they are probably worth your company paying $20+/month for your access (agree).

  • I think tptacek is generally worth reading. He is one of the users with the highest karma on this site (https://news.ycombinator.com/user?id=tptacek)

    I’m happy to have read this, which is reason enough to publish it - but also it’s clearly generating debate so it seems like a very good thing to have published.

    • I generally like tptacek, but this piece has major flaws. His dismissal of all copyright concerns on the grounds that some developers engage in piracy is horribly lazy.

    • High karma in an internet community is not something I respect automatically. Pewdiepie and every other little online personality have tons of followers and likes.

      tptacek has always come across arrogant, juvenile, opinionated, and difficult to work with.

    • co-mingling your work identity with your gopher-identity is a dangerous game; I should clarify the gopher is not UMN-related.

  • 1 is not a requirement for using agents. You give agents access to a specific workspace and set of tools that you know are safe, similar to how you give the junior dev access to scratch but not prod. Feels like this comment is not in good faith if I’m being honest.

    • According to the article you are not a serious user of LLMs and are as out of date as the curmudgeon straw men being attacked.

      There are zero "safe" tools where you don't control the inputs.

I don't think I would notice a 100% improvement in software productivity in most companies, from the outside. Most of the time, that would just translate to the company being able to hire fewer developers, and having slightly higher profit margins – but not enormously higher, because developers are only one part.

I recently used Claude Code to develop & merge an optimization that will save about $4,000 a month. It was relatively simple but tedious, so I probably wouldn't have done it on my own. I don't even expect most of my coworkers to notice.

If you read the post, the article is mostly agreeing with you. What they're pointing out is not "the AI can do everything you do", it's that "an AI coder can do a lot of the boring typing a lot faster than you, leaving you right at the point of 'real implementation'".

Having something else write a lot of the boring code that you'll need and then you finish up the final touches, that's amazing and a huge accelerator (so they claim).

The claim is not "AI will replace us all", the claim of the parent article is "AI is a big deal and will change how we work, the same way IDEs/copy-paste/autocomplete/online documentation have radically changed our work."

  • > an AI coder can do a lot of the boring typing a lot faster than you, leaving you right at the point of 'real implementation'

    Vim and bash solved that for me a long time ago in a more reliable and efficient way (and it's certainly not the only tool capable of that).

    > the same way IDEs/copy-paste/autocomplete/online documentation have radically changed our work

    I was there before and got into the autocomplete/LSP thing pretty late (because Vim didn't have good LSP support for a long time, and Vim without it was still making me more efficient than any other IDE with it). Those things didn't radically change our work as you claim; they just made us a bit more productive.

Who said "unlimited powers", other than you? That seems like an intentional distortion of the article, which says "LLMs can write a large fraction of all the tedious code you’ll ever need to write". Doing most of my tedious work is very different than "unlimited powers".

I use AI to chew through tedious work all the time. In fact, I let an agent do some work just before I checked HN to read your claim that it can't do that. Everyone at my job does the same, perhaps modulo checking HN. But there's no 'unlimited power' to show you - we're just about 30% faster than we used to be.

  • You might be 30% faster, but not everyone is doing your specific job with your exact skill set. I suspect the range is something like -5% to 50%; the question nobody can answer is what the average is, because things are so diverse and nobody is collecting good data.

    In decades of programming I’ve written very little tedious code, but that’s as much about the projects I’ve worked on as approach I use.

I think this is a misunderstanding of coder productivity. A 10x engineer isn't 10x faster at popping out unit tests; that stuff is mind-numbingly boring work that, it turns out, a next-token predictor can do with ease. In fact I would guess that really "productive" software engineers slow down considerably when forced to do this important but slow work*.

The 10x engineer is really good at deducing what the next most important thing to do is and doing it quickly. This involves quickly moving past hundreds of design decisions in a week to deliver something quickly. It requires you to think partly like a product manager and partly like a senior engineer, but that's the game and LLMs are zero help there.

Most engineering productivity is probably locked up in this. So yes, LLMs probably help a lot, just not in the way that would show on some Jira board?

*One could claim that doing this slow work gives the brain a break to then be good at strategizing the higher order more important work. Not sure.

>I’d love to see the authors of effusive praise of generative AI like this

He spent a large tranche of the article specifically hanging a lantern on how mediocre the output is.

>by creating an AI only company

He specifically says that you need to review the code over and over and over.

> If GAI (or agents, or whatever comes next …) is so effective it should be quite simple to prove that by creating an AI only company and in short order producing huge amounts of serviceable code to do useful things.

I don't think this follows. Anyone can see that a 10-ton excavator is hundreds or even thousands of times more efficient than a man with a shovel. That doesn't mean you can start up a company staffed only with excavators. Firstly, you obviously need people operating the excavator. Secondly, the excavator is incredibly efficient at moving lots of dirt around, but no crew could perform any non-trivial job without all the tasks that the excavator is not good at - planning, loading/unloading, prepping the site, fine work (shovelling dirt around pipes and wires), etc.

AI is a tool. It will mean companies can run much leaner. This doesn't imply they can do everything a company needs to do.

I build tools all the time using AI that would have taken me 10x longer before. A couple of examples below. The first was a literal one-shot of me dumping the full-page HTML into Cursor and asking it for the feature I needed. Using it today, it already saved me over an hour of analysis time across the refunds I analyze. The second example involves using AI to better classify text and to help our support people make better use of their time helping customers. I could probably have made the second thing without LLMs, but I didn't have the skills. Here I was able to provide concrete value to our company and our customers.

https://github.com/Atlas-Authority/mpac-ui-improved https://moduscreate.com/blog/forum-monitoring-is-essential-b... (Pardon how marketing keyword stuffed the final post)

> The time has come for llm users to put up or shut up - if it’s so great, stop telling us and show and use the code it generated on its own.

I’m open to that happening. I mean them showing me. I’m less open to the Nth “aww shucks, the very few doubters that are left at this point are about to get a rude awakening” FOMO concern trolling. I mean I guess it’s nice for me that you are so concerned about my well-being, soon to be suffering-being?

Now, AI can do a lot of things. Don’t get me wrong. It has probably written a million variations on the above sentiment.

A "eulogy" is a speech you make at a funeral in honor of the dead person. I think you meant "apology".

  • While that is the most common sense of eulogy, it's not the only one. A eulogy is also any speech that highly praises someone or something - which is most commonly done at funerals, which is how the funeral association came about (also probably by association with an elegy, which is an etymologically unrelated word that refers to a Greek poem dedicated to someone who passed away).

    In many romance languages, eulogy doesn't have the funeral connotation, only the high praise one - so the GP may be a native speaker of a romance language who didn't realize this meaning is less common in English.

  • Yes, I think I was thinking more of a paean or apology, though I'm not sure apology is used in that sense much nowadays - perhaps apologia is clearer. "In praise of" would be better; thanks, will edit just now.

Honestly it’s really unfortunate that LLMs seem to have picked up the same hype men that attached themselves to blockchains etc.

LLMs are very useful. I use them as a better way to search the web, generate some code that I know I can debug but don’t want to write and as a way to conversationally interact with data.

The problem is the hype machine has set expectations so high and refused criticism to the point where LLMs can’t possibly measure up. This creates the divide we see here.

  • I think LLM hype is more deserved and different from that of blockchain.

    There's still a significant barrier to entry to get involved with blockchain and most people don't even know what it is.

    LLMs, on the other hand, have a very low barrier to at least use - one can just go to Google, ChatGPT, etc. and use it and see its effectiveness. There's a reason why, in the last year, a significant portion of school students have started using LLMs to cheat. Blockchains still don't have that kind of utilization.

    • I agree with all of these points.

      Honestly I think that makes the argument stronger though that it’s unfortunate they jumped on.

  • I think I agree with the general thrust but I have to say I've yet to be impressed with LLMs for web search. I think part of that comes from most people using Google as the benchmark, which has been hot garbage for years now. It's not hard to be better than having to dig 3 sponsored results deep to get started parsing the list of SEO spam, let alone the thing you were actually searching for.

    But compared to using Kagi, I've found LLMs end up wasting more of my time by returning a superficial survey with frequent oversights and mistakes. At the final tally I've still found it faster to just do it myself.

    I will say I do love LLMs for getting a better idea of what to search for, and for picking details out of larger blocks.

    • I find search engines like Google and Bing are so overly keen on displaying any results that they'll ignore your search parameters and return something else instead.

      Thus, I find LLMs quite useful when trying to find info on niches that are close to a very popular topic, but different in some key way that's hard to express in search terms that won't get ignored.

    • > I think part of that comes from most people using Google as the benchmark, which has been hot garbage for years now.

      Honestly, I think part of the decline of Google Search is because it's trying to increase the amount of AI in search.

      1 reply →

  • There's not much riding on convincing the broader public that AI is the real deal before it's proved itself beyond the shadow of any doubt. There's nothing they can do to prepare at this point.

This has all the energy of people saying "ah, you take such great photos, you must have a good camera"

_People_ are getting outsized value from AI in the ways they apply it. Photographs come from the photographer, not the camera.

I suspect it comes down to use case. Those you refer to use it for well-trodden areas; likely web UIs and servers using popular tools. The image they predict is extrapolated from these.

The author's central argument seems to be that the current state of LLM development is such that 1 Senior + LLM === 1 Senior + 4 juniors

With that as a metric, 1 Senior + 4 juniors cannot build the company with the scope you are describing.

A 50-eng company might have 1 CTO, 5 staff, 15 Seniors, and 29 juniors. So the proposition is you could cut the company in ~half but would still require the most-expensive aspects of running a company.

  • Even if you have one trillion juniors, you're not going to get them to spit out 1000 lines of code in a matter of seconds, after you give them an assignment, like LLMs do, so this is a faulty comparison.

    Also often it takes a senior dev _more_ time to _explain_ to a junior what needs to be done than it takes to do it himself. What LLMs give us is the ability to generate a feature about as fast as we can type up the instructions we would have, pre-AI, given to a junior dev.

  • > The author's central argument seems to be that the current state of LLM development is such that 1 Senior + LLM === 1 Senior + 4 juniors

    This is such an outlandish claim, to the point where I call it plain bullshit.

    LLMs are useful in a completely different way than a Junior developer is. It is an apples and oranges comparison.

    LLMs do things in ways that help me beyond what a Junior would. They are also completely useless at many tasks that a Junior developer can perform.

    • I disagree that they are so different you can't compare them.

      Imagine a senior IC staffed with 4 juniors, and they spend 2 hours with each every day. Then the junior is left with 6 hours to think through what they were taught/told. This is very similar to LLM development except instead of context switching 3 times each day, the senior can skip over the 6 hours of independent time the junior required to absorb the changes. But it still takes the same amount of time to deliver the 4 separate projects.

      I find the existence of LLM development deeply troubling for a long list of reasons. But refuting the claim that an LLM is similar in many ways to a junior dev is unsubstantiated

      >It also is completely useless to perform many tasks that a Junior developer can.

      And there are many things one junior could be helpful with that a different junior would be useless at.

      5 replies →

> The time has come for llm users to put up or shut up - if it’s so great, stop telling us and show and use the code it generated on its own.

Why would we do this? Wouldn’t it be better to do this silently and reap the benefits?

I don't know if you are the same (S.G.) greyarea I'm familiar with but I hope so because the idea of having a couple of 90s era irc people take opposing viewpoints on LLMs in 2025 amuses me.

That's a strawman.

Nobody is saying it's "unlimited powers", that's your exaggeration.

And what you're proposing about an "AI only company" seems to be based on your misunderstanding.

What this article is saying is, you need the same number of senior developers, but now each one is essentially assisted by a few junior developers virtually for free.

That's huge. But saying you want to see an "AI only company" as "proof" has nothing to do with that.

And what you're describing -- "occasionally useful for exploring apis, odd bits of knowledge etc, but overall wasted more time than it has saved" -- is exactly what the author explicitly addresses at the top:

> If you were trying and failing to use an LLM for code 6 months ago, you’re not doing what most serious LLM-assisted coders are doing. People coding with LLMs today use agents...

The entire article is about how to use LLMs effectively. What kind of "proof" do you really want, when the article explains it all awfully clearly?

From what I've seen, it takes a deeply original thinker to get something great out of generative AI. But if you're a deeply original thinker, you're probably exceptional regardless of whether you are using AI or not.

Generative AI is too much of a blank canvas at the moment, and one that is always shifting. It's up to the user to find all the use cases, and even then in my experience it's just as likely to send me on a wild goose chase as it is to instantly solve my problem.

What kind of proof are you looking for here, exactly? Lots of businesses are successfully using AI... There are many anecdotes of this, which you can read here, or even in the article you commented on.

What else are you looking for?

  • What do you mean by “successfully using AI”, do you just mean some employee used it and found it helpful at some stage of their dev process, e.g. in lieu of search engines or existing codegen tooling?

    Are there any examples of businesses deploying production-ready, nontrivial code changes without a human spending a comparable (or much greater) amount of time as they’d have needed to with the existing SOTA dev tooling outside of LLMs?

    That’s my interpretation of the question at hand. In my experience, LLMs have been very useful for developers who don’t know where to start on a particular task, or need to generate some trivial boilerplate code. But on nearly every occasion of the former, the code/scripts need to be heavily audited and revised by an experienced engineer before it’s ready to deploy for real.

    • Yeah, I should have posted the first version of my post, pointing out that the problem with this demand for proof (as is often the case) is that it devolves into boring definitional questions.

      I don't understand why you think "the code needs to be audited and revised" is a failure.

      Nothing in the OP relies on it being possible for LLMs to build and deploy software unsupervised. It really seems like a non sequitur to me, to ask for proof of this.

      2 replies →

  • I'd like to see any actual case studies. So far I have only heard vague hype.

    • Software engineering is always light on case studies, though. For instance, test-driven development, or static vs. dynamic typing - people have been debating these for quite a long time.

    • I mean, I can state that I built a company within the last year where I'd say 95% of my code involved using an LLM. I am an experienced dev, so yes, it makes mistakes and it requires my expertise to be sure the code works and to fix subtle bugs; however, me and 2 others built this company in about 7 months, for what would've easily taken me 3 years without the aid of LLMs. Is that an indictment of my ability? Maybe, but we are doing quite well for ourselves at 3M ARR already on only 200k expense.

      3 replies →

Did you even glance at the link? The author is advocating for a human-supervised LLM agent workflow.

Maybe they're just busy shipping code instead of proving stuff to you, which they stand nothing to gain from?

You set up a strawman (AI only companies, agents doing everything on their own) which is irrelevant to the point the article is making. One excerpt:

> Almost nothing it spits out for me merges without edits. I’m sure there’s a skill to getting a SOTA model to one-shot a feature-plus-merge! But I don’t care. I like moving the code around and chuckling to myself while I delete all the stupid comments. I have to read the code line-by-line anyways.

I think this article is very on point, I relate with basically every paragraph. It's not a panacea, it's not a 10x improvement by any means, but it's a very meaningful improvement to both productivity (less than 2x I'd say, which would already be a ton) and fun for me. As I've mentioned in the past here

> I feel like there’s also a meaningful split of software engineers into those who primarily enjoy the process of crafting code itself, and those that primarily enjoy building stuff, treating the code more as a means to an end (even if they enjoy the process of writing code!). The former will likely not have fun with AI, and will likely be increasingly less happy with how all of this evolves over time. The latter I expect are and will mostly be elated.

which is a point the article makes too (tables), in a slightly different way.

Also, to be clear, I agree that 90% of the marketing around AI is overblown BS. But that's again beside the point, and the article is making no outlandish claims of that kind.

Overall, I hope this article (as intended) will make more people lose their dismissiveness and wake up their curiosity, as I expect the future of those is akin to that of people today saying they're "not really good at computers". It's a paradigm-shift, and it takes getting used to and productive in, as some imo smart people are mentioning even in this thread[0].

[0]: https://news.ycombinator.com/item?id=44164039

  • > Maybe they're just busy shipping code instead of proving stuff to you, which they stand nothing to gain from?

    Well, in this case they’re busy writing articles trying to convince us, instead of proving stuff to us.

    • Point taken. I’d still argue though that writing an article like this is at least an order of magnitude easier than proving a, say, 30% productivity boost to outside observers.

The greatest proof that it's a useful tool is how much people are using it. Almost everyone is using it. If you're using it, then you probably find it useful as well.

  • a) If “everyone is using it”, show me the numbers. Doing a developer survey to see how many are using it is not hard. Ask a random sample of developers if they ⅰ use LLMs to aid their standard coding or ⅱ completely replace their coding with LLM output.

    b) Even if “everyone is using it” it doesn’t mean it is useful. The usage could be adequately explained by e.g. marketing, being forced on them by management/policy, etc. Not everything with high usage is useful. I can e.g. quickly think of chewing gum (which is also used by a lot of developers), or the ANSI standard keyboard (as opposed to the ISO standard keyboard).

  • And the greatest proof that people are using software is that people are complaining about it.

You think that the only code that is valuable is code that is written by a professional SWE.

There are LOADS of people who need "a program" but aren't equipped to write code or hire an SWE who are empowered by this. An example: last week, I saw a PM vibe-code several different applications to demo what might get built after it gets prioritized by SWEs.

  • I can share my experience: I am not a software engineer. I am the "other" type of engineer (chemical engineer). I tend to use AI in the same way I use Google or Stack Overflow, mostly when I get stuck on syntax and my code editor isn't helping me - I know what I am attempting to do "should" be possible, but I can't get the syntax right. For example, a recent question I asked was

    "Python, create an xarray with two dimensions from a pandas df"

    It gave me a few lines of example code which was enough for me to figure out where I had messed up the syntax in my own code.
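
    For reference, the answer to a prompt like that usually boils down to something like the sketch below (column names and values are invented for illustration): set a two-level index on the DataFrame, then convert, and each index level becomes a dimension.

    ```python
    import pandas as pd
    import xarray as xr

    # Toy data: one measurement per (time, location) pair.
    df = pd.DataFrame({
        "time": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
        "location": ["A", "B", "A", "B"],
        "temperature": [20.1, 18.4, 21.3, 19.0],
    })

    # The two-level index becomes the two dimensions of the resulting Dataset.
    ds = xr.Dataset.from_dataframe(df.set_index(["time", "location"]))
    print(ds["temperature"].dims)  # ('time', 'location')
    ```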

    I have seen one of my junior coworkers copy+paste entire chunks of code from chatbot conversations, and to be honest what he has produced is underwhelming: the code is poorly structured and difficult to reason about, I have low confidence he understands what the bot has produced (and why it did things the way it did), and I don't have high confidence we'd be able to trust the accuracy of the figures this code was outputting.

  • Not really. I'm fine with anyone knocking stuff together, but I think people should be aware of the limitations and dangers. Writing like this does nothing to inform and is overly positive IMO.

    It'd be like insisting LLMs will replace authors of novels. In some sense they could, but there are serious shortcomings and things like agents etc. just don't fix them.