2025: The Year in LLMs

1 month ago (simonwillison.net)

642 comments

simonw

All these improvement in a single year, 2025. While this may seem obvious to those who follows along the AI / LLM news. It may be worth pointing out again ChatGPT was introduced to us in November 2022.

I still dont believe AGI, ASI or Whatever AI will take over human in short period of time say 10 - 20 years. But it is hard to argue against the value of current AI, which many of the vocal critics on HN seems to have the opinion of. People are willing to pay $200 per month, and it is getting $1B dollar runway already.

Being more of a Hardware person, the most interesting part to me is the funding of all the developments of latest hardware. I know this is another topic HN hate because of the DRAM and NAND pricing issue. But it is exciting to see this from a long term view where the pricing are short term pain. Right now the industry is asking, we have together over a trillion dollar to spend on Capex over the next few years and will even borrow more if it needs to be, when can you ship us 16A / 14A / 10A and 8A or 5A, LPDDR6, Higher Capacity DRAM at lower power usage, better packaging, higher speed PCIe or a jump to optical interconnect? Every single part of the hardware stack are being fused with money and demand. The last time we have this was Post-PC / Smartphone era which drove the hardware industry forward for 10 - 15 years. The current AI can at least push hardware for another 5 - 6 years while pulling forward tech that was initially 8 - 10 years away.

I so wished I brought some Nvidia stock. Again, I guess no one knew AI would be as big as it is today, and it is only just started.

wpietri 1 month ago
This is not a great argument:
> But it is hard to argue against the value of current AI [...] it is getting $1B dollar runway already.
The psychic services industry makes over $2 billion a year in the US [1], with about a quarter of the population being actual believers. [2].
[1] The https://www.ibisworld.com/united-states/industry/psychic-ser...
[2] https://news.gallup.com/poll/692738/paranormal-phenomena-met...
- apexalpha 1 month ago
  
  What if these provide actual value through placebo-effect?
  
  6 replies →
- ctoth 1 month ago
  
  2022/2023: "It hallucinates, it's a toy, it's useless."
  2024/2025: "Okay, it works, but it produces security vulnerabilities and makes junior devs lazy."
  2026 (Current): "It is literally the same thing as a psychic scam."
  Can we at least make predictions for 2027? What shall the cope be then! Lemme go ask my psychic.
  
  31 replies →
jillesvangurp 1 month ago
2025 was the year of development tool using AI agents. I think we'll shift attention to non development tool using AI agents. Most business users are still stuck using chat gpt as some kind of grand oracle that will write their email or powerpoint slides. There are bits and pieces of mostly technology demo level solutions but nothing that is widely used like AI coding tools are so far. I don't think this is bottle necked on model quality.
I don't need an AGI. I do need a secretary type agent that deals with all the simple but yet laborious non technical tasks that keep infringing on my quality engineering time. I'm CTO for a small startup and the amount of non technical bullshit that I need to deal with is enormous. Some examples of random crap I deal with: figuring out contracts, their meaning/implication to situations, and deciding on a course of action; Customer offers, price calculations, scraping invoices from emails and online SAAS accounts, formulating detailed replies to customer requests, HR legal work, corporate bureaucracy, financial planning, etc.
A lot of this stuff can be AI assisted (and we get a lot of value out of ai tools for this) but context engineering is taking up a non trivial amount of my time. Also most tools are completely useless at modifying structured documents. Refactoring a big code base, no problem. Adding structured text to an existing structured document, hardest thing ever. The state of the art here is an ff-ing sidebar that will suggest you a markdown formatted text that you might copy/paste. Tool quality is very primitive. And then you find yourself just stripping all formatting and reformatting it manually. Because the tools really suck at this.
- arcatech 1 month ago
  
  > Some examples of random crap I deal with: figuring out contracts, their meaning/implication to situations, and deciding on a course of action
  This doesn’t sound like bullshit you should hand off to an AI. It sounds like stuff you would care about.
  
  3 replies →
- topaztee 1 month ago
  
  `Also most tools are completely useless at modifying structured documents`
  we built a tool for this for the life science space and are opening it up to the general public very soon. Email me I can give you access (topaz at vespper dot com)
- jennyholzer3 1 month ago
  
  you don't need AGI, you need human labor
utopiah 1 month ago
> All these improvement in a single year
> hard to argue against the value of current AI
> People are willing to pay $200 per month, and it is getting $1B dollar runway already.
Those are 3 different things. There can be a LOT of fast and significant improvements but still remain extremely far from the actual goal, so far it looks like actually little progress.
People pay for a lot of things, including snake oil, so convincing a lot of people to pay a bit is not in itself a proof of value, especially when some people are basically coerced into this, see how many companies changed their "strategy" to mandating AI usage internally, or integration for a captive audience e.g. Copilot.
Finally yes, $1B is a LOT of money for you and I... but for the largest corporations it's actually not a lot. For reference Google earned that in revenue... per day in 2023. Anyway that's still a big number BUT it still has to be compared with, well how much does OpenAI burn. I don't have any public number on that but I believe the consensus is that it's a lot. So until we know that number we can't talk about an actual runway.
- aspenmartin 1 month ago
  
  > People pay for a lot of things, including snake oil, so convincing a lot of people to pay a bit is not in itself a proof of value
  But do you really believe e.g. Claude code is snake oil? I pay $200 / month for Claude, which is something I would have thought monumentally insane maybe 1-2 years ago (e.g. when ChatGPT came out with their premium subscription price I thought that seemed so out of touch). I don't think we would be seeing the subscription rates and the retention numbers if it really was snake oil.
  > Finally yes, $1B is a LOT of money for you and I... but for the largest corporations it's actually not a lot. For reference Google earned that in revenue... per day in 2023. Anyway that's still a big number BUT it still has to be compared with, well how much does OpenAI burn. I don't have any public number on that but I believe the consensus is that it's a lot. So until we know that number we can't talk about an actual runway.
  this gets brought up a lot but I'm not sure I understand why folks on a forum called YCombinator, a startup accelerator, would make this sound like an obvious sign of charlatanism; operating at a loss is nothing new and anthropic / openAI strategy seems perfectly rational: they are scaling and capturing market share, and TAM is insane.
- jimmaswell 1 month ago
  
  > many companies changed their "strategy" to mandating AI usage internally
  Are they hiring? My job is still dragging its feet on approving copilot.
pjc50 1 month ago
Investing a trillion dollars for a revenue of a billion dollars doesn't sound great yet.
- signatoremo 1 month ago
  
  Companies benefiting from trillion dollars spent during the dotcom era certainly make more than a billion dollars, for the last 20 years.
  Intellectual dishonesty is certainly rampant on HN.
- steveBK123 1 month ago
  
  Indeed, its the old Uber playbook at nearly two extra orders of magnitude.
  It is a large enough number to simply run out of private capital to consume before it turns cash flow positive.
  Lots of things sell well if sold at such a loss. I’d take a new Ferrari for $2500 if it was on offer.
  
  8 replies →
coffeebeqn 1 month ago
Seems like Nvidia will be focusing on the super beefy GPUs and leaving the consumer market to a smaller player
- Flow 1 month ago
  
  I don't get why Nvidia can't do both? Is it because of the limited production capabilities of the factories?
  
  2 replies →
- _s 1 month ago
  
  AMD owns a lot of the consumer market already; handhelds, consoles, desktop rigs and mobile ... they are not a small player.
  
  2 replies →
HumblyTossed 1 month ago
It's a great tool, but right now it's only being used to feed the greed.
>> Again, I guess no one knew AI would be as big as it is today, and it is only just started.
People have been saying similar about self driving cars for years now. "AI" is another one of those expensive ideas that we'll get 85% of the way there and then to get the other 15% will be way more expensive than anyone will want to pay for. It's already happening - HW prices and electricity - people are starting to ask, "if I put more $ into this machine, when am I actually going to start getting money out?" The "true believers" are like, soon! But people are right to be hugely skeptical.
- jliptzin 1 month ago
  
  There are some things it's really great at. For example, handling a css layout. If we have to spend trillions of dollars and get nothing else out of it other than being able to vertically center a <div> without wrestling with css and wanting to smash the keyboard in the process, it will all have been worth it.
  
  1 reply →
- aspenmartin 1 month ago
  
  I agree -- skepticism is totally healthy. And there are so many great ways to poke holes in the true underlying narratives (not the headlines that people seem to pull from). E.g. evaluation science is a wasteland (not for wont of very smart people trying very hard to get them right). How do we tackle the power requirements in a way that is sustainable? Etc. etc.
  But stuff like this im not sure I understand:
  > It's a great tool, but right now it's only being used to feed the greed.
  if its a great tool, then how is it _only_ being used to "feed the greed" and what do you mean by that?
  Also I think folks are quick to make analogies to other points in history: "AI is like the dot com boom we're going to crash and burn" and "AI is like {self driving cars, crypto, etc} and the promises will all be broken, its all hype" but this removes the nuance: all of these things are extremely different with very specific dynamics that in _some_ ways may be similar but in many crucial and important ways are completely different.
  
  3 replies →
layer8 1 month ago

> Every single part of the hardware stack are being fused with money and demand. The last time we have this was Post-PC / Smartphone era which drove the hardware industry forward for 10 - 15 years. The current AI can at least push hardware for another 5 - 6 years while pulling forward tech that was initially 8 - 10 years away.
It’s very unclear how much end-consumer hardware and DIY builders will benefit from that, as opposed to server-grade hardware that only makes sense for the enterprise marker. It could have the opposite effect, like hardware manufacturers leaving the consumer market (as in the case of Micron), because there’s just not that much money in it.
chias 1 month ago
These are not all improvements. Listed:
* The year of YOLO and the Normalization of Deviance
* The year that Llama lost its way
* The year of alarmingly AI-enabled browsers
* The year of the lethal trifecta
* The year of slop
* The year that data centers got extremely unpopular
- Y_Y 1 month ago
  
  Not that YOLO, PJ Reddie released that in 2015
- mbesto 1 month ago
  
  Said differently - the year we start to see all of the externalities of a globally scaled hyped tech trend.
- steveBK123 1 month ago
  
  > * The year that data centers got extremely unpopular
  I was discussing the political angle with a friend recently. I think Big Tech Bro / VC complex has done themselves a big disservice by aligning so tightly with MAGA to the point AI will be a political issue in 2026 & 2028.
  Think about the message they’ve inadvertently created themselves - AI is going to replace jobs, it’s pushing electric prices up, we need the government to bail us out AND give us a regulatory light touch.
  Super easy campaign for Dems - big tech trumpers are taking your money, your jobs, causing inflation, and now they want bailouts !!
Atomic_Torrfisk 1 month ago

> People are willing to pay $200 per month
Some people are of course, but how many?
> ... People are willing to pay $200 per month
This is just low-key hype. Careful with your portfolio...
ACCount37 1 month ago

Is the AI progress in 2025 an outstanding breakthrough? Not really. It's impressive but incremental.
Still, the gap between the capabilities of a cutting edge LLM and that of a human is only this wide. There are only this many increments it takes to cross it.
belter 1 month ago
>> But it is hard to argue against the value of current AI, which many of the vocal critics on HN seems to have the opinion of.
What is the concrete business case? Can anyone point to a revenue producing company using AI in production, and where AI is a material driver of profits?
Tool vendors don’t count. I’m not interested in how much money is being made selling shovels...show me a miner who actually struck gold please.
- tim333 1 month ago
  
  A lot of programmers seem willing to pay for the likes of Claude Code, presumably because it helps them get more done. Programmers cost money so that's a potential cost saving?
tstrimple 1 month ago
[flagged]
- cherryteastain 1 month ago
  
  Sam Altman [1] certainly seems to talk about AGI quite a bit
  [1] https://blog.samaltman.com/reflections
- ACCount37 1 month ago
  
  Honestly, I wouldn't be surprised if a system that's an LLM at its core can attain AGI. With nothing but incremental advances in architecture, scaffolding, training and raw scale.
  Mostly the training. I put less and less weight on "LLMs are fundamentally flawed" and more and more of it on "you're training them wrong". Too many "fundamental limitations" of LLMs are ones you can move the needle on with better training alone.
  The foundation of LLM is flexible and capable, and the list of "capabilities that are exclusive to human mind" is ever shrinking.
  
  13 replies →

andai 1 month ago

Re: yolo mode

I looked into docker and then realized the problem I'm actually trying to solve was solved in like 1970 with users and permissions.

I just made a agent user limited to its own home folder, and added my user to its group. Then I run Claude code etc as the agent user.

So it can only read write /home/agent, and it cannot read or write my files.

I add myself to agent group so I can read/write the agent files.

I run into permission issues sometimes but, it's pretty smooth for the most part.

Oh also I gave it root to a $3 VPS. It's so nice having a sysadmin! :) That part definitely feels a bit deviant though!

staeff777 1 month ago

I really like this idea and just tried some steps for myself. create user with homedir: sudo useradd -m agent add myself to agent group: sudo usermod -a -G agent $USER
Allow agent group to agent home dir: sudo chmod -R 770 /home/agent
Start a new shell with the group (or login/logoff): newgrp agent Now you should be able to change into the agent home.
Allow your user to sudo as agent: echo "$USER ALL=(agent) NOPASSWD: ALL" |sudo tee -a /etc/sudoers.d/$USER-as-agent now you can start your agent using sudo: sudo -u agent your_agent
works nice.
jillesvangurp 1 month ago

I use a qemu vm for running codex cli in yolo mode and use simple ssh based git operations for getting code in and out of there. Works great. And you can also do fun things like let it loose on multiple git projects in one prompt. The vm can run docker as well which helps with containerized tests and other more complicated things. One thing I've started to observe is that you spend more time waiting for tool execution than for model inference. So having a fast local vm is better than a slower remote one.
andai 1 month ago

Re: yolo mode
https://markdownpastebin.com/?id=1ef97add6ba9404b900929ee195...
My notes from back when I set this up! Includes instructions for using a GUI file explorer as the agent user. As well as setting up a systemd service to fix the permissions automatically.
(And a nice trick which shows you which GUI apps are running as which user...)
However, most of these are just workarounds for the permission issue I kept running into, which is that Claude Code would for some reason create files with incorrect permissions so that I couldn't read or write those files from my normal account.
If someone knows how to fix that, or if someone at Anthropic is reading, then most of this Rube Goldberg machine becomes unnecessary :)
some_developer 1 month ago

Docker in docker, with opencode.
Opencode plus some scripts on host and in its container works well to run yolo and only see what it needs (via mounting). Has git tools but can't push etc. is thought how to run tests with the special container-in-container setup.
Including pre-configured MCPs, skills, etc.
The best part is that it just works for everyone on the team, big plus.
knicholes 1 month ago

cgroups and namespaces

ogou 1 month ago

This is a good tooling survey of the past year. I have been watching it as a developer re-entering the job market. The job descriptions closely parallel the timeline used in the post. That's bizarre to me because these approaches are changing so fast. I see jobs for "Skill and Langchain experts with production-grade 0>1 experience. Former founders preferred". That is an expertise that is just a few months old and startups are trying to build whole teams overnight with it. I'm sure January and February will have job postings for whatever gets released that week. It's all so many sand castles.

weatherlite 1 month ago
> Skill and Langchain experts with production-grade 0>1 experience.
Also , it's just normal backend work - calling a bunch of APIs. What am I missing here?
- XenophileJKO 1 month ago
  
  That is like saying training tensorflow models is just calling some APIs.
  Actually making a system like this work seems easy, but isn't really.
  (Though with the CURRENT generation or two of models it has gotten "pretty easy" I think. Before that, not so much.)
  
  4 replies →
- walthamstow 1 month ago
  
  Buzzwords.
- jennyholzer3 1 month ago
  
  LLM addicts are cult members. Proficiency with buzz words is used to demonstrate status within the cult.

waldrews 1 month ago

Remember, back in the day, when a year of progress was like, oh, they voted to add some syntactic sugar to Java...

nrhrjrjrjtntbt 1 month ago
More like 6 different new nosql databases and js frameworks.
- dotancohen 1 month ago
  
  A Wordpress zero day and Linux not on the desktop. Netcraft confirms it.
crystal_revenge 1 month ago
That must have been a long time back. Having lived through the time when web pages were served through CGI and mobile phones only existed in movies, when SVMs where the new hotness in ML and people would write about how weird NNs were, I feel like I've seen a lot more concrete progress in the last few decades than this year.
This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past. They're cool, but they were way cooler 4 years ago. We've taken big ideas like "agents" and "reinforcement learning" and basically stripped them of all meaning in order to claim progress.
I mean, do you remember Geoffrey Hinton's RBM talk at Google in 2010? [0] That was absolutely insane for anyone keeping up with that field. By the mid-twenty teens RBMs were already outdated. I remember when everyone was implementing flavors of RNNs and LSTMs. Karpathy's character 2015 RNN project was insane [1].
This comment makes me wonder if part of the hype around LLMs is just that a lot of software people simply weren't paying attention to the absolutely mind-blowing progress we've seen in this field for the last 20 years. But even ignoring ML, the world's of web development and mobile application development have gone through incredible progress over the last decade and a half. I remember a time when JavaScript books would have a section warning that you should never use JS for anything critical to the application. Then there's the work in theorem provers over the last decade... If you remember when syntactic sugar was progress, either you remember way further back than I do, or you weren't paying attention to what was happening in the larger computing world.
0. https://www.youtube.com/watch?v=VdIURAu1-aU
1. https://karpathy.github.io/2015/05/21/rnn-effectiveness/
- HarHarVeryFunny 1 month ago
  
  > LLMs are literally technology that can only reproduce the past.
  That's incorrect on many levels. They are drawing upon, and reproducing, language patterns from "the past", but they are combining those patterns in ways that may have never have been seen before. They may not be truly creative, but they are still capable of generating novel outputs.
  > They're cool, but they were way cooler 4 years ago.
  Maybe this year has been more about incremental progress with LLMs than the shock/coolness factor of talking to an LLM for the first time, but the utility of them, especially for programming, has dramatically increased this year, really in the last 6 months.
  The improvement in "AI" image and video generation has also been impressive, to the point now that fake videos on YouTube can often only be identified as such by common sense rather that the fact that they don't look real.
  Incremental improvement can often be more impressive that innovation, whose future importance can be hard to judge when it first appears. How many people read "Attention is all you need" in 2017 and thought "Wow! This is going to change the world!". Not even the authors of the paper thought that.
- handoflixue 1 month ago
  
  > LLMs are literally technology that can only reproduce the past.
  Funny, I've used them to create my own personalized text editor, perfectly tailored to what I actually want. I'm pretty sure that didn't exist before.
  It's wild to me how many people who talk about LLM apparently haven't learned how to use them for even very basic tasks like this! No wonder you think they're not that powerful, if you don't even know basic stuff like this. You really owe it to yourself to try them out.
  
  93 replies →
- waldrews 1 month ago
  
  I'm being hyperbolic of course, but I'm a little dismissive of the progress that happened since the days of BBS's and car based cell phones - we just got more connectivity, more capacity, more content, bigger/faster. Likewise, my attitude toward machine learning before 2023 is a smug 'heh, these computer scientists are doing undisciplined statistics at scale, how nice for them.' Then all of a sudden the machines woke up and started arguing with me, coherently, even about niche topics I have a PhD in. I can appreciate in retrospect how much of the machine learning progress ultimately went into that, but, like fusion, the magic payoff was supposed to be decades away and always remain decades away. This wasn't supposed to happen in my lifetime. 2025 progress isn't the 2023 shock, but this was the year LLM's-as-programmers (and LLM's-as-mathematicians, and...) went from 'isn't that cute, the machine is trying' to 'an expert with enough time would make better choices than the machine did,' and that makes for a different world. More so than, going from a Commodore Vic 20 with 4k of RAM and a modem to the latest Macbook.
- ako 1 month ago
  
  > This year honestly feels quite stagnant. LLMs are literally technology that can only reproduce the past.
  Is this such a big limitation? Most jobs are basically people trained on past knowledge applying it today. No need to generate new knowledge.
  And a lot of new knowledge is just combining 2 things from the past in a new way.
  
  1 reply →
throwup238 1 month ago
> they voted to add some syntactic sugar to Java...
I remember when we just wanted to rewrite everything in Rust.
Those were the simpler times, when crypto bros seemed like the worst venture capitalism could conjure.
- OGEnthusiast 1 month ago
  
  Crypto bros in hindsight were so much less dangerous than AI bros. At least they weren't trying to construct data centers in rural America or prop up artificial stocks like $NVDA.
  
  16 replies →
odiroot 1 month ago
I'm very relieved we've moved away from rewriting everything in Rust.
- jll29 1 month ago
  
  There's no reason not to use Rust for LLM-generated code in the longer term (other than lack of Rust code to learn from in the shorter term).
  The stricter typing of Rust would make sematic errors in generated code come out more quickly than in e.g. Python because using static typing the chances are that some of the semantic errors are also type violations.
  
  1 reply →
- michaelcampbell 1 month ago
  
  Have we though? I'm glad we're not shouting about it from the rooftops like it's some magical "win" button as much, but TBH the things I use routinely that HAVE been rewritten in rust are generally much better. That could also just be because they're newer and have the errors of the past to not repeat.

didip 1 month ago

Indeed. I don't understand why Hacker News is so dismissive about the coming of LLMs, maybe HN readers are going through 5 stages of grief?

But LLM is certainly a game changer, I can see it delivering impact bigger than the internet itself. Both require a lot of investments.

crystal_revenge 1 month ago
> I don't understand why Hacker News is so dismissive about the coming of LLMs
I find LLMs incredibly useful, but if you were following along the last few years the promise was for “exponential progress” with a teaser world destroying super intelligence.
We objectively are not on that path. There is no “coming of LLMs”. We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
- aspenmartin 1 month ago
  
  I'm not sure I understand: we are _objectively on that path_ -- we are increasing exponentially on a number of metrics that may be imperfect but seem to paint a pretty consistent picture. Scaling laws are exponential. METR's time horizon benchmark is exponential. Lots of performance measures are exponential, so why do you say we're objectively not on that path?
  > We might get some incremental improvement, but we’re very clearly seeing sigmoid progress.
  again, if it is "very clear" can you point to some concrete examples to illustrate what you mean?
  > I can’t speak for everyone, but I’m tired of hyperbolic rants that are unquestionably not justified (the nice thing about exponential progress is you don’t need to argue about it)
  OK but what specifically do you have an issue with here?
- viraptor 1 month ago
  
  > exponential progress
  First you need to define what it means. What's the metric? Otherwise it's very much something you can argue about.
  
  12 replies →
- tim333 1 month ago
  
  >following along the last few years the promise was for “exponential progress”
  I've been following for many years and the main exponential thing has been the Moore's law like growth in compute. Compute per dollar is probably the best tracking one and has done a steady doubling every couple of years or so for decades. It's exponential but quite a leisurely exponential.
  The recent hype of the last couple of years is more dot com bubble like and going ahead of trend but will quite likely drop back.
- senordevnyc 1 month ago
  
  I’ve been reading this comment multiple times a week for the last couple years. Constant assertions that we’re starting to hit limits, plateau, etc. But a cursory glance at where we are today vs a year ago, let alone two years ago, makes it wildly obvious that this is bullshit. The pace of improvement of both models and tooling has been breathtaking. I could give a shit whether you think it’s “exponential”, people like you were dismissing all of this years ago, meanwhile I just keep getting more and more productive.
  
  7 replies →
- scotty79 1 month ago
  
  > but we’re very clearly seeing sigmoid progress.
  Yeah, probably. But no chart actually shows it yet. For now we are firmly in exponential zone of the signoid curve and can't really tell if it's going to end in a year, decade or a century.
  
  2 replies →
- fullstackchris 1 month ago
  
  I wrote an article complaining about the whole hype over a year ago:
  https://chrisfrewin.medium.com/why-llms-will-never-be-agi-70...
  Seems to be playing out that way.
- aoeusnth1 1 month ago
  
  We're very clearly seeing exponential progress - even above trend, on METR, whose slope keeps getting revised to a higher and higher estimate each time. Explain your perspective on the objective evidence against exponential progress?
  
  25 replies →
viraptor 1 month ago
Based on quite a few comments recently, it also looks like many have tried LLMs in the past, but haven't seriously revisited either the modern or more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they talk about just hallucinations and broken code - tell the agent to search for docs and fix stuff) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
- mirsadm 1 month ago
  
  This has been the argument since day one. You just have to try the latest model, that's where you went wrong. For the record I use Claude Code quite a bit and I can't see much meaningful improvements from the last few models. It is a useful tool but it's shortcomings are very obvious.
- techpression 1 month ago
  
  Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.
  When people say ”fix stuff” I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
  
  7 replies →
tgv 1 month ago
The negatives outweigh the positives, if only because the positives are so small. A bunch of coders making their lives easier doesn't really matter, but pupils and students skipping education does. As a meme said: you had better start eating healthy, because your future doctor vibed his way through med school.
- biscuit1v9 1 month ago
  
  This. I don't know why this is not upvoted more.
  Education part is on point and as a CS student that sees many of his colleagues using way too much the AI tools for instant homework solving without even processing the answers much.
zvolsky 1 month ago
The idea of HN being dismissive of impactful technology is as old as HN. And indeed, the crowd often appears stuck in the past with hindsight. That said, HN discussions aren't homogeneous, and as demonstrated by Karpathy in his recent blogpost "Auto-grading decade-old Hacker News", at least some commenters have impressive foresight: https://karpathy.bearblog.dev/auto-grade-hn/
- brabel 1 month ago
  
  So exactly 10 years ago a lot of people believed that the game Go would not be “conquered” by AI, but after just a few months it was. People will always be skeptical of new things, even people who are in tech, because many hyped things indeed go nowhere… while it may look obvious in hindsight, it’s really hard to predict what will and what won’t be successful. On the LLM front I personally think it’s extremely foolish to still consider LLMs as going nowhere. There’s a lot more evidence today of the usefulness of LLMs than there was of DeepMind being able to beat top human players in Go 10 years ago.
jcims 1 month ago

It feels like there are several conversations happening that sound the same but are actually quite different.
One of them is whether or not large models are useful and/or becoming more useful over time. (To me, clearly the answer is yes)
The other is whether or not they live up to the hype. (To me, clearly the answer is no)
There are other skirmishes around capability for novelty, their role in the economy, their impact on human cognition, if/when AGI might happen and the overall impact to the largely tech-oriented community on HN.
asielen 1 month ago
It is an over correction because of all the empty promises of LLMs. I use Claude and chatgpt daily at work and am amazed at what they can do and how far they can come.
BUT when I hear my executive team talk and see demos of "Agentforce" and every saas company becoming an AI company promising the world, I have to roll my eyes.
The challenge I have with LLMs is they are great at creating first draft shiny objects and the LLMs themselves over promise. I am handed half baked work created by non technical people that now I have to clean up. And they don't realize how much work it is to take something from a 60% solution to a 100% solution because it was so easy for them to get to the 60%.
Amazing, game changing tools in the right hands but also give people false confidence.
Not that they are not also useful for non-technical people but I have had to spend a ton of time explaining to copywriters on the marketing team that they shouldn't paste their credentials into the chat even if it tells them to and their vibe coded app is a security nightmare.
- semilin 1 month ago
  
  This seems like the right take. The claims of the imminence of AGI are exhausting and to me appear dissonant with reality. I've tried gemini-cli and Claude Code and while they're both genuinely quite impressive, they absolutely suffer from a kind of prototype syndrome. While I could learn to use these tools effectively for large-scale projects, I still at present feel more comfortable writing such things by hand.
  The NVIDIA CEO says people should stop learning to code. Now if LLMs will really end up as reliable as compilers, such that they can write code that's better and faster than I can 99% of the time, then he might be right. As things stand now, that reality seems far-fetched. To claim that they're useless because this reality has not yet been achieved would be silly, but not more silly than claiming programming is a dead art.
hapticmonkey 1 month ago
It’s not the technology I’m dismissive about. It’s the economics.
25 years ago I was optimistic about the internet, web sites, video streaming, online social systems. All of that. Look at what we have now. It was a fun ride until it all ended up “enshitified”. And it will happen to LLMs, too. Fool me once.
Some developer tools might survive in a useful state on subscriptions. But soon enough the whole A.I. economy will centralise into 2 or 3 major players extracting more and more revenue over time until everyone is sick of them. In fact, this process seems to be happening at a pretty high speed.
Once the users are captured, they’ll orient the ad-spend market around themselves. And then they’ll start taking advantage of the advertisers.
I really hope it doesn’t turn out this way. But it’s hard to be optimistic.
- Al-Khwarizmi 1 month ago
  
  Contrary to the case for the internet, there is a way out, however - if local, open-source LLMs get good. I really hope they do, because enshittification does seem unavoidable if we depend on commercial offerings.
  
  1 reply →
phatfish 1 month ago

Maybe because the hype for an next gen search engine that can also just make things up when you query it is a bit much?
probably_wrong 1 month ago
Speaking for myself: because if the hype were to be believed we should have no relational databases when there's MongoDB, no need for dollars when there's cryptocoins, all virtual goods would be exclusively sold as NFTs, and we would be all driving self-driving cars by now.
LLMs are being driven mostly by grifters trying to achieve a monopoly before they run out of cash. Under those conditions I find their promises hard to believe. I'll wait until they either go broke or stop losing money left and right, and whatever is left is probably actually useful.
- simonw 1 month ago
  
  The way I've been handling the deafening hype is to focus exclusively on what the models that we have right now can do.
  You'll note I don't mention AGI or future model releases in my annual roundup at all. The closest I get to that is expressing doubt that the METR chart will continue at the same rate.
  If you focus exclusively on what actually works the LLM space is a whole lot more interesting and less frustrating.
  
  2 replies →
cebert 1 month ago
Many people feel threatened by the rapid advancements in LLMs, fearing that their skills may become obsolete, and in turn act irrationally. To navigate this change effectively, we must keep open minds, keep adaptable, and embrace continuous learning.
- reppap 1 month ago
  
  I'm not threatened by LLMs taking my job as much as they are taking away my sanity. Every time I tell someone no and they come back to me with a "but copilot said.." it's followed by something entirely incorrect it makes me want to autodefenestrate.
  
  1 reply →
- rgoulter 1 month ago
  
  Many comments discussing LLMs involve emotions, sure. :) Including, obviously, comments in favour of LLMs.
  But most discussion I see is vague and without specificity and without nuance.
  Recognising the shortcomings of LLMs makes comments praising LLMs that much more believable; and recognising the benefits of LLMs makes comments criticising LLMs more believable.
  I'd completely believe anyone who says they've found the LLM very helpful at greenfield frontend tasks, and I'd believe someone who found the LLM unable to carry out subtle refactors on an old codebase in a language that's not Python or JavaScript.
- chii 1 month ago
  
  > in turn act irrationally
  it isn't irrational to act in self-interest. If LLM threatens someone's livelihood, it matters not that it helps humanity overall one bit - they will oppose it. I don't blame them. But i also hope that they cannot succeed in opposing it.
  
  3 replies →
- nickphx 1 month ago
  
  rapid advancements in what? hallucinations..? FOMO marketing? certainly nothing productive.
vunderba 1 month ago
> I don't understand why Hacker News is so dismissive about the coming of LLMs.
Eh. I wouldn’t be so quick to speak for the entirety of HN. Several articles related to LLMs easily hit the front page every single day, so clearly there are plenty of HN users upvoting them.
I think you're just reading too much into what is more likely classic HN cynicism and/or fatigue.
- ewoodrich 1 month ago
  
  Exactly. There was a stretch of 6 months or so right after ChatGPT was released where approximately 50% of front page posts at any given time were related to LLMs. And these days every other Show HN is some kind of agentic dev tool and Anthropic/OpenAI announcements routinely get 500+ comments in a matter of hours.
- utopiah 1 month ago
  
  It's because both "side" tries to re-adjust.
  When an "AI skeptic" sees a very positive AI comment, they try to argue that it is indeed interesting but nowhere near close to AI/AGI/ASI or whatever the hype at the moment uses.
  When an "AI optimistic" sees a very negative AI comment, they try to list all the amazing things they have done that they were convinced was until then impossible.
snigsnog 1 month ago
The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level. Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
The most wide-appeal possibility is people loving 100%-AI-slop entertainment like that AI Instagram Reels product. Maybe I'm just too disconnected with normies but I don't see this taking off. Fun as a novelty like those Ring cam vids but I would never spend all day watching AI generated media.
- fragmede 1 month ago
  
  > The internet and smartphones were immediately useful in a million different ways for almost every person. AI is not even close to that level.
  Those are some very rosy glasses you've got on there. The nascent Internet took forever to catch on. It was for weird nerds at universities and it'll never catch on, but here we are.
  
  1 reply →
- raincole 1 month ago
  
  The early internet and smartphones (the Japanese ones, not iPhone) were definitely not "immediately" adopted by the mass, unlike LLM.
  If "immediate" usefulness is the metric we measure, then the internet and smartphones are pretty insignificant inventions compared to LLM.
  (of course it's not a meaningful metric, as there is no clear line between a dumb phone and a smart phone, or a moderately sized language model and a LLM)
  
  5 replies →
- nen-nomad 1 month ago
  
  ChatGPT has roughly 800 million weekly active users. Almost everyone around me uses it daily. I think you are underestimating the adoption.
  
  7 replies →
- JumpCrisscross 1 month ago
  
  > AI is not even close to that level
  Kagi’s Research Assistant is pretty damn useful, particularly when I can have it poll different models. I remember when the first iPhone lacked copy-paste. This feels similar.
  (And I don’t think we’re heading towards AGI.)
- SgtBastard 1 month ago
  
  … the internet was not immediately useful in a million different ways for almost every person.
  Even if you skip ARPAnet, you’re forgetting the Gopher days and even if you jump straight to WWW+email==the internet, you’re forgetting the mosaic days.
  The applications that became useful to the masses emerged a decade+ after the public internet and even then, it took 2+ decades to reach anything approaching saturation.
  Your dismissal is not likely to age well, for similar reasons.
  
  7 replies →
- staticassertion 1 month ago
  
  > Very to somewhat useful in some fields (like programming) but the average person will easily be able to go through their day without using AI.
  I know a lot of "normal" people who have completely replaced their search engine with AI. It's increasingly a staple for people.
  Smartphones were absolutely NOT immediately useful in a million different ways for almost every person, that's total revisionist history. I remember when the iPhone came out, it was AT&T only, it did almost nothing useful. Smartphones were a novelty for quite a while.
  
  1 reply →
- what-the-grump 1 month ago
  
  A year after the iPhone came out… it didn’t have an App Store, barely was able to play video, barely had enough power to last a day. You just don’t remember or were not around for it.
  A year after llms came out… are you kidding me?
  Two years?
  10 years?
  Today, by adding an MCP server to wrap the same API that’s been around forever for some system, makes the users of that system prefer NLI over the gui almost immediately.
Night_Thastus 1 month ago
LLMs hold some real utility. But that real utility is buried under a mountain of fake hype and over-promises to keep shareholder value high.
LLMs have real limitations that aren't going away any time soon - not until we move to a new technology fundamentally different and separate from them - sharing almost nothing in common. There's a lot of 'progress-washing' going on where people claim that these shortfalls will magically disappear if we throw enough data and compute at it when they clearly will not.
- Gigachad 1 month ago
  
  Pretty much. What actually exists is very impressive. But what was promised and marketed has not been delivered.
  
  4 replies →
Madmallard 1 month ago
Have you tried using it for anything actually complicated?
Lol. It's worse than nothing at all.
- lukaslalinsky 1 month ago
  
  I think the split between vibe coding and AI-assisted coding will only widen over time. If you ask LLMs to do something complex, they will fail and you waste your time. If you work with them as a peer, and you delegate tasks to them, they will succeed and you save your time.
  
  2 replies →
Atomic_Torrfisk 1 month ago

> HN readers are going through 5 stages of grief
So we are just irrational and sour?
claudiug 1 month ago

because lies. all the people involved in this, the one a C title, tell us about how great is now.
jheez3 1 month ago
"I can see it delivering impact bigger than the internet itself. Both require a lot of investments."
lol.... Just make sure you screenshot your post so you have a good reminder in a few years re. your predictive ability.
- threethirtytwo 1 month ago
  
  Predicting your own victory instead of defending it is a bold strategy.

syndacks 1 month ago

I can’t get over the range of sentiment on LLMs. HN leans snake oil, X leans “we’re all cooked” —- can it possibly be both? How do other folks make sense of this? I’m not asking for a side, rather understanding the range. Does the range lead you to believe X over Y?

johnfn 1 month ago

I believe the spikiness in response is because AI itself is spiky - it’s incredibly good at some classes of tasks, and remarkably poor at others. People who use it on the spikes are genuinely amazed because of how good it is. This does nothing but annoy the people who use it in the troughs, who become increasingly annoyed that everyone seems to be losing their mind over something that can’t even do (whatever).
coffeefirst 1 month ago
Well, this is the internet. Arguing about everything is its favorite pastime.
But generally yes, I think back to Mongo/Node/metaverse/blockchain/IDEs/tablets and pretty much everything has had its boosters and skeptics, this is just more... intense.
Anyway I've decided to believe my own eyes. The crowds say a lot of things. You can try most of it yourself and see what it can and can't do. I make a point to compare notes with competent people who also spent the time trying things. What's interesting is most of their findings are compatible with mine, including for folks who don't work in tech.
Oh, and one thing is for sure: shoving this technology into every single application imaginable is a good way to lose friends and alienate users.
- jheez3 1 month ago
  
  Only those with great taste are well-equipped to make assertions about what we have infront of us.
  The rest is all noise and personally I just block it out.
  
  1 reply →
PeterHolzwarth 1 month ago
I think it may be all summed up by Roy Amara's observation that "We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run."
- ManuelKiessling 1 month ago
  
  I think this is the most-fitting one-liner right now.
  The arguments going back and forth in these threads are truly a sight to behold. I don’t want to lean to any one side, but in 2025 I‘ve begun to respond to everyone who still argues that LLMs are only plagiarism machines, or are only better autocompletes, or are only good at remixing the past: Yes, correct!
  And CPUs can only move zeros and ones.
  This is likewise a very true statement. But look where having 0s and 1s shuffled around has brought us.
  The ripple effects of a machine doing something very simple and near-meaningless, but doing it at high speed and again and again without getting tired, cannot be underestimated.
  At the same time, here is Nobel Laureate Robert Solow, who famously, and at the time correctly, stated that "You can see the computer age everywhere but in the productivity statistics."
  It took a while, but eventually, his statement became false.
- legulere 1 month ago
  
  The effects might be drastically different from what you would expect though. We’ve seen this with machine learning/AI again and again that what looks probable to work doesn’t work out and unexpected things work.
nstart 1 month ago
The problem with X is that so many people who have no verifiable expertise are super loud in shouting "$INDUSTRY is cooked!!" every time a new model releases. It's exhausting and untrue. The kind of video generation we see might nail realism but if you want to use it to create something meaningful which involves solving a ton of problems and making difficult choices in order to express an idea, you run into the walls of easy work pretty quickly. It's insulting then for professionals to see manga PFPs on X put some slop together and say "movie industry is cooked!". It betrays a lack of understanding of what it takes to make something good and it gives off a vibe of "the loud ones are just trying to force this objectively meh-by-default thing to happen".
The other day there was that dude loudly arguing about some code they wrote/converted even after a woman with significant expertise in the topic pointed out their errors.
Gen AI has its promise. But when you look at the lack of ethics from the industry, the cacophony of voices of non experts screaming "this time it's really doom", and the weariness/wariness that set in during the crypto cycle, it's a natural tendency that people are going to call snake oil.
That said, I think the more accurate representation here is that HN as a whole is calling the hype snake oil. There's very little question anymore about the tools being capable of advanced things. But there is annoyance at proclamations of it being beyond what it really is at the moment which is that it's still at the stage of being an expertise+motivation multiplier for deterministic areas of work. It's not replacing that facet any time soon on its current trend (which could change wildly in 2026). Not until it starts training itself I think. Could be famous last words
- senordevnyc 1 month ago
  
  I’d put more faith in HN’s proclamations if it hadn’t widely been wrong about AI in 2023, 2024, and now 2025. Watching the tone shift here has been fascinating. As the saying goes, the only thing moving faster than AI advances right now is the speed at which HN haters move the goalposts…
  
  3 replies →
zahlman 1 month ago

I'm not really convinced that anywhere leans heavily towards anything; it depends which thread you're in etc.
It's polarizing because it represents a more radical shift in expected workflows. Seeing that range of opinions doesn't really give me a reason to update, no. I'm evaluating based on what makes sense when I hear it.
thisoneisreal 1 month ago

My take (no more informed than anyone else's) is that the range indicates this is a complex phenomenon that people are still making sense of. My suspicion is that something like the following is going on:
1. LLMs can do some truly impressive things, like taking natural language instructions and producing compiling, functional code as output. This experience is what turns some people into cheerleaders.
2. Other engineers see that in real production systems, LLMs lack sufficient background / domain knowledge to effectively iterate. They also still produce output, but it's verbose and essentially missing the point of a desired change.
3. LLMs also can be used by people who are not knowledgeable to "fake it," and produce huge amounts of output that is basically besides-the-point bullshit. This makes those same senior folks very, very resentful, because it wastes a huge amount of their time. This isn't really the fault of the tool, but it's a common way the tool gets used and so it gets tarnished by association.
4. There is a ridiculous amount of complexity in some of these tools and workflows people are trying to invent, some of which is of questionable value. So aside from the tools themselves people are skeptical of the people trying to become thought leaders in this space and the sort of wild hacks they're coming up with.
5. There are real macro questions about whether these tools can be made economical to justify whatever value they do produce, and broader questions about their net impact on society.
6. Last but not least, these tools poke at the edges of "intelligence," the crown jewel of our species and also a big source of status for many people in the engineering community. It's natural that we're a little sensitive about the prospect of anything that might devalue or democratize the concept.
That's my take for what it's worth. It's a complex phenomenon that touches all of these threads, so not only do you see a bunch of different opinions, but the same person might feel bullish about one aspect and bearish about another.
xboxnolifes 1 month ago

From my perspective, both show HN and Twitter's normal biases. I view HN as generally leaning toward "new things suck, nothing ever changes", and I view Twitter generally as "Things suck, and everything is getting worse". Both of those align with snake oil and we're all cooked.
sanderjd 1 month ago

As usual, somewhere in between!
Madmallard 1 month ago

I use them daily and I actively lose progress on complex problems and save time on simple problems.
senordevnyc 1 month ago
Because it turns out that HN is mostly made up of cranky middle-aged conservatives (small c) who have largely defined themselves around coding, and AI is an existential threat to their core identity.
- jennyholzer3 1 month ago
  
  [flagged]
sph 1 month ago
Truth lies in the middle. Yes LLM are an incredible piece of technology, and yes we are cooked because once again technologists and VC have no idea nor interest in understanding the long-term societal ramifications of technology.
Now we are starting to agree that social media has had disastrous effects that have not fully manifested yet, and in the same breath we accept a piece of technology that promises to replace large parts of society with machines controlled by a few megacorps and we collectively shrug with “eh, we’re gonna be alright.” I mean, until recently the stated goal was to literally recreate advanced super-intelligence with the same nonchalance one releases a new JavaScript framework unto the world.
I find it utterly maddening how divorced STEM people have become from philosophical and ethical concerns of their work. I blame academia and the education system for creating this massive blind spot, and it is most apparent in echo chambers like HN that are mostly composed of Western-educated programmers with a degree in computer science. At least on X you get, among the lunatics, people that have read more than just books on algorithms and startups.
- jheez3 1 month ago
  
  "that have not fully manifested yet"
  This is not true..
  "I find it utterly maddening how divorced STEM people have become from philosophical and ethical concerns of their work. I blame academia and the education system for creating this massive blind spot, and it is most apparent in echo chambers like HN that are mostly composed of Western-educated programmers with a degree in computer science. At least on X you get, among the lunatics, people that have read more than just books on algorithms and startups."
  Steve Jobs had something to say about this. Shame hes gone.
llmslave2 1 month ago

Because there is a wide range of what people consider good. If you look at that the people on X consider to be good, it's not very surprising.

mrheosuper 1 month ago

I'm not against AI/LLM(in fact, i am quite supportive to it). But one of my biggest fear is overusing AI. We may introduce some tool that only "AI/LLM" can resonably do(Like tool with weird, convoluted UI/UX, syntax) and no one against it because AI/LLM can use/interact.

Then genAI, It's become more and more difficult to tell which is AI and which is not, and AI is in everywhere. I dont know what to think about it. "If you can't tell, does it matter ?"

netdur 1 month ago

i think the concern about software shifting toward ai design ignores that the web hasn't been human-first for a long time. most traffic is already machine to machine, like crawlers and ci pipelines. we’ve tolerated systems that are barely legible for years. anyone who has grepped through android studio logs knows that human readability is usually a tertiary goal at best. ai interacting with complex systems is just an evolution of the glue code we’ve always written.
as for who made it, utility usually matters more than where it came from. i used an agent for an oss changelog recently and it picked up things i’d forgotten while structuring the narrative better than i could. the intent and code were mine, but the ai acted as a high fidelity compressor. the risk isn't ai being everywhere. it’s the atrophy of judgment where we stop using it to support decisions and start using it to outsource thinking.

AndyNemmity 1 month ago

These are excellent every year, thank you for all the wonderful work you do.

tkgally 1 month ago
Same here. Simon is one of the main reasons I’ve been able to (sort of) keep up with developments in AI.
I look forward to learning from his blog posts and HN comments in the year ahead, too.
- password4321 1 month ago
  
  Don't forget you can pay Simon to keep up with less!
  > At the end of every month I send out a much shorter newsletter to anyone who sponsors me for $10 or more on GitHub
  https://simonwillison.net/about/#monthly
  
  1 reply →

mmcnl 1 month ago

Let's hope 2026 will also have interesting innovations not related to AI or LLMs.

spicyusername 1 month ago

2025 had plenty of those, they just didn't get as many news headlines.
One of the difficult things of modernity is that it's easy to confuse what you hear about a lot with what is real.
One of the great things about modernity is that progress continues, whether we know about it or not.

tantony 1 month ago

Claude Opus 4.5 has been a big step up for me personally, and I used to think Sonnet 3.5 was good. It is an amazing deal at $20.

Just yesterday, it helped me parse out and understand a research paper - complete with step-by-step examples (this one: https://research.nvidia.com/sites/default/files/pubs/2016-03...). I will now go ahead and implement it myself, possibly relegating some of the more grunt-work type tasks to Claude code.

Without it, I would have been struggling through the paper for days, wading through WGSL shader code and there would be a high chance that I just give up on it since this is for a side project and not my $job.

It has been a major force multiplier just for learning things. I have had the $20 subscription for about a year now. I bump it up to the $100 plan if I happen to be working on some project that eats through the $20 allocation. This happens to be one such month. I will probably go back to the $20 plan after this month. I continue to get a lot of value out of it.

the_mitsuhiko 1 month ago

> The (only?) year of MCP

I like to believe, but MCP is quickly turning into an enterprise thing so I think it will stick around for good.

MitziMoto 1 month ago
MCP isn't going anywhere. Some developers can't seem to see past their terminal or dev environment when it comes to MCP. Skills, etc do not replace MCP and MCP is far more than just documentation searching.
MCP is a great way for an LLM to connect to an external system in a standardized way and immediately understand what tools it has available, when and how to use them, what their inputs and outputs are,etc.
For example, we built a custom MCP server for our CRM. Now our voice and chat agents that run on elevenlabs infrastructure can connect to our system with one endpoint, understand what actions it can take, and what information it needs to collect from the user to perform those actions.
I guess this could maybe be done with webhooks or an API spec with a well crafted prompt? Or if eleven labs provided an executable environment with tool calling? But at some point you're just reinventing a lot of the functionality you get for free from MCP, and all major LLMs seem to know how to use MCP already.
- simonw 1 month ago
  
  Yeah, I don't think I was particularly clear in that section.
  I don't think MCP is going to go away, but I do think it's unlikely to ever achieve the level of excitement it had in early 2025 again.
  If you're not building inside a code execution environment it's a very good option for plugging tools into LLMs, especially across different systems that support the same standard.
  But code execution environments are so much more powerful and flexible!
  I expect that once we come up with a robust, inexpensive way to run a little Bash environment - I'm still hoping WebAssembly gets us there - there will be much less reason to use MCP even outside of coding agent setups.
  
  2 replies →
simonw 1 month ago
I think it will stick around, but I don't think it will have another year where it's the hot thing it was back in January through May.
- Alex-Programs 1 month ago
  
  I never quite got what was so "hot" about it. There seems to be an entire parallel ecosystem of corporates that are just begging to turn AI into PowerPoint slides so that they can mould it into a shape that's familiar.
  
  1 reply →
nrhrjrjrjtntbt 1 month ago

MCP or skills? Can a skill negate the need for MCP. In addition there was a YC startup who is looking at searching docs for LLMs or similar. I think MCP may be less needed once you have skills, openapi specs, and other things that LLMs can call directly.
cloudking 1 month ago
For connecting agents to third-party systems I prefer CLI tools, less context bloat and faster. You can define the CLI usage in your agent instructions. If the MCP you're using doesn't exist as a CLI, build one with your agent.
- martinald 1 month ago
  
  Totally agree - wrote this over the holidays which sums it all pretty well https://martinalderson.com/posts/why-im-building-my-own-clis...

timonoko 1 month ago

OpenSCAD-coding has improved significantly on all models. Now syntax is always right and they understand the concept of negative space.

Only problem is that they don't see connection between form and function. They may make teapot perfectly but don't understand that this form is supposed to contain liquid.

fullstackchris 1 month ago

> The reason I think MCP may be a one-year wonder is the stratospheric growth of coding agents. It appears that the best possible tool for any situation is Bash—if your agent can run arbitrary shell commands, it can do anything that can be done by typing commands into a terminal.

I push back strongly from this. In the case of the solo, one-machine coder, this is likely the case - if you're exposing workflows or fixed tools to customers / collegues / the web at large via API or similar, then MCP is still the best way to expose it IMO.

Think about a GitHub or Jira MCP server - commandline alone they are sure to make mistakes with REST requests, API schema etc. With MCP the proper known commands are already baked in. Remember always that LLMs will be better with natural language than code.

simonw 1 month ago
The solution to that is Anthropic's Skills.
Create a folder called skills/how-to-use-jira
Add several Bash scripts with the right curl commands to perform specific actions
Add a SKILL.md file with some instructions in how to use those scripts
You've effectively flattened that MCP server into some Markdown and Bash, only the thing you have now is more flexible (the coding agent can adapt those examples to cover new things you hadn't thought to tell it) and much more context-efficient (it only reads the Markdown the first time you ask it to do something with JIRA).
- aflukasz 1 month ago
  
  But that moves the burden of maintenance from the provider of the service to its users (and/or partially to intermediary in form of "skills registry" of sorts, which apparently is a thing now).
  So maybe a hybrid approach would make more sense? Something like /.well-known/skills/README.md exposed and owned by the providers?
  That is assuming that the whole idea of "skills" makes sense in practice.
  
  1 reply →

rr808 1 month ago

What happened to Devin? 2024 it was a leading contender now it isn't even included in the big list of coding agents.

fullstackchris 1 month ago

Wasn't it basically revealed as a scam? I remember some article about their fancy demo video being sped up / unfairly cut and sliced etc.
simonw 1 month ago

To be honest that's more because I've never tried it myself, so it isn't really on my radar.
I don't hear much buzz about it from the people I pay attention to. I should still give it a go though.
monkeydust 1 month ago

https://cognition.ai/blog/devin-annual-performance-review-20...
ColinEberhardt 1 month ago

It’s still around, and tends to be adopted by big enterprises. It’s generally a decent product, but is facing a lot of equally powerful competition and is very expensive.

zerocool86 1 month ago

The "local models got good, but cloud models got even better" section nails the current paradox. Simon's observation that coding agents need reliable tool calling that local models can't yet deliver is accurate - but it frames the problem purely as a capability gap.

There's a philosophical angle being missed: do we actually want our coding agents making hundreds of tool calls through someone else's infrastructure? The more capable these systems become, the more intimate access they have to our codebases, credentials, and workflows. Every token of context we send to a frontier model is data we've permanently given up control of.

I've been working on something addressing this directly - LocalGhost.ai (https://www.localghost.ai/manifesto) - hardware designed around the premise that "sovereign AI" isn't just about capability parity but about the principle that your AI should be yours. The manifesto articulates why I think this matters beyond the technical arguments.

Simon mentions his next laptop will have 128GB RAM hoping 2026 models close the gap. I'm betting we'll need purpose-built local inference hardware that treats privacy as a first-class constraint, not an afterthought. The YOLO mode section and "normalization of deviance" concerns only strengthen this case - running agents in insecure ways becomes less terrifying when "insecure" means "my local machine" rather than "the cloud plus whoever's listening."

The capability gap will close. The trust gap won't unless we build for it.

apolloartemis 1 month ago

Thank you for your warning about the normalization of deviance. Do you think there will be an AI agent software worm like NotPetya which will cause a lot of economic damage?

simonw 1 month ago

I'm expecting something like a malicious prompt injection which steals API keys and crypto wallets and uses additional tricks to spread itself further.
Or targeted prompt injections - like spear phishing attacks - against people with elevated privileges (think root sysadmins) who are known to be using coding agents.

losvedir 1 month ago

I predict 2026 will be the year of the first AI Agent "worm" (or virus?). Kind of like the Morris worm running amok as an experiment gone wrong, I think we will sometime soon have someone set up an AI agent whose core loop is to try to propagate itself, either as an experiment or just for the lulz.

The actual Agent payload would be very small, likely just a few hundred line harness plus system prompt. It's just a question of whether the agent will be skilled enough to find vulnerabilities to propagate. The interesting thing about an AI worm is that it can use different tricks on different hosts as it explores its own environment.

If a pure agent worm isn't capable enough, I could see someone embedding it on top of a more traditional virus. The normal virus would propagate as usual, but it would also run an agent to explore the system for things to extract or attack, and to find easy additional targets on the same internal network.

A main difference here is that the agents have to call out to a big SotA model somewhere. I imagine the first worm will simply use Opus or ChatGPT with an acquired key, and part of it will be trying to identify (or generate) new keys as it spreads.

Ultimately, I think this worm will be shut down by the model vendor, but it will have to have made a big enough splash beforehand to catch their attention and create a team to identify and block keys making certain kinds of requests.

I'd hope OpenAI, Anthropic, etc have a team and process in place already to identify suspicious keys, eg, those used from a huge variety of IPs, but I wouldn't be surprised if this were low on their list of priorities (until something like this hits).

agentifysh 1 month ago

What an amazing progress in just short time. The future is bright! Happy New Year y'all!

aussieguy1234 1 month ago

> The year of YOLO and the Normalization of Deviance #

On this including AI agents deleting home folders, I was able to run agents in Firejail by isolating vscode (Most of my agents are vscode based ones, like Kilo Code).

I wrote a little guide on how I did it https://softwareengineeringstandard.com/2025/12/15/ai-agents...

Took a bit of tweaking, vscode crashing a bunch of times with not being able to read its config files, but I got there in the end. Now it can only write to my projects folder. All of my projects are backed up in git.

NitpickLawyer 1 month ago

I have a bunch of tabs opened on this exact topic, so thank you for sharing. So far I've been using devcontainers w/ vscode, and mostly having a blast with it. It is a bit awkward since some extensions need to be installed in the remote env, but they seem to play nicely after you have it setup, and the keys and stuff get populated so things like kilocode, cline, roo work fine.

mark_l_watson 1 month ago

Thanks Simon, great writeup.

It has been an amazing year, especially around tooling (search, code analysis, etc.) and surprisingly capable smaller models.

lopatin 1 month ago

The "pelicans on a bike" challenge is pretty wide spread now. Are we sure it's still not being trained on?

simonw 1 month ago
See https://simonwillison.net/2025/nov/13/training-for-pelicans-... (also in the pelicans section of the post).
- lopatin 1 month ago
  
  > All I’ve ever wanted from life is a genuinely great SVG vector illustration of a pelican riding a bicycle.
  :)

lukaslalinsky 1 month ago

Speaking of asynchronous agents, what do people use? Claude Code for web is extremely limited, because you have no custom tools. Claude Code in GitHub Actions is vastly more useful, due to the custom environment, but ackward to use interactively. Are there any good alternatives?

simonw 1 month ago

I use Claude Code for web with an environment allowing full internet access, which means it can install extra tools as and when it needs them. I don't run into limits with it very often.
fullstackchris 1 month ago

I just use a couple of custom MCP tools with the standard claude desktop app:
https://chrisfrew.in/blog/two-of-my-favorite-mcp-tools-i-use...
IMO this is the best balance of getting agentic work done while having immediate access to anything else you may need with your development process.
ehsanu1 1 month ago
What exactly do you mean by custom tools here? Just cli tools accessible to the agent?
- lukaslalinsky 1 month ago
  
  Development environment needed to build and test the project.
jes5199 1 month ago
I'm running Claude Code in a tmux on a VPS, and I'm working on setting up a meta-agent who can talk to me over text messages
- absoluteunit1 1 month ago
  
  Hey - this sounds like really interesting set-up!
  Would you be open to providing more details. Would love to hear more, your workflows, etc.
jimmySixDOF 1 month ago

Pretty sure next year's wrapup will have "Year of the sub-agent"

npalli 1 month ago

Great summary of the year in LLMs. Is there a predictions (for 2026) blogpost as well?

simonw 1 month ago
Given how badly my 2025 predictions aged I'm probably going to sit that one out! https://simonwillison.net/2025/Jan/10/ai-predictions/
- zahlman 1 month ago
  
  Making predictions is useful even when they turn out very wrong. Consider also giving confidence levels, so that you can calibrate going forward.
  
  1 reply →
- DANmode 1 month ago
  
  Don’t be a bad sport, now!!

websiteapi 1 month ago

I'm curious how all of the progress will be seen if it does indeed result in mass unemployment (but not eradication) of professional software engineers.

ori_b 1 month ago
My prediction: If we can successfully get rid of most software engineers, we can get rid of most knowledge work. Given the state of robotics, manual labor is likely to outlive intellectual labor.
- BobbyJo 1 month ago
  
  I would have agreed with this a few months ago, but something Ive learned is that the ability to verify an LLMs output is paramount to its value. In software, you can review its output, add tests, on top of other adversarial techniques to verify the output immediately after generation.
  With most other knowledge work, I don't think that is the case. Maybe actuarial or accounting work, but most knowledge work exists at a cross section of function and taste, and the latter isn't an automatically verifiable output.
  
  4 replies →
- beardedwizard 1 month ago
  
  "Given the state of robotics" reminds me a lot of what was said about llms and image/video models over the past 3 years. Considering how much llms improved, how long can robotics be in this state?
  I have to think 3 years from now we will be having the same conversation about robots doing real physical labor.
  "This is the worst they will ever be" feels more apt.
  
  8 replies →
- 9dev 1 month ago
  
  That’s the deep irony of technology IMHO, that innovation follows Conway's law on a meta layer: White collar workers inevitably shaped high technology after themselves, and instead of finally ridding humanity of hard physical labour—as was the promise of the Industrial Revolution—we imitate artists, scientists, and knowledge workers.
  We can now use natural language to instruct computers generate stock photos and illustrations that would take a professional artist a few years ago, discover new molecule shapes, beat the best Go players, build the code for entire applications, or write documents of various shapes and lengths—but painting a wall? An unsurmountable task that requires a human to execute reliably, not even talking about economics.
- JumpCrisscross 1 month ago
  
  > If we can successfully get rid of most software engineers, we can get rid of most knowledge work
  Software, by its nature, is practically comprehensively digitized, both in its code history as well as requirements.
simonw 1 month ago

I nearly added a section about that. I wanted to contrast the thing where many companies are reducing junior engineering hires with the thing where Cloudflare and Shopify are hiring 1,000+ interns. I ran out of time and hadn't figured out a good way to frame it though so I dropped it.
legulere 1 month ago
Even if it will make software engineering drastically more productive, it’s questionable that this will lead to unemployment. Efficiency gains translate to lower prices. Sometimes this leads to very few additional demand, as can be seen with masses of typesetters that lost their jobs. Sometimes this leads to a dramatically higher demand like you can see in the classic Jevons paradox examples of coal and light bulbs. I highly suspect software falls in the latter category
- kingstnap 1 month ago
  
  Software demand is philosophically limited by the question of "What can your computer do for you?"
  You can describe that somewhat formally as:
  {What your computer can do} intersect {What you want done (consciously or otherwise)}
  Well a computer can technically calculate any computuable task that fits in bounded memory, that is an enormous set so its real limitations are its interfaces. In which case it can send packets, make noises, and display images.
  How many human desires are things that can be solved with making noises, displaying images, and sending packets? Turns out quite a few but its not everything.
  Basically I'm saying we should hope more sorts of physical interfaces come around (like VR and Robotics) so we cover more human desires. Robotics is a really general physical interface (like how ip packets are an extremely general interface) so its pretty promising if it pans out.
  Personally, I find it very hard to even articulate what desires I have. I have this feeling that I might be substantially happier if I was just sitting around a campfire eating food and chatting with people instead of enjoying whatever infinite stuff a super intelligent computer and robots could do for me. At least some of the time.
Madmallard 1 month ago

Why would it?
The ability to accurately describe what you want with all constraints managed and with proactive design is the actual skill. Not programming. The day PMs can do that and have LLMs that can code to that, is the day software engineers en masse will disappear. But that day is likely never.
The non-technical people I've ever worked for were hopelessly terrible at attention to detail. They're hiring me primarily for that anyway.
fullstackchris 1 month ago

This overly discussed thesis is already laughable - decent LLMs have been out for 3 years now and unemployment (using US as example) is up around 1% over the same time frame - and even attributing that small percentage change completely to AI is also laughable

vanderZwan 1 month ago

Speaking of new year and AI: my phone just suggested "Happy Birthday!" as the quick-reply to any "Happy New Year!" notification I got in the last hours.

I'm not too worried about my job just yet.

gverrilla 1 month ago

This year I had a spotify and a youtube thing to "recall my year", and it was abolute garbage (30% truth, to be exact). I think they're doing it more like an exercise to build up systems, infra, processes, people, etc - it's already clear they don't actually care about users.
pants2 1 month ago
It won't help to point out the worst examples. You're not competing with an outdated Apple LLM running on a phone. You're competing with Anthropic frontier models running on a multimillion dollar rack of servers.
- vanderZwan 1 month ago
  
  Sounds like I'm much more affordable with better ROI

sanreau 1 month ago

> Vendor-independent options include GitHub Copilot CLI, Amp, OpenHands CLI, and Pi

...and the best of them all, OpenCode[1] :)

[1]: https://opencode.ai

simonw 1 month ago
Good call, I'll add that. I think I mentally scrambled it with OpenHands.
- the_mitsuhiko 1 month ago
  
  Thanks for adding pi to it though :)
d4rkp4ttern 1 month ago
Can OpenCode be used with the Claude Max or ChatGPT Pro subscriptions, i.e., without per-token API charges?
- simonw 1 month ago
  
  Apparently it does work with Claude Max: https://opencode.ai/docs/providers/#anthropic
  I don't see a similar option for ChatGPT Pro. Here's a closed issue: https://github.com/sst/opencode/issues/704
  
  1 reply →
- ewoodrich 1 month ago
  
  Yes, I use it with a regular Claude Pro subscription. It also supports using GitHub Copilot subscriptions as a backend.
logicprog 1 month ago

I don't know why you're downloaded, OpenCode is by far the best.
nineteen999 1 month ago

How did I miss this until now! Thank you for sharing.

icapybara 1 month ago

It was the year of Claude Code

Gud 1 month ago

What about self hosting?

simonw 1 month ago

I talked about that in this section https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-... - and touched on it a bit in the section about Chinese AI labs: https://simonwillison.net/2025/Dec/31/the-year-in-llms/#the-...

_pdp_ 1 month ago

With everything that we have done so far (our company) I believe by end of 2026 our software will be self improving all the time.

And no it is not AI slop and we don't vibe code. There are a lot of practical aspects of running software and maintaining / improving code that can be done well with AI if you have the right setup. It is hard to formulate what "right" looks like at this stage as we are still iterating on this as well.

However, in our own experiments we can clearly see dramatic increases in automation. I mean we have agents working overnight as we sleep and this is not even pushing the limits. We are now wrapping major changes that will allows us to run AI agents all the time as long as we can afford them.

I can even see most of these materialising in Q1 2026.

Fun times.

papacj657 1 month ago
What exactly are your agents doing overnight? I often hear folks talk about their agents running for long periods of time but rarely talk about the outcomes they're driving from those agents.
- _pdp_ 1 month ago
  
  We have a lot of grunt work scheduled overnight like finding bugs, creating tests where we don’t have good coverage or where we can improve, integrations, documentation work, etc.
  Not everything gets accepted. There is a lot of work that is discarded and much more pending verification and acceptance.
  Frankly, and I hope I don’t come as alarmist (judge for yourself from my previous comments on Hn and Reddit) we cannot keep up with the output! And a lot of it is actually good and we should incorporate it even partially.
  At the moment we are figuring out how to make things more autonomous while we have the safety and guardrails in place.
  The biggest issue I see at this stage is how to make sense of it all as I do not believe we have the understanding of what is happening - just the general notion of it.
  I truly believe that we will reach the point where ideas matter more than execution, which what I would expect to be the case with more advanced and better applied AI.

Razengan 1 month ago

My experience with AI so far: It's still far from "butler" level assistance for anything beyond simple tasks.

I posted about my failures to try to get them to review my bank statements [0] and generally got gaslit about how I was doing it wrong, that I if trust them to give them full access to my disk and terminal, they could do it better.

But I mean, at that point, it's still more "manual intelligence" than just telling someone what I want. A human could easily understand it, but AI still takes a lot of wrangling and you still need to think from the "AI's PoV" to get the good results.

[0] https://news.ycombinator.com/item?id=46374935

----

But enough whining. I want AI to get better so I can be lazier. After trying them for a while, one feature that I think all natural-language As need to have, would be the ability to mark certain sentences as "Do what I say" (aka Monkey's Paw) and "Do what I mean", like how you wrap phrases in quotes on Google etc to indicate a verbatim search.

So for example I could say "[[I was in Japan from the 5th to 10th]], identify foreign currency transactions on my statement with "POS" etc in the description" then the part in the [[]] (or whatever other marker) would be literal, exactly as written, but the rest of the text would be up to the AI's interpretation/inference so it would also search for ATM withdrawals etc.

Ideally, eventually we should be able to have multiple different AI "personas" akin to different members of household staff: your "chef" would know about your dietary preferences, your "maid" would operate your Roomba, take care of your laundry, your "accountant" would do accounty stuff.. and each of them would only learn about that specific domain of your life: the chef would pick up the times when you get hungry, but it won't know about your finances, and so on. The current "Projects" paradigm is not quite that yet.

ck2 1 month ago

as I was clicking "gee I hope there's the year of pelicans riding bicycles"

left satisfied, lol

rldjbpin 1 month ago

while most of the discourse is around text and (multimodal) LLMs, the past year has been quite interesting in other media as well. i suppose the "slop" section did hint on it briefly.

while LLM-generated text was already a thing of the past couple years, this year images and videos had the "AI or not" moment. it appears to have a bigger impact than our myopic world of software. another trend towards the end of the year was around "vibe training" of new (albeit much smaller) AI models.

personally, getting up and running with a project has been easier than ever, but unlike OP, i don't share the same excitement to make anymore. perhaps vibe coding with a phone will get more streamlined with a killer app in 2026.

politelemon 1 month ago

> The problem is that the big cloud models got better too—including those open weight models that, while freely available, were far too large (100B+) to run on my laptop.

The actual, notable progress will be models that can run reasonably well on commodity, everyday hardware that the average user has. From more accessibility will come greater usefulness. Right now the way I see it, having to upgrade specs on a machine to run local models keeps it in a niche hobbyist bubble.

andrewinardeer 1 month ago

Thank you. Enjoyed this read.

AI slop videos will no doubt get longer and "more realistic" in 2026.

I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline. That would be the responsible thing to do. But I can't see Alphabet doing that on YT, xAI doing that on X or Meta doing that on FB/Insta as they all have skin in the video gen game.

compass_copium 1 month ago

>I really hope social media companies plaster a prominent banner over them which screams, "Likely/Made by AI" and give us the option to automatically mute these videos from our timeline.
They should just be deleted. They will not be, because they clearly generate ad revenue.
sexy_seedbox 1 month ago
For image generation, it's already too realistic with Z-Image + Custom LoRas + SeedVR2 upscaling.
- hooverd 1 month ago
  
  I do think for the solution of say non-consensual pornography the only solution is incredible violence against people making it.
cube00 1 month ago

> social media companies plaster a prominent banner over them
Not going to happen as the social media companies realise they can sell you the AI tools used to post slop back onto the platform.

huqedato 1 month ago

I completely disagree with the idea that 2025 "The (only?) year of MCP." In fact, I believe every year in the foreseeable future will belong to MCP. It is here to stay. MCP was the best (rational, scalable, predictable) thing since LLM madness broke loose.

ashishgupta2209 1 month ago

2026: The Year of Robots, note it for next year

nativeit 1 month ago

Between the people with invested and/conflicting interests, and the hordes of dogmatic zealots, I find discussions about AI to be the least productive or reliably informed on HN.

simonw 1 month ago

Honestly this thread was pretty disappointing. Many of the comments here could have been attached to any post about LLMs in the past year or so.

blutoot 1 month ago

I hope 2026 will be the year when software engineers and recruiters will stop the obsession with leetcode and all other forms of competitive programming bullshit

jennyholzer3 1 month ago

Thanks to Klaude Kode it'll be at least another 20 years of this.
If you don't make software developers prove their literacy you will get burned.

sho_hn 1 month ago

Not in this review: Also the record year in intelligent systems aiding in and prompting human users into fatal self-harm.

Will 2026 fare better?

simonw 1 month ago
I really hope so.
The big labs are (mostly) investing a lot of resources into reducing the chance their models will trigger self-harm and AI psychosis and suchlike. See the GPT-4o retirement (and resulting backlash) for an example of that.
But the number of users is exploding too. If they make things 5x less likely to happen but sign up 10x more people it won't be good on that front.
- Nuzzerino 1 month ago
  
  How does a model “trigger” self-harm? Surely it doesn’t catalyze the dissatisfaction with the human condition, leading to it. There’s no reliable data that can drive meaningful improvement there, and so it is merely an appeasement op.
  Same thing with “psychosis”, which is a manufactured moral panic crisis.
  If the AI companies really wanted to reduce actual self harm and psychosis, maybe they’d stop prioritizing features that lead to mass unemployment for certain professions. One of the guys in the NYT article for AI psychosis had a successful career before the economy went to shit. The LLM didn’t create those conditions, bad policies did.
  It’s time to stop parroting slurs like that.
  
  1 reply →
measurablefunc 1 month ago

The people working on this stuff have convinced themselves they're on a religious quest so it's not going to get better: https://x.com/RobertFreundLaw/status/2006111090539687956
andai 1 month ago
Also essential self-fulfilment.
But that one doesn't make headlines ;)
- sho_hn 1 month ago
  
  Sure -- but that's fair game in engineering. I work on cars. If we kill people with safety faults I expect it to make more headlines than all the fun roadtrips.
  What I find interesting with chat bots is that they're "web apps" so to speak, but with safety engineering aspects that type of developer is typically not exposed to or familiar with.
  
  2 replies →
inquirerGeneral 1 month ago

[dead]

smileson2 1 month ago

forgot to mention the first murder-suicide instigated by chatgpt

DANmode 1 month ago
These are his highlights as a killer blogger,
not AI’s highlights.
Easy with the hot take.
jennyholzer3 1 month ago

correction: the first murder-suicide instigated by chatgpt on record

DrewADesign 1 month ago

You’re absolutely right! You astutely observed that 2025 was a year with many LLMs and this was a selection of waypoints, summarized in a helpful timeline.

That’s what most non-tech-person’s year in LLMs looked like.

Hopefully 2026 will be the year where companies realize that implementing intrusive chatbots can’t make better ::waving hands:: ya know… UX or whatever.

For some reason, they think its helpful to distractingly pop up chat windows on their site because their customers need textual kindergarten handholding to … I don’t know… find the ideal pocket comb for their unique pocket/hair situation, or had an unlikely question about that aerosol pan release spray that a chatbot could actually answer. Well, my dog also thinks she’s helping me by attacking the vacuum when I’m trying to clean. Both ideas are equally valid.

And spending a bazillion dollars implementing it doesn’t mean your customers won’t hate it. And forcing your customers into pathways they hate because of your sunk costs mindset means it will never stop costing you more money than it makes.

I just hope companies start being honest with themselves about whether or not these things are good, bad, or absolutely abysmal for the customer experience and cut their losses when it makes sense.

Night_Thastus 1 month ago

They need to be intrusive and shoved in your face. This way, they can say they have a lot of people using them, which is a good and useful metric.
fantasizr 1 month ago
I took the good with the bad: the ai assisted coding tools are a multiplier, google ai overviews in search results are half baked (at best) and often just factually wrong. AI was put in the instagram search bar for no practical purpose etc.
- DrewADesign 1 month ago
  
  Yeah totally. The point I’m trying to make, however, is that most people don’t code, so they didn’t get the multiplier, and only got the mediocre-to-bad, with a handful of them doing things like generating dumb images for a boost. I think that’s why a lot of people in the software business are utterly bewildered when customers aren’t jumping for joy when they release a new AI “feature.” I think a lot of what gets classified as cynical ceo enshittification is really people ignoring basic good design practices, like making sure you’re effectively helping customers solve an actual problem in a context and with methods they, at least, don’t hate. Especially on the smaller scale, like indie app developers who probably get more out of AI than most, they really think people are going to like new AI features simply because they’re new AI features. They’re very wrong.
zahlman 1 month ago
As much as I side with you on this one, I really don't think this submission is the right place to rant about it.
- jennyholzer3 1 month ago
  
  this thread is for pro-LLM propaganda only.
  do not acknowledge that everyone in the world thinks this shit is a complete and total garbage fire
ronsor 1 month ago
> For some reason, they think its helpful to distractingly pop up chat windows on their site...
Companies have been doing this "live support" nonsense far longer than LLMs have been popular.
- DrewADesign 1 month ago
  
  There was also source point pollution before the Industrial Revolution. Useless, forced, irritating chat was ‘nowhere close’ to as aggressive or pervasive as it is now. It used to be a niche feature of some CRMs and now it’s everywhere.
  I’m on LinkedIn Learning digging into something really technical and practical and it’s constantly pushing the chat fly out with useless pre-populated prompts like “what are the main takeaways from this video.” And they moved their main page search to a little icon on the title bar and sneakily now what used to be the obvious, primary central search field for years sends a prompt to their fucking chatbot.

syndacks 1 month ago

[dupe]

ishashankmi 1 month ago

[dead]

ishashankmi 1 month ago

[dead]

nicos29 1 month ago

[dead]

compass_copium 1 month ago

>I’m still holding hope that slop won’t end up as bad a problem as many people fear.

That's the pure, uncut copium. Meanwhile, in the real world, search on major platforms is so slanted towards slop that people need to specify that they want actual human music:

https://old.reddit.com/r/MusicRecommendations/comments/1pq4f...

hindustanuday 1 month ago

[dead]

skydhash 1 month ago

[flagged]

dang 1 month ago
Could you please stop posting dismissive, curmudgeonly comments? It's not what this site is for, and destroys what it is for.
We want curious conversation here.
https://news.ycombinator.com/newsguidelines.html
- Madmallard 1 month ago
  
  His comment is far better than the rampant astroturfing from stakeholders going on everywhere on this website that is being mitigated not at all whatsoever. There is a wealth of information present suggesting these things are so bad for everyone in so many ways.
  
  4 replies →
- jennyholzer3 1 month ago
  
  Who is we?
  I want LLM astroturfers to have their reputations destroyed for pushing this idiocy on us
- nasnsjdkd 1 month ago
  
  [flagged]
n2d4 1 month ago
This is extremely dismissive. Claude Code helps me make a majority of changes to our codebase now, particularly small ones, and is an insane efficiency boost. You may not have the same experience for one reason or another, but plenty of devs do, so "nothing happened" is absolutely wrong.
2024 was a lot of talk, a lot of "AI could hypothetically do this and that". 2025 was the year where it genuinely started to enter people's workflows. Not everything we've been told would happen has happened (I still make my own presentations and write my own emails) but coding agents certainly have!
- bandrami 1 month ago
  
  Did you ship more in 2025 than in 2024?
  
  3 replies →
- skydhash 1 month ago
  
  And this is one of the vague "AI helped me do more".
  This is me touting for Emacs
  Emacs was a great plus for me over the last year. The integration with various tooling with comint (REPL integration), compile (build or report tools), TUI (through eat or ansi-term), gave me a unified experience through the buffer paradigm of emacs. Using the same set of commands boosted my editing process and the easy addition of new commands make it easy to fit my development workflow to the editor.
  This is how easy it is to write a non-vague "tool X helped me" and I'm not even an English native speaker.
  
  2 replies →
- jennyholzer3 1 month ago
  
  LLMs must be dismissed.
  The dismissive tone is warranted.
jennyholzer3 1 month ago
They're mad about this one.
That's how you know you're on the right track
These fuckers have their pants down, don't let them trick you out of leaving your mark.
- cindyllm 1 month ago
  
  [dead]
senordevnyc 1 month ago
This comment is legitimately hilarious to me. I thought it was satire at first. The list of what has happened in this field in the last twelve months is staggering to me, while you write it off as essentially nothing.
Different strokes, but I’m getting so much more done and mostly enjoying it. Can’t wait to see what 2026 holds!
- ronsor 1 month ago
  
  People who dislike LLMs are generally insistent that they're useless for everything and have infinitely negative value, regardless of facts they're presented with.
  Anyone that believes that they are completely useless is just as deluded as anyone that believes they're going to bring an AGI utopia next week.
MattRix 1 month ago
[flagged]
- dang 1 month ago
  
  Please don't respond to a bad comment by breaking the site guidelines yourself. That only makes things worse.
  https://news.ycombinator.com/newsguidelines.html
- skydhash 1 month ago
  
  Why do people assume negative critique is ignorance?
  
  27 replies →

justatdotin 1 month ago

[flagged]

simonw 1 month ago
Got a good news story about that one? I'm always interested in learning more about this issue, especially if it credibly counters the narrative that the issue is overblown.
- justatdotin 1 month ago
  
  [flagged]
  
  4 replies →

nasnsjdkd 1 month ago

[flagged]

jennyholzer3 1 month ago

[flagged]

anonnon 1 month ago

[flagged]

dang 1 month ago
He's one of the most valuable writers on LLMs, which are one of the major topics at present. That's not spam.
- anonnon 1 month ago
  
  > He's one of the most valuable writers on LLMs
  Is he, really? Most of his blog posts are little more than opportunistic, buttressing commentary on someone else's blog post or article, often with a bit of AI apologia sprinkled in (for example, marginalizing people as paranoid for not taking AI companies at their word that they aren't aggressively scraping websites in violation of robots.txt, or exfiltrating user data in AI-enbaled apps).
  EDIT: and why must he link to his blog so often in his comments? How is that not SEO/engagement farming? BTW dang, I wasn't insinuating the mods were in league with him or anything, just that, IMO, he's long past the point at which good faith should no longer be assumed.
  
  11 replies →
- rvz 1 month ago
  
  It is promotional spam.
  But given the volume of LLM slop, it was kind of obvious and known that even the moderators now have "favourites" over guidelines.
  > Please don't use HN primarily for promotion. It's ok to post your own stuff part of the time, but the primary use of the site should be for curiosity. [0]
  The blog itself is clearly used as promotion all the time when the original source(s) are buried deep in the post and almost all of the links link back to his own posts.
  This is now a first on HN and a new low for moderators and as admitted have regular promotional favourites on the top of HN.
  [0] https://news.ycombinator.com/newsguidelines.html
  
  7 replies →
simonw 1 month ago

Probably because my content gets a lot more upvotes than it does flags.
If this post was by anyone other than me would you have any problems with its quality?
firexcy 1 month ago

I appreciate his work for being more informative and organized than average AI-related content. Without his blogging, it would be a struggle to navigate the bombastic and narcissistic Twitter/Reddit posts for AI updates. The barrier to entry for AI reporting is so low that you just need to give a bit more care to be distinguished, and he is getting the deserved attention for doing exactly that in a systematical and disciplined manner. (I do believe many on HN are more than capable but not interested in doing the same.) Personally, I sometimes find his posts more congratulatory or trivial than I like, but I have learned to take what I want and ignore what I don’t.

castwide 1 month ago

[flagged]

techpression 1 month ago

Nothing about the severe impact on the environment, and the hand waviness about water usage hurt to read. The referenced post was missing every single point about the issue by making it global instead of local. And as if data center buildouts are properly planned and dimensioned for existing infrastructure…

Add to this that all the hardware is already old and the amount of waste we’re producing right now is mind boggling, and for what, fun tools for the use of one?

I don’t live in the US, but the amount of tax money being siphoned to a few tech bros should have heads rolling and I really don’t want to see it happening in Europe.

But I guess we got a new version number on a few models and some blown up benchmarks so that’s good, oh and of course the svg images we will never use for anything.

simonw 1 month ago
"Nothing about the severe impact on the environment"
I literally said:
"AI data centers continue to burn vast amounts of energy and the arms race to build them continues to accelerate in a way that feels unsustainable."
AND I linked to my coverage from last year, which is still true today (hence why I felt no need to update it): https://simonwillison.net/2024/Dec/31/llms-in-2024/#the-envi...
- jennyholzer3 1 month ago
  
  Do you think anything should be done about this environmental impact?
  Or should we just keep chugging along as though there is no problem at all?
  
  1 reply →

asgR1t 1 month ago

Most LLMs got worse in 2025. Only addicts and the type of computer gamer that feels drawn to complex setups, gamification and does not care about the end result will feel positive about the grift.

2025: The Year in Open Source? Nothing, all resources were tied up to debunk a couple of Python web developers who pose as the ultimate experts in LLMs.

simonw 1 month ago

In what way did they get worse?
I made you a dashboard of my 2025 writing about open-source that didn't include AI: https://simonwillison.net/dashboard/posts-with-tags-in-a-yea...

yupyupyups 1 month ago

Let's talk about the societal cost these models have had on us including their high energy cost and the proliferation of auto-generated slop media used to milk ad revenue, scam people, SEO farm, do propaganda or automate trolling. What about these big corporations collecting an astronomical amount of debt to hoard DRAM and NAND in a way that has crippled the PC market within weeks? And what are they going to do next, put a few dollars in Trump's pocket so that they can rob/loot the US population through bailouts? Who gets to keep all the hardware I wonder?

Nvidia, Samsung, SK Hynix and some other voltures I forgot to mention are making serious bank right now.

jennyholzer3 1 month ago

> Who gets to keep all the hardware I wonder?
Keep questions like this off of the propaganda thread.

jama211 1 month ago

The difference between the performance of models between 2024 and 2025 has been so stark, that graph really shows it. There are still many people on these forums who seem to think AI’s produce terrible code unless ultra supervised, and I can’t help but suspect some of them tried it a little while ago and just don’t understand how different it is now compared to even quite recently.

Madmallard 1 month ago
I used Gemini Pro, Claude Pro yesterday a couple of dozen times and basically have been daily.
I have a project to convert my multiplayer XNA game from C# to Javascript and to add networking to the game-play using LLMs.
They are far worse at it now than they were a year ago. They actually implemented the requirements (Though inaccurately) to the best of their ability a year ago. Especially Gemini.
Now they don't even come remotely close to implementing just the basic requirements.
The thing is, I'm giving them the entirety of the C# source code and spelling out what they should do.
- simonw 1 month ago
  
  Weird. I would expect Gemini 3 Pro and Claude Opus 4.5 to run rings around Gemini 1.5 Pro and Claude Sonnet 3.5.
  How are you running them - regular chat interface or do you have them setup with Claude Code or Gemini CLI?
  
  2 replies →
- jennyholzer3 1 month ago
  
  "They are far worse at it now than they were a year ago."
  This is the part they REALLY don't want you to say.
  They can no longer train these models effectively and their performance is slipping. Late 2023 was the golden age.