I suspect he means as a trillion dollar corporation led endeavor.
I trained a small neural net on pics of a cat I had in the 00s (RIP George, you were a good cat).
Mounted a webcam I had gotten for free from somewhere, above the cat door, in the exterior of the house.
If the neural net recognized my cat it switched off an electromagnetic holding the pet door locked. Worked perfectly until I moved out of the rental.
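Roughly, the control loop looked like the sketch below (a present-day Python reconstruction for illustration, not the original code; the GPIO pin, model file, and confidence threshold are all made up):

    import time
    import cv2                 # webcam capture
    import tensorflow as tf    # tiny binary classifier (cat / not-cat)
    import RPi.GPIO as GPIO    # drives the electromagnet on the pet door

    LOCK_PIN = 17              # GPIO pin wired to the electromagnet (assumed)
    UNLOCK_SECONDS = 10

    # Hypothetical model file: a small CNN trained on photos of the cat.
    model = tf.keras.models.load_model("george_vs_not_george.h5")

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(LOCK_PIN, GPIO.OUT, initial=GPIO.HIGH)  # HIGH = magnet energized = locked

    cap = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            # Downscale the frame to the model's input size and normalize.
            small = cv2.resize(frame, (64, 64)) / 255.0
            prob = float(model.predict(small[None, ...], verbose=0)[0][0])
            if prob > 0.9:                          # confident it's the right cat
                GPIO.output(LOCK_PIN, GPIO.LOW)     # cut power to the magnet: door unlocks
                time.sleep(UNLOCK_SECONDS)
                GPIO.output(LOCK_PIN, GPIO.HIGH)    # re-lock
            time.sleep(0.5)
    finally:
        cap.release()
        GPIO.cleanup()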
Neural nets are, end of the day, pretty cool. It's the data center business that's the problem. Just more landlords, wannabe oligarchs, claiming ownership over anything they can get the politicians to give them.
The problem is... you're going to deprive yourself of the talent chain in the long run, and so is everyone else who is switching over to AI, both generative like ChatGPT and transformative like the various translation, speech recognition/transcription or data wrangling models.
For now, it works out for companies - but fast-forward to, say, ten years in the future. There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely, frustrated that they're there not for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
But by the time that plus the demographic collapse shows its effects, the people who currently call the shots will be drawing their pensions, having long since made their money. And my generation will be left with collapse everywhere, finding ways to somehow keep stuff running.
Hell, it's already hard to get qualified human support these days. Large corporations effectively rule with impunity, and the only recourse consumers have is to either shell out immense sums of money for lawyers and court fees, turn to consumer protection/regulatory authorities that are being gutted as we speak in both funding and legal protections, or get swamped with AI slop like "legal assistance" AI hallucinating case law.
> There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely, frustrated that they're there not for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
There are plenty of self-taught developers who didn't need any "traineeship". That proportion will increase even more with AI/LLMs and the fact that there are no more jobs for youngsters. And actually, looking at the purely toxic comments on this thread, I would say it's a good thing for youngsters not to be exposed to such "seniors".
Credentialism is dead. "Either ship or shut up" should be the mantra of this age.
I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.
Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well, just that we need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.
> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo, and they're regularly failing at basic tasks, with Antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
It can one shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it work on.
You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.
We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.
There are two types of people: The first group learns to work within these limits and adapt to using them where they’re helpful while writing the code when they’re not.
The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.
If you hire a human, it will cost you thousands a week. Humans will also fail at basic tasks and get stuck in useless loops, and you still have to pay them for all that time.
For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.
It's not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if it misfires fairly often, they can still be useful, even if they're not perfect.
I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.
There’s a subtle point, a moment, when you HAVE to take the wheel from the AI.
All the issues I see are from people insisting on using it far beyond the point where it stops being useful.
It is a helper, a partner; it is still not ready to go the last mile.
>Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding
I haven't heard that at all. I hear about models that come out and are a bit better. And other people saying they suck.
>Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits.
Is it bringing you any value? I find it speeds things up a LOT.
I have a hard time believing that this v0, from 2023, achieved comparable results to Gemini 3 in Web design.
Gemini now often produces output that looks significantly better than what I could produce manually, and I'm a web expert, although my expertise is more in tooling and package management.
Frankly I think the 'latest' generation of models from a lot of providers, which switch between 'fast' and 'thinking' modes, are really just the 'latest' because they encourage users to use cheaper inference by default. In chatgpt I still trust o3 the most. It gives me fewer flat-out wrong or nonsensical responses.
I'm suspecting that once these models hit 'good enough' for ~90% of users and use cases, the providers started optimizing for cost instead of quality, but still benchmark and advertise for quality.
We implement pretty cool workflows at work using "GenAI" and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.
>Generative AI, as we know it, has only existed ~5-6 years
Probably less than that, practically speaking. ChatGPT's initial release date was November 2022. It's closer to 3 years, in terms of any significant number of people using them.
I don't think LLMs are an abject failure, but I find it equally odd that so many people think that transformer-based LLMs can be incrementally improved to perfection. It seems pretty obvious to me now that we're not gonna RLHF our way out of hallucinations. We'll probably need a few more fundamental architecture breakthroughs to do that.
I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve though I don't think AGI is going to happen any time soon.
I have no idea what this is called, but it feels like a lot of people assume that progress will continue at a linear pace forever, when I think that generally progress is closer to a "staircase" shape. A new invention or discovery will lead to a lot of really cool new inventions and discoveries in a very short period of time, eventually people will exhaust the low-to-middle-hanging fruit, and progress kind of levels out.
I suspect it will be the same way with AI; I don't know if we've reached the top of our current plateau, but if not I think we're getting fairly close.
I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.
Totally reasonable question, and I only am making an assumption based on observed progress. AI generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think that there's still more room for improvement that could happen.
I will acknowledge that I don't have any evidence of this claim, so maybe the word "likely" was unwise, as that suggests probability. Feel free to replace "is likely to" with "it feels like it will".
I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".
The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single-out AI, I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing and they do get rewarded for it because it will often bump the stock price.
Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so, I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.
It’s different in that Bitcoin was never useful in any capacity when it was new. AI is at least useful right now and it’s improved considerably in the last few years.
A year ago I would have agreed wholeheartedly and I was a self confessed skeptic.
Then Gemini got good (around 2.5?), like I-turned-my-head good. I started to use it every week-ish, not to write code, but more like a tool (as you would a calculator).
More recently Opus 4.5 was released and now I'm using it every day to assist in code. It is regularly helping me take tasks that would have taken 6-12 hours down to 15-30 minutes with some minor prompting and hand holding.
I've not yet reached the point where I feel comfortable letting it loose to do the entire PR for me. But it's getting there.
I think that's the key. Healthy skepticism is always appropriate. It's the outright cynicism that gets me. "AI will never be able to [...]", when I've been sitting here at work doing 2/3rds of those supposedly impossible things. Flawlessly? No, of course not! But I don't do those things flawlessly on the first pass, either.
Skepticism is good. I have no time or patience for cynics who dismiss the whole technology as impossible.
I think the concern expressed as "impossible" is whether it can ever do those things "flawlessly" because that's what we actually need from its output. Otherwise a more experienced human is forced to do double work figuring out where it's wrong and then fixing it.
This is not a lofty goal. It's what we always expect from a competent human regardless of the number of passes it takes them. This is not what we get from LLMs in the same amount of time it takes a human to do the work unassisted. If it's impossible then there is no amount of time that would ever get this result from this type of AI. This matters because it means the human is forced to still be in the loop, not saving time, and forced to work harder than just not using it.
I don't mean "flawless" in the sense that there cannot be improvements. I mean that the result should be what was expected for all possible inputs, and when inspected for bugs there are reasonable and subtle technical misunderstandings at the root of them (true bugs that are possibly undocumented or undefined behavior) and not a mess of additional linguistic ones or misuse. This is the stronger definition of what people mean by "hallucination", and it is absolutely not fixed and there has been no progress made on it either. No amount of prompting or prayer can work around it.
This game of AI whack-a-mole really is a waste of time in so many cases. I would not bet on statistical models being anything more than what they are.
I would strongly recommend this podcast episode with Andrej Karpathy. I will poorly summarize it by saying his main point is that AI will spread like any other technology. It’s not going to be a sudden flash and everything is done by AI. It will be a slow rollout where each year it automates more and more manual work, until one day we realize it’s everywhere and has become indispensable.
It sounds like what you are seeing lines up with his predictions. Each model generation is able to take on a little more of the responsibilities of a software engineer, but it’s not as if we suddenly don’t need the engineer anymore.
Though I think it's a very steep sigmoid that we're still far on the bottom half of.
For math it just did its first "almost independent" Erdos problem. In a couple months it'll probably do another, then maybe one each month for a while, then one morning we'll wake up and find whoom it solved 20 overnight and is spitting them out by the hour.
For software it's been "curiosity ... curiosity ... curiosity ... occasionally useful assistant ... slightly more capable assistant" up to now, and it'll probably continue like that for a while. The inflection point will be when OpenAI/Anthropic/Google releases an e2e platform meant to be driven primarily by the product team, with engineering just being co-drivers. It probably starts out buggy and needing a lot of hand-holding (and grumbling) from engineering, but slowly but surely becomes more independently capable. Then at some point, product will become more confident in that platform than their own engineering team, and begin pushing out features based on that alone. Once that process starts (probably first at OpenAI/Anthropic/Google themselves, but spreading like wildfire across the industry), then it's just a matter of time until leadership declares that all feature development goes through that platform, and retains only as many engineers as is required to support the platform itself.
I'm now putting more queries into LLMs than I am into Google Search.
I'm not sure how much of that is because Google Search has worsened versus LLMs having improved, but it's still a substantial shift in my day-to-day life.
Something like finding the most appropriate sensor ICs to use for a particular use case requires so much less effort than it used to. I might have spent an entire day digging through data sheets before, and now I'll find what I need in a few minutes. It feels at least as revolutionary as when search replaced manually paging through web directories.
I feel like I'm living in a totally different world or I'm being gaslit by LLMs when I read stuff like this and other similar comments in this thread. Do you mind mentioning _what_ language / tech stack you're in? At my current job, we have a large Ruby on Rails codebase and just this week Gemini 2.5 and 3 struggled to even identify what classes inherited from another class.
This feels like a pretty low effort post that plays heavily to superficial readers' cognitive biases.
I work commercializing AI in some very specific use cases where it is extremely valuable. Where people are being led astray is in layering generalizations: general use cases (copilots) deployed across general populations and generally not doing very well. But that's PMF stuff, not a failure of the underlying tech.
I think both sides of this debate are conflating the tech and the market. First of all, there were forms of "AI" before modern Gen AI (machine learning, NLP, computer vision, predictive algorithms, etc) that were and are very valuable for specific use cases. Not much has changed there AFAICT, so it's fair that the broader conversation about Gen AI is focused on general use cases deployed across general populations. After all, Microsoft thinks it's a copilot company, so it's fair to talk about how copilots are doing.
On the pro-AI side, people are conflating technology success with product success. Look at crypto -- the technology supports decentralization, anonymity, and use as a currency; but in the marketplace it is centralized, subject to KYC, and used for speculation instead of transactions. The potential of the tech does not always align with the way the world decides to use it.
On the other side of the aisle, people are conflating the problematic socio-economics of AI with the state of the technology. I think you're correct to call it a failure of PMF, and that's a problem worth writing articles about. It just shouldn't be so hard to talk about the success of the technology and its failure in the marketplace in the same breath.
I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]:
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
Which ones are you claiming have already been achieved?
My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
For example, in the recent thread about LLMs and solving an Erdos problem I remember reading in the comments that it was confirmed there were multiple LLMs involved as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
Similarly, I've not yet heard of any non-expert Software Engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.
The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code is bug-free, and then there's a million caveats about what a bug actually is and how we define one.
The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from project gutenberg into a file and asked claude questions about the characters, and then asked about the motivations of a random side-character. It gave a good answer, I'd recommend trying it yourself.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
Yes. I am a novelist and I noticed a step change in what was possible here around Claude Sonnet 3.7 in terms of being able to analyze my own unpublished work for theme, implicit motivations, subtext, etc -- without having any pre-digested analysis of the work in its training data.
No human reads a novel and evaluates it as a whole. It's a story and the reader's perception changes over the course of reading the book. Current AI can certainly do that.
>Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo)
Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
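A minimal sketch of that map-then-reduce approach, where ask_llm stands in for whatever chat-completion client you use and the chapter splitting is deliberately naive:

    import re

    def ask_llm(prompt: str) -> str:
        """Placeholder: send the prompt to whichever LLM you use, return its reply."""
        raise NotImplementedError

    def split_into_chapters(text: str) -> list[str]:
        # Naive split on "Chapter ..." headings; real books need smarter chunking.
        parts = re.split(r"\n(?=Chapter\s+\w+)", text)
        return [p.strip() for p in parts if p.strip()]

    def answer_about_novel(novel_text: str, question: str) -> str:
        # Map: summarize each chapter so the whole book fits in one context window.
        summaries = [
            ask_llm("Summarize this chapter, noting characters, conflicts and "
                    "motivations:\n\n" + chapter)
            for chapter in split_into_chapters(novel_text)
        ]
        # Reduce: answer the question from the concatenated chapter summaries.
        joined = "\n\n".join(f"Chapter {i + 1}: {s}" for i, s in enumerate(summaries))
        return ask_llm("Using these chapter summaries:\n\n" + joined +
                       "\n\nAnswer this question about the novel: " + question)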
Of course, for novels which existed before training, an LLM will already contain information about them, so having it "read" classic works like The Count of Monte Cristo and answer questions about it would be a bit of an unfair pass of the test, because models can be expected to have been trained on large volumes of existing analysis of that book.
>reliably answer questions about plot, character, conflicts, motivations
LLMs can already do this automatically with my code in a sizable project (you know what I mean); it seems pretty simple to get them to do it with a book.
Which ones of those have been achieved in your opinion?
I think the arbitrary proofs from mathematical literature is probably the most solved one. Research into IMO problems, and Lean formalization work have been pretty successful.
Then, probably reading a novel and answering questions is the next most successful.
Reliably constructing 10k bug-free lines is probably the least successful. AI tends to produce more bugs than human programmers, and I have yet to meet a programmer who can reliably produce fewer than 1 bug per 10k lines.
Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math outside of the very "cleanest" fields (meaning e.g. algebra, logic, combinatorics etc.) have not seen much success in formalizing existing theory developments.
I'm pretty sure it can do all of those except for the one which requires a physical body (in the kitchen) and the one that humans can't do reliably either (construct 10000 loc bug-free).
> Many of these have already been achieved, and it's only early 2026.
I'm quite sure people who made those (now laughable) predictions will tell you none of these has been achieved, because AI isn't doing this "reliably" or "bug-free."
Defending your predictions is like running an insurance company. You always win.
Besides being a cook, which is more of a robotics problem, all of the rest are accomplished to the point where it's arguable how reliably LLMs can perform these tasks, the arguments being between the enthusiast and naysayer camps.
The keyword being "reliably" and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection, even code with formal proofs can have "bugs" if you consider the specification not matching the actual needs of reality.
All but the robotics one are demonstrable in 2026 at least.
In my opinion, contrary to other comments here I think AI can do all of the above already except being a kitchen cook.
Just earlier today I asked it to give me a summary of a show I was watching until a particular episode in a particular season without spoiling the rest of it and it did a great job.
And why not? Is there any reason for this comment to not appear?
If Bill Gates made a prediction about computing, no matter what the prediction says, you can bet that 640K memory quote would be mentioned in the comment section (even though he didn't actually say that).
I think it’s for good reason. I’m a bit at a loss as to why every time this guy rages into the ether of his blog it’s considered newsworthy. Celebrity driven tech news is just so tiresome. Marcus was surpassed by others in the field and now he’s basically a professional heckler on a university payroll. I wish people could just be happy for the success of others instead of fuming about how so and so is a billionaire and they are not.
I’ve been using Claude Code, Gemini 3 Pro, and Nano Banana Pro to plan, code, and create custom UI elements for dozens of time-saving applications. For years, I have been searching high and low for existing solutions, but all I found were either overpriced cloud offerings that were bloated with endless features I didn’t need and just complicated the UI, or abandoned GitHub repos consisting of an initial commit and a roadmap that has been waiting eight years for its first update, where what code was present was half-baked and out of date.
The reality is that my requirements are so specific to my workflow that until these latest models came along, building exactly what I needed in a matter of hours for a cost of $20 a month was inconceivable. Now I provide a description of what functionality I need, plus some sketches of the UI I made on my iPad with an Apple Pencil, and after a bit of back and forth to get everything dialled in I’ve created a bit of software that will save me dozens if not hundreds of hours of previously tedious manual work.
Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein yet, it's not going all that well"
The goalposts keep getting pushed further and further every month. How many math and coding Olympiads and other benchmarks will LLMs need to dominate before people will actually admit that in some domains they're really quite good?
Sure, if you're a Nobel prize winner or PhD then LLMs aren't as good as you yet, but for 99% of the people in the world, LLMs are better than you at Math, Science, Coding, and every language probably except your native language, and they're probably better than you at that too...
Ignoring the actual poor quality of this write-up, I think we don't know how well GenAI is going, to be honest. I feel we've not been able to properly measure or assess its actual impact yet.
Even as I use it, and I use it everyday, I can't really assess its true impact. Am I more productive or less overall? I'm not too sure. Do I do higher quality work or lower quality work overall? I'm not too sure.
All I know, it's pretty cool, and using it is super easy. I probably use it too much, in a way, that it actually slows things down sometimes, when I use it for trivial things for example.
At least when it comes to productivity/quality I feel we don't really know yet.
But there are definite cool use-cases for it. I mean, I can edit photos/videos in ways I simply could not before, or generate a logo for a birthday party; I couldn't do that before. I can make a tune that I like, even if it's not the best song in the world, but it can have the lyrics I want. I can have it extract whatever from a PDF. I can have it tell me what to watch out for in a gigantic lease agreement I would not have bothered reading otherwise.
I can have it fix my tests, or write my tests; not sure if it saves me time, but I hate doing that, so it definitely makes it more fun and I can kind of just watch videos at the same time, which I couldn't before. Coding quality of life improvements are there too: I want to generate a sample JSON out of a JSON Schema, and so on. If I want, I can write a method using English prompts instead of the code itself; it might not truly be faster, not sure, but sometimes it's less mentally taxing, and depending on my mood, it can be more fun or less fun, etc.
All those are pretty awesome wins and a sign that for sure those things will remain and I will happily pay for them. So maybe it depends on what you expected.
The irony of a five sentence article making giant claims isn't lost on me. Don't get me wrong: I'm amenable to the idea; but, y'know, my kids wrote longer essays in 4th grade.
All I know is that I have built more in the past 10 months than I ever have. How do you quantify for the skeptics the mental shift that happens when you know you can just build stuff now?
COULD I do this stuff before? Sure. But I wouldn’t have. Life gets in the way. Now, the bar is low so why not build stuff? Some of it ships, some of it is just experimentation. It’s all building.
Trying to quantify that shift is impossible. It’s not a multiplier to productivity you measure by commits. It’s a builder mind shift.
"I have built more in the past 10 months than I ever have."
Correction. The genAI has built it.
I haven't got any skin in the game on either side here, but doesn't the fact that the genAI can build it imply that what you are doing is heavily trodden ground, that there will be less and less need for developers like you, and that it will gradually lead to many developers (like you) being cut out of the market entirely?
For personal stuff it's wonderful. For work, it seems like a double edged sword that will eventually cut the devs that use it (and those that don't). Even if the business owners aren't completely daft and keep a (vastly diminished) workforce of dev/AI consultants on board, that could easily exclude you or me.
It's going well if all the jobs it eradicates can be replaced with just as many jobs (they can't), or the powers that be catch on and realise there aren't that many jobs left for humans to do and institute some form of basic income system (they won't).
"The genAI has built it" -- this is the core point. If I did nothing except complain about AI for the past 10 months, would these projects exist? No they would not. So. I. Built. It.
If you actually use these tools, really use them, you realize that it's an augmentation, not a replacement. Simply because the training data is what has already come before (for now!). The LLMs need help, direction, focus...and those are learned skills dependent on the tooling. Not to mention ideas.
And sure, I imagine the software development workforce will change quite a bit, probably this year, no doubt about that.
But the need for builders will not change. I imagine that the 'builder' role will change to be traditional software developers, designers, sales people, writers, c-suite...whatever.
So I think you are right. "That could easily exclude you or me". 100% correct. The required skill set to be a builder is changing on a weekly basis. The only way to keep up is to keep building with these tools. Trying things. Experimenting. Otherwise, yes, you will probably be replaced. By a builder.
Guessing this isn’t going to be popular here, but he’s right. AI has some use cases, but isn’t the world-changing paradigm shift it’s marketed as. It’s becoming clear the tech is ultimately just a tool, not a precursor to AGI.
How long do you think it will be until the “AI isn't doing anything” people go away?
1 month, 6 months, I’d say 1 year at the most. Anyone who has used Claude Code since Dec 1st knows this in their bones, so I’d just let these people shout from the top of the hill until they run out of steam…
Right around then, we can send a bunch of reconnaissance teams out to the abandoned Japanese islands to rescue them from the war that’s been over for 10 years - hopefully they can rejoin society, merge back with reality and get on with their lives
I think the "AI isn't doing anything" crowd have some kind of vocabulary/language barriers/deficiencies that prevent them from refining their prompting methods into something that works for them.
I find that the more precise I am in my prompts, the more precise the response. But that requires that I use vocabulary that I wouldn't use in a human conversation.
What a joke this guy is. I can sit down and crank out a real, complex feature in a couple hours that would have previously taken days and ship it to the users of our AI platform who can then respond to RFQs in minutes where they would have previously spent hours matching descriptions to part numbers manually.
...and yet we still see these articles claiming LLMs are dying/overhyped/major issues/whatever.
Cool man, I'll just be over here building my AI based business with AI and solving real problems in the very real manufacturing sector.
I keep reading comments claiming GenAI's positive traits, but this usually amounts to some toy PoC that very eerily mirrors work found in code bootcamps. You want an app that has logins and comments and upvotes? GenAI is going to look amazing wiring a non-relational DB to your Node backend.
Aye. If you've not turned a real profit with your thing, I will default to believing that you don't know what you're talking about and are probably building toys.
It's nothing to do with AI. I didn't believe "I rewrote my application in three weeks!" claims before AI, and I don't believe them now. Most people are not able to evaluate themselves, I don't see why that would have changed.
Meanwhile $employer is continuing to migrate individual tasks to in-house AI tooling, and has licensed an off-the-shelf coding agent for all of us developers to put in our IDEs.
Gary Marcus again. The chief AI doomer, whose goalposts keep on moving.
Almost everyone around me, even the primary school kids, uses ChatGPT/Perplexity/Gemini/Claude in some form on almost a daily basis. The daily engagement is very strong.
The models keep improving every year. Nano Banana gets text spot on, and human anatomy of digits and toes is spot on. Deep Research mode is mind boggling. All the major vendors have some form of voice interaction, and it feels pretty good. I use Perplexity's talk feature while driving to go deep on a topic of interest.
The trend is strong, betting against the trend isn't wise.
I can paste entire books and ask questions about certain pieces. The context windows nowadays are wild.
Price per token keeps on dropping, more capability keeps on coming online.
It's literally just four screenshots paired with this sentence.
> Trying to orient our economy and geopolitical policy around such shoddy technology — particularly on the unproven hopes that it will dramatically improve — is a mistake.
The screenshots are screenshots of real articles. The sentence is shorter than a typical prompt.
I’m starting to think this take is legitimately insane.
As said in the article, a conservative estimate is that Gen AI can currently do 2.5% of all jobs in the entire economy. A technology that is really only a couple of years old. This is supposed to be _disappointing_? That’s millions of jobs _today_, in a totally nascent form.
I mean I understand skepticism, I’m not exactly in love with AI myself, but the world has literally been transformed.
Download the models you can find now, and keep them forever. The guardrails will only get worse, or models will be banned entirely. Whether it's because of "hurts people's health" or some other moral panic, it will kill this tech off.
gpt-oss isn't bad, but even models you cannot run are worth getting since you may be able to run them in the future.
I'm hedging against models being so nerfed they are useless. (This is unlikely, but drives are cheap and data is expensive.)
It's going well for coding. I just knocked out a mapping project that would have been a week+ of work (with docs and stackoverflow opened in the background) in a few hours.
And yes, I do understand the code and what is happening and did have to make a couple of adjustments manually.
I don't know that reducing coding work justifies the current valuations, but I wouldn't say it's "not going all that well".
He's such a joke that even LLMs make fun of him. The Gemini-generated Hacker News frontpage for December 9 2035 contains an article by Gary Marcus: "AI progress is stalling": https://dosaygo-studio.github.io/hn-front-page-2035/news
I've just started ignoring people like this. You think everything's going bad? Okay fine. You go ahead and keep believing that. Maybe you could get it printed on a sandwich board and walk up and down the street with it.
Seems like black and white thinking to me. I had it make suggestions for 10 triage issues for my team today and agreed with all of its routings. That’s certainly better than 6 months ago.
I just used ChatGPT to diagnose a very serious but ultimately not-dangerous health situation last week and it was perfect. It literally guided me perfectly without making me panic and helped me understand what was going on.
We use ChatGPT at work to do things that we have literally laid people off for, because we don't need them anymore. This included fixing bugs at a level that is at least E5/senior software engineer. Sometimes it does something really bad but it definitely saves times and helps avoid adding headcount.
Generative AI is years beyond what I would have expected even 1 year ago. This guy doesn't know what he's talking about, he's just picking and choosing one-off articles that make it seem like it's supporting his points.
All this AI discussion has done is reveal how naive some people are.
You're not losing your job unless you work on trivial codebases. There's a very clear pattern what those are: startups, greenfield, games, junk apps, mindless busywork that probably has an existing better tool on github, etc. Basically anything that doesn't have any concrete business requirements or legal liability.
This isn't to say those codebases will always be trivial, but good luck cleaning that up or facing the reality of having to rewrite it properly. At least you have AI to help with boilerplate. Maybe you'll learn to read docs along the way.
The people claiming to be significantly more productive are either novice programmers or optimistic for unexplained reasons they're still trying to figure out. When they want to let us know, most people still won't care because it's not even the good kind of unreasonable that brings innovation.
The only real value in modern LLMs is that natural language processing is a lot better than it used to be.
I wholeheartedly agree. Shitty companies steal art and then put out shitty products that shitty people use to spam us with slop.
The same goes for code as well.
I’ve explored Claude Code/Antigravity/etc., found them mostly useless, tried a more interactive approach with Copilot/local models, tried less interactive “agents”, etc. It’s largely all slop.
My coworkers who claim they’re shipping at warp speed using generative AI are almost categorically our worst developers by a mile.
Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
I myself am saving a small fortune on design and photography and getting better results while doing it.
If this is not all that well I can’t wait until we get to mediocre!
> Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
However, in the end, execution is all that matters so if you and your cofounder are able to execute successfully with mountains of generated code then it doesn't matter what assets and liabilities you hold in the short term.
The long term is a lot harder to predict in any case.
> Code is not an asset it's a liability, and code that no one has reviewed is even more of a liability.
Code that solves problems and makes you money is by definition an asset. Whether or not the code in question does those things remains to be seen, but code is not strictly a liability or else no one would write it.
Developers that can’t see the change are blind.
Just this week, Sun-Tue, I added a fully functional subscription model to an existing platform, built out bulk async elasticjs indexing for a huge database, and migrated a very large WordPress website to Next.js. 2.5 days; it would have cost me at least a month 2 years ago.
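(For a sense of scale, the bulk async indexing piece is roughly the shape sketched below. This uses the Python Elasticsearch client as a stand-in for the JS client; the index name, batch size, and fetch_rows source are assumptions, not the actual code.)

    import asyncio
    from elasticsearch import AsyncElasticsearch
    from elasticsearch.helpers import async_bulk

    async def fetch_rows():
        """Placeholder: stream dict rows out of the large source database."""
        for i in range(1_000_000):
            yield {"id": i, "title": f"row {i}"}

    async def index_all():
        es = AsyncElasticsearch("http://localhost:9200")
        try:
            async def actions():
                async for row in fetch_rows():
                    yield {"_index": "catalog", "_id": row["id"], "_source": row}

            # async_bulk batches the documents and streams them to Elasticsearch.
            ok, errors = await async_bulk(es, actions(), chunk_size=1000,
                                          raise_on_error=False)
            print(f"indexed {ok} docs, {len(errors)} errors")
        finally:
            await es.close()

    asyncio.run(index_all())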
>Code is not an asset it's a liability
This would imply companies could delete all their code and do better, which doesn't seem true?
All the productivity enhancement provided by LLMs for programming is caused by circumventing the copyright restrictions of the programs on which they have been trained.
You and anyone else could have avoided spending millions for programmer salaries, had you been allowed to reuse freely any of the many existing proprietary or open-source programs that solved the same or very similar problems.
I would have no problem with everyone being able to reuse any program, without restrictions, but with these AI programming tools the rich are now permitted to ignore copyrights, while the poor remain constrained by them, as before.
The copyright for programs has caused a huge multiplication of the programming effort for many decades, with everyone rewriting again and again similar programs, in order for their employing company to own the "IP". Now LLMs are exposing what would have happened in an alternative timeline.
The LLMs have the additional advantage of fast and easy searching through a huge database of programs, but this advantage would not have been enough for a significant productivity increase over a competent programmer that would have searched the same database by traditional means, to find reusable code.
Intellectual property law is a net loss to humanity, so by my reckoning, anything which lets us all work around that overhead gets some extra points on the credit side of the ledger.
> the rich are now permitted to ignore copyrights, while the poor remain constrained by them, as before.
Claude Code is $20 a month, and I get a lot of usage out of it. I don't see how cutting edge AI tools are only for the rich. The name OpenAI is often mocked, but they did succeed at bringing the cutting edge of AI to everyone, time and time again.
Then it really proves how much the economy would be booming if we abolished copyright, doesn't it? China ignores copyright too, and look at them surpassing us in all aspects of technology, while Western economies choose to sabotage themselves to keep money flowing upwards to old guys.
> Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
Why?
I'm not even casting shade - I think AI is quite amazing for coding and can increase productivity and quality a lot.
But I'm curious why he's doing this.
The codebase is old and really hard to work on. It’s a game that existed pre-iPhone and still has decent revenue but could use some updating. We intentionally shrank our company down to auto-pilot mode and frankly don’t even have a working development environment anymore.
It was basically cost prohibitive to change anything significant until Claude became able to do most of the work for us. My cofounder (also CTO of another startup in the interim) found himself with a lot of time on his hands unexpectedly and thought it would be a neat experiment and has been wowed by the results.
Much in the same way people on HN debate when we will have self-driving cars while millions of people actually have their Teslas self-driving every day (it reminds me of when I got to bet that Joe Biden would win the election after he already did), those who think AI coding is years away are missing what’s happening now. It’s a powerful force magnifier in the hands of a skilled programmer and it’ll only get better.
It's not directly comparable. The first time writing the code is always the hardest because you might have to figure out the requirements along the way. When you have the initial system running for a while, doing a second one is easier because all the requirements kinks are figured out.
By the way, why does your co-founder have to do the rewrite at all?
You can compare it - just factor that in. And compare writing it with AI vs. writing it without AI.
We have no clue the scope of the rewrite but for anything non-trivial, 2 weeks just isn't going to be possible without AI. To the point of you probably not doing it at all.
I have no idea why they are rewriting the code. That's another matter.
I find the opposite to be true. Once you know the problem you’re trying to solve (which admittedly can be the biggest lift), writing the first cut of the code is fun, and you can design the system and set precedent however you want. Once it’s in the wild, you have to work within the consequences of your initial decisions, including bad ones.
G’day Matt, from another person with a cofounder, both of us getting insane value out of AI and astounded at the attitudes around HN.
You sound like complete clones of us :-)
We’ve been at it since July and have built what used to take 3-5 people that long.
To the haters: I use TDD and review every line of code, I’m not an animal.
There’s just 2 of us but some days it feels like we command an army.
lol same. I just wrote a bunch of diagrams with Mermaid that would legit take me a week, and also did a mock of a UI for a frontend engineer that would take me another week to do, or require some designers. All of that in between meetings...
Waiting for it to actually go well to see what else I can do!
The more I have this experience and read people maligning AI for coding, the more I think the junior developers are actually not the ones in danger.
I have been able to prototype way faster. I can explain how I want a prototype reworked and it's often successful. Doesn't always work, but super useful more often than not.
That line on the chart labeled “profit” is really going to go up now!
How come you need to re-write millions of dollars in code?
In this thread: people throwing shade on tech that works, comparing it to a perfect world and making weird assumptions like no tests, no E2E or manual testing, just to make a case.
Hot take: most SWEs produce shit code, be it by constraints of any kind or their own abilities. LLMs do the same but cost less and can move faster. If you know how to use it, code will be fine. Code is a commodity and a lot of people will be blindsided by that in the future. If your value proposition is translating requirements into code, I feel sorry for you. The output quality of the LLM depends on the abilities of the operator. And most SWEs lack the systems thinking to be good here, in my experience.
As a fractional CTO and in my decade of being co-founder/CTO I saw a lot of people and codebases and most of it is just bad. You need to compare real life codebases and outputs of developers, not what people wished it would be like. And the reality is that most of it sucks and most SWEs are bad at their jobs.
When I read the blog post, the impression I get is that the author is referring to the proposed "business" of licensing or selling "generative AI" (i.e., making money for the licensor or seller), not whether generative AI is saving money for any particular user.
The author's second reference, an article from The Atlantic describing the copyright liability issues with "generative AI", has been submitted to HN four times in the last week:
AI Memorization Research (theatlantic.com)
2 points by tagyro 5 hours ago
AI's Memorization Crisis (theatlantic.com)
2 points by twalichiewicz 1 day ago | 1 comment
AI's Memorization Crisis (theatlantic.com)
3 points by palad1n 4 days ago | 1 comment
AI's Memorization Crisis (theatlantic.com)
4 points by casparvitch 4 days ago
Sounds like an argument for better hiring practices and planning.
Producing a lot of code isn’t proof of anything.
Yep. Let’s see the projects and more importantly the incremental returns…
Senior developer here, your co-founder is making a huge mistake. Their lack of knowledge about the codebase will be your undoing. PS. I work in GenAI.
Is the cofounder "rewriting" that code providing zero of the existing code as context? Doing it in a completely green field fashion?
Or is any of the existing platform used as an input for the rewrite?
>rewriting code
Key thing here. The code was already written, so rewriting it isn't exactly adding a lot of quantifiable value. If millions weren't spent in the first place, there would be no code to rewrite.
no need to wait, by using AI you already are mediocre at best (because you forego skill and quality for speed)
Is this also true of carpenters who use circular saws and airguns instead of hand saws and hammers?
1 reply →
>I myself am saving a small fortune on design and photography and getting better results while doing it.
Is this because you are improving your already existing design and photography skills and business?
Or have you bootstrapped from scratch with AI?
Do you mind sharing or giving a hint?
Thanks!
Out of curiosity, what is your product?
I myself am saving a small fortune on design and photography and getting better results while doing it.
Yay! Let's put all the artists out of business and funnel all the money to the tech industry. That's how to build a vibrant society. Yay!
> I myself am saving a small fortune on design and photography and getting better results while doing it.
Tell me you have bland taste without telling me you have bland taste. But if your customers eat it up and your slop manages to stand out in a sea of slop, who am I to dislike slop.
>my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
I was expecting a language reference (we all know which one), to get more speed, safety and dare I say it "web scale" (insert meme). :)
> and dare I say it "web scale"
Obligatory reference https://www.youtube.com/watch?v=b2F-DItXtZs
Good luck with fixing that future mess. This is such an incredibly short sighted approach to running a company and software dev that I think your cofounder is likely going to torpedo your company.
> Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
If the LLM generating the code introduced a bug, who will be fixing it? The founder that does not know how to code or the LLM that made the mistake first?
Doesn't this imply that you were not getting the level of efficiency out of your investment? It would be a little odd to say this publicly as this says more about you and your company. The question would be what your code does and if it is profitable.
> Meanwhile, my cofounder is rewriting code we spent millions of salary on in the past by himself in a few weeks.
This is one of those statements that would horrify any halfway competent engineer. A cowboy coder going in, seeing a bunch of code and going 'I should rewrite this' is one of the biggest liabilities to any stable system.
I assume this is because they're already insanely profitable after hitting PMF and are now trying to bring down infra costs?
Right? RIGHT?!
My cofounder is an all the way competent engineer. Making this many assumptions would horrify someone halfway competent with logic though.
2 replies →
Every professional SWE is going to stare off into the middle distance as they flash back to some PM or VP deciding to show everyone they still got it.
The "how hard could it be" fallacy claims another!
5 replies →
I suspect he means as a trillion dollar corporation led endeavor.
I trained a small neural net on pics of a cat I had in the 00s (RIP George, you were a good cat).
Mounted a webcam I had gotten for free from somewhere, above the cat door, in the exterior of the house.
If the neural net recognized my cat, it switched off an electromagnet holding the pet door locked. Worked perfectly until I moved out of the rental.
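For anyone curious, here's roughly what a setup like that might look like today. This is a minimal sketch only: the model file, GPIO pin, input size, and threshold are all made up, and it assumes a Raspberry Pi style relay driving the electromagnet plus a small Keras classifier, which is not what was actually running back then:

    # Hypothetical sketch: webcam frame -> tiny cat classifier -> relay on the door magnet.
    import time

    import cv2
    import numpy as np
    import RPi.GPIO as GPIO
    from tensorflow.keras.models import load_model

    LOCK_PIN = 17        # relay keeping the electromagnet energized (assumed wiring)
    THRESHOLD = 0.9      # confidence required before unlocking (made up)
    UNLOCK_SECONDS = 5   # how long to release the door

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(LOCK_PIN, GPIO.OUT, initial=GPIO.HIGH)   # HIGH = magnet on = locked

    model = load_model("george_vs_not_george.keras")    # hypothetical model file
    cap = cv2.VideoCapture(0)                           # webcam above the cat door

    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                time.sleep(0.5)
                continue
            # Resize to the classifier's input size and scale pixels to [0, 1].
            x = cv2.resize(frame, (96, 96)).astype(np.float32) / 255.0
            prob = float(model.predict(x[None, ...], verbose=0)[0][0])
            if prob > THRESHOLD:
                GPIO.output(LOCK_PIN, GPIO.LOW)    # cut power to the magnet, door opens
                time.sleep(UNLOCK_SECONDS)
                GPIO.output(LOCK_PIN, GPIO.HIGH)   # lock again
            time.sleep(0.2)
    finally:
        cap.release()
        GPIO.cleanup()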
Neural nets are, end of the day, pretty cool. It's the data center business that's the problem. Just more landlords, wannabe oligarchs, claiming ownership over anything they can get the politicians to give them.
On design and photography? So you’re filling your product with slop images and graphics? Users won’t like it
The problem is... you're going to deprive yourself of the talent chain in the long run, and so is everyone else who is switching over to AI, both generative like ChatGPT and transformative like the various translation, speech recognition/transcription or data wrangling models.
For now, it works out for companies - but fast-forward, say, ten years into the future. There won't be new intermediates or seniors to replace the ones who age out or quit the industry entirely, frustrated that they're no longer there for actual creative work but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
But by the time that, plus the demographic collapse, shows its effects, the people who currently call the shots will be retired, having long since made their money. And my generation will be left with collapse everywhere, trying to find ways to somehow keep stuff running.
Hell, it's already hard to get qualified human support these days. Large corporations effectively rule with impunity; the only recourse consumers have is to either shell out immense sums of money for lawyers and court fees, or turn to consumer protection/regulatory authorities that are being gutted as we speak, both in funding and in legal powers, or get swamped with AI slop like "legal assistance" AI hallucinating case law.
> There won't be new intermediates or seniors any more to replace the ones that age out or quit the industry entirely in frustration of them not being there for actual creativity but to clean up AI slop, simply because there won't have been a pipeline of trainees and juniors for a decade.
There are plenty of self-taught developers who didn't need any "traineeship". That proportion will increase even more with AI/LLMs and the fact that there are no more jobs for youngsters. And actually, from looking at the purely toxic comments on this thread, I would say it's a good thing for youngsters not to be exposed to such "seniors".
Credentialism is dead. "Either ship or shut up" should be the mantra of this age.
1 reply →
I find it a bit odd that people are acting like this stuff is an abject failure because it's not perfect yet.
Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Yes, people have probably been deploying it in spots where it's not quite ready, but it's myopic to act like it's "not going all that well" when it's pretty clear that it actually is going pretty well, just that we need to work out the kinks. New technology is always buggy for a while, and eventually it becomes boring.
> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding. Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits. We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
It can one shot a web frontend, just like v0 could in 2023. But that's still about all I've seen it work on.
You’re doing exactly the thing that the parent commenter pointed out: Complaining that they’re not perfect yet as if that’s damning evidence of failure.
We all know LLMs get stuck. We know they hallucinate. We know they get things wrong. We know they get stuck in loops.
There are two types of people: The first group learns to work within these limits and adapt to using them where they’re helpful while writing the code when they’re not.
The second group gets frustrated every time it doesn’t one-shot their prompt and declares it all a big farce. Meanwhile the rest of us are out here having fun with these tools, however limited they are.
6 replies →
Sure, but think about what it's replacing.
If you hired a human, it will cost you thousands a week. Humans will also fail at basic tasks, get stuck in useless loops, and you still have to pay them for all that time.
For that matter, even if I'm not hiring anyone, I will still get stuck on projects and burn through the finite number of hours I have on this planet trying to figure stuff out and being wrong for a lot of it.
It's not perfect yet, but these coding models, in my mind, have gotten pretty good if you're specific about the requirements, and even if it misfires fairly often, they can still be useful, even if they're not perfect.
I've made this analogy before, but to me they're like really eager-to-please interns; not necessarily perfect, and there's even a fairly high risk you'll have to redo a lot of their work, but they can still be useful.
10 replies →
There's a subtle point, a moment when you HAVE to take the wheel back from the AI. All the issues I see are from people insisting on using it far beyond the point where it stops being useful.
It is a helper, a partner; it is still not ready to go the last mile.
4 replies →
> We're talking "copy basic examples and don't hallucinate APIs" here, not deep complicated system design topics.
If your metric is an LLM that can copy/paste without alterations, and never hallucinate APIs, then yeah, you'll always be disappointed with them.
The rest of us learn how to be productive with them despite these problems.
14 replies →
>Every 2/3 months we're hearing there's a new model that just blows the last one out of the water for coding
I haven't heard that at all. I hear about models that come out and are a bit better. And other people saying they suck.
>Meanwhile, here I am with Opus and Sonnet for $20/mo and it's regularly failing at basic tasks, antigravity getting stuck in loops and burning credits.
Is it bringing you any value? I find it speeds things up a LOT.
I have a hard time believing that this v0, from 2023, achieved comparable results to Gemini 3 in Web design.
Gemini now often produces output that looks significantly better than what I could produce manually, and I'm an expert in web development, although my expertise is more in tooling and package management.
Frankly I think the 'latest' generation of models from a lot of providers, which switch between 'fast' and 'thinking' modes, are really just the 'latest' because they encourage users to use cheaper inference by default. In chatgpt I still trust o3 the most. It gives me fewer flat-out wrong or nonsensical responses.
I'm suspecting that once these models hit 'good enough' for ~90% of users and use cases, the providers started optimizing for cost instead of quality, but still benchmark and advertise for quality.
We implement pretty cool workflows at work using "GenAI" and the users of our software are really appreciative. It's like saying a hammer sucks because it breaks most things you hit with it.
>Generative AI, as we know it, has only existed ~5-6 years
Probably less than that, practically speaking. ChatGPT's initial release date was November 2022. It's closer to 3 years, in terms of any significant amount of people using them.
I don't think LLMs are an abject failure, but I find it equally odd that so many people think that transformer-based LLMs can be incrementally improved to perfection. It seems pretty obvious to me now that we're not gonna RLHF our way out of hallucinations. We'll probably need a few more fundamental architecture breakthroughs to do that.
> Generative AI, as we know it, has only existed ~5-6 years, and it has improved substantially, and is likely to keep improving.
I think the big problem is that the pace of improvement was UNBELIEVABLE for about 4 years, and it appears to have plateaued to almost nothing.
ChatGPT has barely improved in, what, 6 months or so.
They are driving costs down incredibly, which is not nothing.
But, here's the thing, they're not cutting costs because they have to. Google has deep enough pockets.
They're cutting costs because - at least with the current known paradigm - the cost is not worth it to make material improvements.
So unless there's a paradigm shift, we're not seeing MASSIVE improvements in output like we did in the previous years.
You could see costs go down to 1/100th over 3 years, seriously.
But they need to make money, so it's possible none of that will be passed on.
I think that even if it never improves, its current state is already pretty useful. I do think it's going to improve though I don't think AGI is going to happen any time soon.
I have no idea what this is called, but it feels like a lot of people assume that progress will continue at a linear pace for forever for things, when I think that generally progress is closer to a "staircase" shape. A new invention or discovery will lead to a lot of really cool new inventions and discoveries in a very short period of time, eventually people will exhaust the low-to-middle-hanging fruit, and progress kind of levels out.
I suspect it will be the same way with AI; I don't know if we've reached the top of our current plateau, but if not I think we're getting fairly close.
3 replies →
They are focused on reducing costs in order to survive. Pure and simple.
Alphabet / Google doesn’t have that issue. OAI and other money losing firms do.
>and is likely to keep improving.
I'm not trying to be pedantic, but how did you arrive at 'keep improving' as a conclusion? Nobody is really sure how this stuff actually works. That's why AI safety was such a big deal a few years ago.
Totally reasonable question, and I only am making an assumption based on observed progress. AI generated code, at least in my personal experience, has gotten a lot better, and while I don't think that will go to infinity, I do think that there's still more room for improvement that could happen.
I will acknowledge that I don't have any evidence for this claim, so maybe the word "likely" was unwise, as that suggests probability. Feel free to replace "is likely to" with "it feels like it will".
Because the likes of Altman have set short term expectations unrealistically high.
I mean that's every tech company.
I made a joke once after the first time I watched one of those Apple announcement shows in 2018, where I said "it's kind of sad, because there won't be any problems for us to solve because the iPhone XS Max is going to solve all of them".
The US economy is pretty much a big vibes-based Ponzi scheme now, so I don't think we can single-out AI, I think we have to blame the fact that the CEOs running these things face no negative consequences for lying or embellishing and they do get rewarded for it because it will often bump the stock price.
Is Tesla really worth more than every other car company combined in any kind of objective sense? I don't think so, I think people really like it when Elon lies to them about stuff that will come out "next year", and they feel no need to punish him economically.
2 replies →
I maintain that most anti-AI sentiment is actually anti-lying-tech-CEO sentiment misattributed.
The technology is neat, the people selling it are ghouls.
6 replies →
You're saying the same thing cryptobros say about bitcoin right now, and that's 17 years later.
It's a business, but it won't be the thing the first movers thought it was.
It’s different in that Bitcoin was never useful in any capacity when it was new. AI is at least useful right now and it’s improved considerably in the last few years.
1 reply →
A year ago I would have agreed wholeheartedly and I was a self confessed skeptic.
Then Gemini got good (around 2.5?), like I-turned-my-head good. I started to use it every week-ish, not to write code. But more like a tool (as you would a calculator).
More recently Opus 4.5 was released and now I'm using it every day to assist in code. It is regularly helping me take tasks that would have taken 6-12 hours down to 15-30 minutes with some minor prompting and hand holding.
I've not yet reached the point where I feel comfortable letting it loose to do the entire PR for me. But it's getting there.
> I was a self confessed skeptic.
I think that's the key. Healthy skepticism is always appropriate. It's the outright cynicism that gets me. "AI will never be able to [...]", when I've been sitting here at work doing 2/3rds of those supposedly impossible things. Flawlessly? No, of course not! But I don't do those things flawlessly on the first pass, either.
Skepticism is good. I have no time or patience for cynics who dismiss the whole technology as impossible.
I think the concern expressed as "impossible" is whether it can ever do those things "flawlessly" because that's what we actually need from its output. Otherwise a more experienced human is forced to do double work figuring out where it's wrong and then fixing it.
This is not a lofty goal. It's what we always expect from a competent human regardless of the number of passes it takes them. This is not what we get from LLMs in the same amount of time it takes a human to do the work unassisted. If it's impossible then there is no amount of time that would ever get this result from this type of AI. This matters because it means the human is forced to still be in the loop, not saving time, and forced to work harder than just not using it.
I don't mean "flawless" in the sense that there cannot be improvements. I mean that the result should be what was expected for all possible inputs, and when inspected for bugs there are reasonable and subtle technical misunderstandings at the root of them (true bugs that are possibly undocumented or undefined behavior) and not a mess of additional linguistic ones or misuse. This is the stronger definition of what people mean by "hallucination", and it is absolutely not fixed and there has been no progress made on it either. No amount of prompting or prayer can work around it.
This game of AI whack-a-mole really is a waste of time in so many cases. I would not bet on statistical models being anything more than what they are.
I would strongly recommend this podcast episode with Andrej Karpathy. I will poorly summarize it by saying his main point is that AI will spread like any other technology. It’s not going to be a sudden flash and everything is done by AI. It will be a slow rollout where each year it automates more and more manual work, until one day we realize it’s everywhere and has become indispensable.
It sounds like what you are seeing lines up with his predictions. Each model generation is able to take on a little more of the responsibilities of a software engineer, but it’s not as if we suddenly don’t need the engineer anymore.
https://www.dwarkesh.com/p/andrej-karpathy
Though I think it's a very steep sigmoid that we're still far on the bottom half of.
For math it just did its first "almost independent" Erdos problem. In a couple months it'll probably do another, then maybe one each month for a while, then one morning we'll wake up and find whoom it solved 20 overnight and is spitting them out by the hour.
For software it's been "curiosity ... curiosity ... curiosity ... occasionally useful assistant ... slightly more capable assistant" up to now, and it'll probably continue like that for a while. The inflection point will be when OpenAI/Anthropic/Google releases an e2e platform meant to be driven primarily by the product team, with engineering just being co-drivers. It probably starts out buggy and needing a lot of hand-holding (and grumbling) from engineering, but slowly but surely becomes more independently capable. Then at some point, product will become more confident in that platform than their own engineering team, and begin pushing out features based on that alone. Once that process starts (probably first at OpenAI/Anthropic/Google themselves, but spreading like wildfire across the industry), then it's just a matter of time until leadership declares that all feature development goes through that platform, and retains only as many engineers as is required to support the platform itself.
10 replies →
AI first of all is not a technology.
Can people get their words straight before typing?
1 reply →
I'm now putting more queries into LLMs than I am into Google Search.
I'm not sure how much of that is because Google Search has worsened versus LLMs having improved, but it's still a substantial shift in my day-to-day life.
Something like finding the most appropriate sensor ICs to use for a particular use case requires so much less effort than it used to. I might have spent an entire day digging through data sheets before, and now I'll find what I need in a few minutes. It feels at least as revolutionary as when search replaced manually paging through web directories.
I feel like I'm living in a totally different world or I'm being gaslit by LLMs when I read stuff like this and other similar comments in this thread. Do you mind mentioning _what_ language / tech stack you're in? At my current job, we have a large Ruby on Rails codebase and just this week Gemini 2.5 and 3 struggled to even identify what classes inherited from another class.
This feels like a pretty low effort post that plays heavily to superficial reader's cognitive biases.
I work commercializing AI in some very specific use cases where it is extremely valuable. Where people are being led astray is layering generalizations: general use cases (copilots) deployed across general populations and generally not doing very well. But that's PMF stuff, not a failure of the underlying tech.
I think both sides of this debate are conflating the tech and the market. First of all, there were forms of "AI" before modern Gen AI (machine learning, NLP, computer vision, predictive algorithms, etc) that were and are very valuable for specific use cases. Not much has changed there AFAICT, so it's fair that the broader conversation about Gen AI is focused on general use cases deployed across general populations. After all, Microsoft thinks it's a copilot company, so it's fair to talk about how copilots are doing.
On the pro-AI side, people are conflating technology success with product success. Look at crypto -- the technology supports decentralization, anonymity, and use as a currency; but in the marketplace it is centralized, subject to KYC, and used for speculation instead of transactions. The potential of the tech does not always align with the way the world decides to use it.
On the other side of the aisle, people are conflating the problematic socio-economics of AI with the state of the technology. I think you're correct to call it a failure of PMF, and that's a problem worth writing articles about. It just shouldn't be so hard to talk about the success of the technology and its failure in the marketplace in the same breath.
> This feels like a pretty low effort post that plays heavily to superficial reader's cognitive biases.
I haven’t followed this author but the few times he’s come up his writings have been exactly this.
I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
[1]https://garymarcus.substack.com/p/dear-elon-musk-here-are-fi...
Which ones are you claiming have already been achieved?
My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
For example, in the recent thread about LLMs and solving an Erdos problem I remember reading in the comments that it was confirmed there were multiple LLMs involved as well as an expert mathematician who was deciding what context to shuttle between them and helping formulate things.
Similarly, I've not yet heard of any non-expert software engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs even with experts in the loop.
The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code is bug-free, and then there's a million caveats about what a bug actually is and how we define one.
The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from project gutenberg into a file and asked claude questions about the characters, and then asked about the motivations of a random side-character. It gave a good answer, I'd recommend trying it yourself.
10 replies →
1 and 2 have been achieved.
4 is close; the interface needs some work to allow nontechnical people to use it. (claude code)
9 replies →
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
Yes. I am a novelist and I noticed a step change in what was possible here around Claude Sonnet 3.7 in terms of being able to analyze my own unpublished work for theme, implicit motivations, subtext, etc -- without having any pre-digested analysis of the work in its training data.
2 replies →
Yes they can. The size of many codebases is much larger and LLMs can handle those.
Consider also that they can generate summaries and tackle the novel piecemeal, just like a human would.
Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
4 replies →
No human reads a novel and evaluates it as a whole. It's a story, and the reader's perception changes over the course of reading the book. Current AI can certainly do that.
1 reply →
>Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo)
Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
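For what it's worth, here's a rough sketch of that chunk-then-summarize approach; it assumes the OpenAI Python client, a placeholder model name, and a naive "Chapter N" split, so treat it as an illustration rather than the way to do it:

    # Hypothetical sketch: summarize a long novel chapter by chapter, then answer a
    # question over the combined summaries. Model name and chapter regex are assumptions.
    import re

    from openai import OpenAI

    client = OpenAI()        # expects OPENAI_API_KEY in the environment
    MODEL = "gpt-4o-mini"    # placeholder; any long-context chat model would do

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def summarize_book(text: str) -> str:
        # Naive split on "Chapter N" headings; real books need a smarter splitter.
        chapters = re.split(r"\n(?=Chapter\s+\d+)", text)
        return "\n\n".join(
            ask("Summarize this chapter, keeping plot, characters, conflicts, and "
                "motivations:\n\n" + chapter)
            for chapter in chapters
        )

    def answer_question(text: str, question: str) -> str:
        notes = summarize_book(text)
        return ask("Using only these chapter notes:\n\n" + notes + "\n\nAnswer: " + question)

    if __name__ == "__main__":
        with open("novel.txt") as f:
            book = f.read()
        print(answer_question(book, "What motivates the main antagonist?"))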
Of course, for novels that existed before an LLM was trained, the model will already contain information about them, so having it "read" classic works like The Count of Monte Cristo and answer questions about them would be a bit of an unfair pass of the test, because models will have been trained on large volumes of existing text analysis of those books.
>reliably answer questions about plot, character, conflicts, motivations
LLMs can already do this automatically with my code in a sizable project (you know what I mean), it seems pretty simple to get them to do it with a book.
6 replies →
Which ones of those have been achieved in your opinion?
I think the arbitrary proofs from mathematical literature is probably the most solved one. Research into IMO problems, and Lean formalization work have been pretty successful.
Then, probably reading a novel and answering questions is the next most successful.
Reliably constructing 10k bug free lines is probably the least successful. AI tends to produce more bugs than human programmers and I have yet to meet a programmer who can reliably produce less than 1 bug per 10k lines.
Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math outside of the very "cleanest" fields (meaning e.g. algebra, logic, combinatorics etc.) have not seen much success in formalizing existing theory developments.
> Reliably constructing 10k bug free lines is probably the least successful.
You really need to try Claude Code, because it absolutely does that.
1 reply →
I'm pretty sure it can do all of those except for the one which requires a physical body (in the kitchen) and the one that humans can't do reliably either (construct 10000 loc bug-free).
> Many of these have already been achieved, and it's only early 2026.
I'm quite sure people who made those (now laughable) predictions will tell you none of these has been achieved, because AI isn't doing this "reliably" or "bug-free."
Defending your predictions is like running an insurance company. You always win.
Besides being a cook, which is more of a robotics problem, all of the rest are accomplished to the point where it's arguable how reliably LLMs can perform these tasks, the arguments being between the enthusiast and naysayer camps.
The keyword being "reliably" and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection, even code with formal proofs can have "bugs" if you consider the specification not matching the actual needs of reality.
All but the robotics one are demonstrable in 2026 at least.
In my opinion, contrary to other comments here I think AI can do all of the above already except being a kitchen cook.
Just earlier today I asked it to give me a summary of a show I was watching until a particular episode in a particular season without spoiling the rest of it and it did a great job.
You know that almost every show has summaries of its episodes available online?
1 reply →
This comment or something very close always appears alongside a Gary Marcus post.
And why not? Is there any reason for this comment to not appear?
If Bill Gates made a prediction about computing, no matter what the prediction says, you can bet that 640K memory quote would be mentioned in the comment section (even though he didn't actually say it).
1 reply →
I think it’s for good reason. I’m a bit at a loss as to why every time this guy rages into the ether of his blog it’s considered newsworthy. Celebrity driven tech news is just so tiresome. Marcus was surpassed by others in the field and now he’s basically a professional heckler on a university payroll. I wish people could just be happy for the success of others instead of fuming about how so and so is a billionaire and they are not.
Which is fortunate, considering how asinine it is in 2026 to expect that none of the items listed will be accomplished in the next 3.9 years.
This post is literally just 4 screenshots of articles, not even its own commentary or discussion.
Don’t be too harsh, it’s the most effort Gary has put into his criticism in a while </s>
I appreciate good critique but this is not it
I've been using Claude Code, Gemini 3 Pro, and Nano Banana Pro to plan, code, and create custom UI elements for dozens of time-saving applications. For years, I have been searching high and low for existing solutions, but all I found were either overpriced cloud offerings that were bloated with endless features I didn't need and just complicated the UI, or abandoned GitHub repos consisting of an initial commit and a roadmap that has been waiting eight years for its first update, where what code was present was half-baked and out of date. The reality is that my requirements are so specific to my workflow that until these latest models came along, building exactly what I needed in a matter of hours for a cost of $20 a month was inconceivable. Now I provide a description of the functionality I need and some sketches of the UI I made on my iPad with an Apple Pencil, and after a bit of back and forth to get everything dialled in, I've created a bit of software that will save me dozens if not hundreds of hours of previously tedious manual work.
Gary Marcus (probably): "Hey this LLM isn't smarter than Einstein yet, it's not going all that well"
The goalposts keep getting pushed further and further every month. How many math and coding Olympiads and other benchmarks will LLMs need to dominate before people will actually admit that in some domains it's really quite good.
Sure, if you're a Nobel prize winner or PhD then LLMs aren't as good as you yet, but for 99% of the people in the world, LLMs are better than you at math, science, coding, and every language probably except your native language, and they're probably better than you at that too...
Ignoring the actual poor quality of this write-up, I think we don't know how well GenAI is going, to be honest. I feel we've not been able to properly measure or assess its actual impact yet.
Even as I use it, and I use it every day, I can't really assess its true impact. Am I more productive or less overall? I'm not too sure. Do I do higher quality work or lower quality work overall? I'm not too sure.
All I know is it's pretty cool, and using it is super easy. I probably use it too much, in a way that actually slows things down sometimes, when I use it for trivial things for example.
At least when it comes to productivity/quality I feel we don't really know yet.
But there are definite cool use-cases for it, I mean, I can edit photos/videos in ways I simply could not before, or generate a logo for a birthday party, I couldn't do that before. I can make a tune that I like, even if it's not the best song in the world, but it can have the lyrics I want. I can have it extract whatever from a PDF. I can have it tell me what to watch out for in a gigantic lease agreement I would not have bothered reading otherwise.
I can have it fix my tests, or write my tests; not sure if it saves me time, but I hate doing that, so it definitely makes it more fun and I can kind of just watch videos at the same time, which I couldn't before. Coding quality-of-life improvements are there too: generating a sample JSON out of a JSONSchema, and so on. If I want, I can write a method using English prompts instead of the code itself; it might not truly be faster, not sure, but sometimes it's less mentally taxing, and depending on my mood, it can be more fun or less fun, etc.
All those are pretty awesome wins and a sign that for sure those things will remain and I will happily pay for them. So maybe it depends on what you expected.
And what do you think investors in OAI et al are expecting?
You're absolutely right!
The irony of a five sentence article making giant claims isn't lost on me. Don't get me wrong: I'm amenable to the idea; but, y'know, my kids wrote longer essays in 4th grade.
All I know is that I have built more in the past 10 months than I ever have. How do you quantify for the skeptics the mental shift that happens when you know you can just build stuff now?
COULD I do this stuff before? Sure. But I wouldn’t have. Life gets in the way. Now, the bar is low so why not build stuff? Some of it ships, some of it is just experimentation. It’s all building.
Trying to quantify that shift is impossible. It’s not a multiplier to productivity you measure by commits. It’s a builder mind shift.
"I have built more in the past 10 months than I ever have."
Correction. The genAI has built it.
I haven't got any skin in the game on either side here, but doesn't the fact that genAI can build it imply that what you are doing is heavily trodden ground, that there will be less and less need for developers like you, and that this will gradually lead to many developers (like you) being cut out of the market entirely?
For personal stuff it's wonderful. For work, it seems like a double edged sword that will eventually cut the devs that use it (and those that don't). Even if the business owners aren't completely daft and keep a (vastly diminished) workforce of dev/AI consultants on board, that could easily exclude you or me.
It's going well if all the jobs it eradicates can be replaced with just as many jobs (they can't), or the powers that be catch on and realise there isn't that many jobs left for humans to do and institute some form of basic income system (they won't).
"The genAI has built it" -- this is the core point. If I did nothing except complain about AI for the past 10 months, would these projects exist? No they would not. So. I. Built. It.
If you actually use these tools, really use them, you realize that it's an augmentation, not a replacement. Simply because the training data is what has already come before (for now!). The LLMs need help, direction, focus...and those are learned skills dependent on the tooling. Not to mention ideas.
And sure, I imagine the software development workforce will change quite a bit, probably this year, no doubt about that.
But the need for builders will not change. I imagine that the 'builder' role will change to be traditional software developers, designers, sales people, writers, c-suite...whatever.
So I think you are right. "That could easily exclude you or me". 100% correct. The required skill set to be a builder is changing on a weekly basis. The only way to keep up is to keep building with these tools. Trying things. Experimenting. Otherwise, yes, you will probably be replaced. By a builder.
> For work, it seems like a double edged sword that will eventually cut the devs
Developers have been putting non-developers out of a job for decades.
Guessing this isn’t going to be popular here, but he’s right. AI has some use cases, but isn’t the world-changing paradigm shift it’s marketed as. It’s becoming clear the tech is ultimately just a tool, not a precursor to AGI.
Is that the claim the OP is making?
If AGI is ever going to happen, then it's definitionally a precursor to it.
So I'm not really sure how to parse your statement.
I’m not sure I follow. What if LLMs are helpful but not useful to AGI, but some other technology is? Seems likely.
1 reply →
not YET.
LLMs help me read code 10x faster - I’ll take the win and say thanks
Should have used an LLM to proofread.. LLMs can still cannot be trusted?
How dare you accuse Gary-Marcus-5.2-2025-12-11 of being an LLM??
How long do you think it will be until the "AI isn't doing anything" people go away? 1 month? 6 months? I'd say 1 year at the most. Anyone who has used Claude Code since Dec 1st knows this in their bones, so I'd just let these people shout from the top of the hill until they run out of steam…
Right around then, we can send a bunch of reconnaissance teams out to the abandoned Japanese islands to rescue them from the war that’s been over for 10 years - hopefully they can rejoin society, merge back with reality and get on with their lives
I think the "AI isn't doing anything" crowd have some kind of vocabulary/language barriers/deficiencies that prevent them from refining their prompting methods into something that works for them.
I find that the more precise I am in my prompts, the more precise the response. But that requires that I use vocabulary that I wouldn't use in a human conversation.
What a joke this guy is. I can sit down and crank out a real, complex feature in a couple hours that would have previously taken days and ship it to the users of our AI platform who can then respond to RFQs in minutes where they would have previously spent hours matching descriptions to part numbers manually.
...and yet we still see these articles claiming LLMs are dying/overhyped/major issues/whatever.
Cool man, I'll just be over here building my AI based business with AI and solving real problems in the very real manufacturing sector.
I'm in a similar situation.
Sometimes, I suspect that half the naysayers are just trolling for workable ideas; i.e., they want proof of someone's success so they can copy it.
Preaching to the wrong choir. The HN community is reaping massive benefits from generative AI.
I see stuff like this and think of these two things:
1) https://en.wikipedia.org/wiki/Gartner_hype_cycle
or
2) "First they ignore you, then they laugh at you, then they fight you, then you win."
or maybe originally:
"First they ignore you. Then they ridicule you. And then they attack you and want to burn you. And then they build monuments to you"
> "First they ignore you, then they laugh at you, then they fight you, then you win."
The only people who use this quote sincerely have been crackpots, in my experience.
First of all, popping in a few screenshots of articles and papers is not proper analysis.
Second of all, GenAI is going well or not depending on how we frame it.
In terms of saving time, money and effort when coding, writing, analysing, researching, etc. It’s extremely successful.
In terms of leading us to AGI… GenAI alone won’t reach that. Current ROI is plateauing, and we need to start investing more somewhere else.
I think that the wider industry is living right now what was coding and software engineering around 1 year or so ago.
Yeah you could ask ChatGPT or Claude to write code, but it wasn't really there.
It takes a while to adopt both the model AND the UI. In software we were the first because we are both the makers and the users.
I keep reading comments that claim GenAI's positive traits, but this usually amounts to some toy PoC that very eerily mirrors work found in code bootcamps. You want an app that has logins and comments and upvotes? GenAI is going to look amazing setting up a non-relational db to your node backend.
Aye. If you've not turned a real profit with your thing, I will default to believing that you don't know what you're talking about and are probably building toys.
It's nothing to do with AI. I didn't believe "I rewrote my application in three weeks!" claims before AI, and I don't believe them now. Most people are not able to evaluate themselves, I don't see why that would have changed.
It's more about how you use it. It should be a source of inspo. Not the end all be all.
Meanwhile $employer is continuing to migrate individual tasks to in-house AI tooling, and has licensed an off-the-shelf coding agent for all of us developers to put in our IDEs.
> LLMs can still cannot be trusted
But can they write grammatically correct statements?
This was the first thing I noticed too. This is the most low-effort post I have ever seen that high up on Hacker News.
Meanwhile I'm over here reducing my ADO ticket time estimates by 75%.
Gary Marcus again. The chief doomer of AI where goal posts keep on moving.
Almost everyone around me, even the primary school kids use ChatGPT/Perplexity/Gemini/Claude in some form on almost a daily basis. The daily engagement is v strong.
The models keep improving every year. Nano Banana gets text spot on, and human anatomy like fingers and toes is spot on. Deep Research mode is mind-boggling. All the major vendors have some form of voice interaction, and it feels pretty good. I use Perplexity's voice feature while driving to learn in depth about a topic of interest.
The trend is strong, betting against the trend isn't wise.
I can paste entire books and ask questions about certain pieces. The context windows nowadays are wild.
Price per token keeps on dropping, more capability keeps on coming online.
Gary offers no solutions, just complaints.
Odds this was AI generated?
It's literally just four screenshots paired with this sentence.
> Trying to orient our economy and geopolitical policy around such shoddy technology — particularly on the unproven hopes that it will dramatically improve– is a mistake.
The screenshots are screenshots of real articles. The sentence is shorter than a typical prompt.
I’m starting to think this take is legitimately insane.
As said in the article, a conservative estimate is that Gen AI can currently do 2.5% of all jobs in the entire economy. A technology that is really only a couple of years old. This is supposed to be _disappointing_? That’s millions of jobs _today_, in a totally nascent form.
I mean I understand skepticism, I’m not exactly in love with AI myself, but the world has literally been transformed.
Download the models you can find now so you have them forever. The guardrails will only get worse, or models will be banned entirely. Whether it's because of "hurts people's health" or some other moral panic, it will kill this tech off.
gpt-oss isn't bad, but even models you cannot run are worth getting since you may be able to run them in the future.
I'm hedging against models being so nerfed they are useless. (This is unlikely, but drives are cheap and data is expensive.)
Guardrails are just adjustable parameters. The trick is finding the right ones and turning them off.
I forget the name, but there is at least one project dedicated to this.
It's going well for coding. I just knocked out a mapping project that would have been a week+ of work (with docs and stackoverflow opened in the background) in a few hours.
And yes, I do understand the code and what is happening and did have to make a couple of adjustments manually.
I don't know that reducing coding work justifies the current valuations, but I wouldn't say it's "not going all that well".
How on Earth do people keep taking Gary Marcus seriously?
He's such a joke that even LLMs make fun of him. The Gemini-generated Hacker News frontpage for December 9 2035 contains an article by Gary Marcus: "AI progress is stalling": https://dosaygo-studio.github.io/hn-front-page-2035/news
As if the articles he’s linked were written by him
Haters gonna hate.
Holy moving goal posts batman!
I hate generative AI, but it's inarguable that what we have now would have been considered pure magic 5 years ago.
I've just started ignoring people like this. You think everything's going bad? Okay fine. You go ahead and keep believing that. Maybe you could get it printed on a sandwich board and walk up and down the street with it.
Huh?
Seems like black and white thinking to me. I had it make suggestions for 10 triage issues for my team today and agreed with all of its routings. That’s certainly better than 6 months ago.
a historic moron. Marcus will make Krugman's internet==fax machine look like a good prediction
This entire take is nonsense.
I just used ChatGPT to diagnose a very serious but ultimately not-dangerous health situation last week and it was perfect. It literally guided me perfectly without making me panic and helped me understand what was going on.
We use ChatGPT at work to do things that we have literally laid people off for, because we don't need them anymore. This included fixing bugs at a level that is at least E5/senior software engineer. Sometimes it does something really bad but it definitely saves times and helps avoid adding headcount.
Generative AI is years beyond what I would have expected even 1 year ago. This guy doesn't know what he's talking about, he's just picking and choosing one-off articles that make it seem like it's supporting his points.
All this AI discussion has done is reveal how naive some people are.
You're not losing your job unless you work on trivial codebases. There's a very clear pattern what those are: startups, greenfield, games, junk apps, mindless busywork that probably has an existing better tool on github, etc. Basically anything that doesn't have any concrete business requirements or legal liability.
This isn't to say those codebases will always be trivial, but good luck cleaning that up or facing the reality of having to rewrite it properly. At least you have AI to help with boilerplate. Maybe you'll learn to read docs along the way.
The people claiming to be significantly more productive are either novice programmers or optimistic for unexplained reasons they're still trying to figure out. When they want to let us know, most people still won't care because it's not even the good kind of unreasonable that brings innovation.
The only real value in modern LLMs is that natural language processing is a lot better than it used to be.
Are we done now?
I wholeheartedly agree. Shitty companies steal art and then put out shitty products that shitty people use to spam us with slop.
The same goes for code as well.
I’ve explored Claude code/antigravity/etc, found them mostly useless, tried a more interactive approach with copilot/local models/ tried less interactive “agents”/etc. it’s largely all slop.
My coworkers who claim they’re shipping at warp speed using generative AI are almost categorically our worst developers by a mile.
Ah, Gary Marcus, the 10x ninja whose hand-crafted bespoke code singlehandedly keeps his employer in business.
That’s not what I’m suggesting at all.