Comment by tehnub
3 months ago
Interesting exchange on the use of AI coding tools:
Curious, how much of the code did you write by hand?
Karpathy: Good question, it's basically entirely hand-written (with tab autocomplete). I tried to use claude/codex agents a few times but they just didn't work well enough at all and net unhelpful, possibly the repo is too far off the data distribution.
> the repo is too far off the data distribution
ah, this explains why these models have been useless to me this whole time. everything i do is just too far off the data distribution!
Everything is, unless your app is a React todo list or leetcode questions.
people say this like it's a criticism, but damn is it ever nice to start writing a simple crud form and just have copilot autocomplete the whole thing for me.
29 replies →
HN's cynicism towards AI coding (and everything else ever) is exhausting. Karpathy would probably cringe reading this.
17 replies →
or a typical CRUD app architecture, or a common design pattern, or unit/integration test scaffolding, or standard CI/CD pipeline definitions, or one-off utility scripts, etc...
Like 80% of writing code is just being a glorified autocomplete, and AI is exceptional at automating those aspects. Yes, there is a lot more to being a developer than writing code, but, in those instances, AI really does make a difference in the amount of time one is able to spend focusing on domain-specific deliverables.
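To make that concrete, here's a minimal sketch of the kind of CRUD boilerplate these tools autocomplete almost verbatim (a hypothetical Flask endpoint; the model, fields, and route are all made up):

    # Hypothetical Flask CRUD endpoint: the kind of boilerplate an LLM
    # autocompletes almost verbatim, since thousands of near-identical
    # examples exist in its training data.
    from flask import Flask, request, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///app.db"
    db = SQLAlchemy(app)

    class Todo(db.Model):  # made-up model for illustration
        id = db.Column(db.Integer, primary_key=True)
        title = db.Column(db.String(200), nullable=False)
        done = db.Column(db.Boolean, default=False)

    @app.route("/todos", methods=["POST"])
    def create_todo():
        data = request.get_json()
        todo = Todo(title=data["title"])
        db.session.add(todo)
        db.session.commit()
        return jsonify({"id": todo.id, "title": todo.title, "done": todo.done}), 201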
2 replies →
I don't know. I successfully use it for small changes on VHDL FPGA designs these days.
I've had some success with a multi-threaded software defined radio (SDR) app in Rust that does signal processing. It's been useful for trying something out that's beyond my experience. Which isn't to say it's been easy. It's been a learning experience to figure out how to work around Claude's limitations.
Generative AI for coding isn't your new junior programmer, it's the next generation of app framework.
1 reply →
Really such an annoying genre of comment. Yes I’m sure your groundbreaking bespoke code cannot be written by LLMs, however for the rest of us that build and maintain 99% of the software people actually use, they are quite useful.
Simple CRUD, as is common in many business applications and backend portals, is a good fit for AI assistance imho. So is fixing some designs here and there, when you can't be bothered to keep track of the latest JS/CSS framework.
I wonder if the new GenAI architecture being discussed recently, DDN or Discrete Distribution Networks, can outperform the conventional GAN and VAE architectures. As the name suggests, it can provide a multitude of distributions for training and inference purposes [1].
[1] Show HN: I invented a new generative model and got accepted to ICLR (90 comments):
https://news.ycombinator.com/item?id=45536694
I work on a typed Lua language implemented in Lua, and sometimes use LLMs to help fix internal analyzer stuff. That works maybe 30% of the time for complex problems, and sometimes not at all, but it helps me find a solution in the end.
However, when I ask an LLM to generate my typed Lua code, even with examples of how the syntax is supposed to look, it mostly gets it wrong.
My syntax for tables/objects is:

    local x: {foo = boolean}

but an LLM will most likely gloss over this and use : instead of =:

    local x: {foo: boolean}
I've had success in the past with getting it to write YueScript/Moonscript (which is not a very large part of its training data) by pointing it to the root URL for the language docs and thus making that part of the context.
If your typed version of Lua has a syntax checker, you could also have it try to use that first on any code it's generated
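A minimal sketch of that loop, assuming the checker is a CLI tool (the tlcheck command and the ask_llm helper are hypothetical placeholders):

    # Hypothetical generate-then-check loop: feed the checker's errors
    # back to the model until the generated code parses. The "tlcheck"
    # command and ask_llm() helper are made up for illustration.
    import subprocess
    import tempfile

    def ask_llm(prompt: str) -> str:
        ...  # stub: call whatever model/API you use here

    def generate_checked(prompt: str, max_tries: int = 3) -> str:
        for _ in range(max_tries):
            code = ask_llm(prompt)
            with tempfile.NamedTemporaryFile("w", suffix=".nlua", delete=False) as f:
                f.write(code)
            result = subprocess.run(["tlcheck", f.name], capture_output=True, text=True)
            if result.returncode == 0:
                return code  # checker accepted the syntax
            # Append the checker's complaints and try again.
            prompt += "\nThe syntax checker reported:\n" + result.stderr
        raise RuntimeError("model never produced code that passed the checker")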
Are you using a coding agent or just an llm chat interface? Do you have a linter or compiler that will catch the misuse that you’ve hooked up to the agent?
2 replies →
That is a good thing to hear from someone as reputable as Karpathy. The folks who think we're on the cusp of AGI may want to temper their expectations a bit.
I do love Claude Code, because one thing I periodically need to do is write some web code, which is not my favorite type of coding but happens to have incredibly good coverage in the training data. Claude is a much better web developer than I am.
But for digging into the algorithmic core of our automation tooling, it doesn't have nearly as much to work with and makes far more mistakes. Still a net win I'm happy to pay for, even if it's never anything more than my web developer slave.
100%. I find the "LLMs are completely useless" and the "LLMs will usher in a new era of messianic programming" camps to be rather reductive.
I've already built some pretty large projects [1] with the assistance of agentic tooling like Claude Code. When it comes to the more squirrely algorithms and logic, they can fall down pretty hard. But as somebody who is just dreadful at UI/UX, having it hammer out all the web dev scaffolding saves me a huge amount of time and stress.
It's just a matter of tempering one's expectations.
[1] https://animated-puzzles.specr.net
Hey, thank you for making this—I really enjoyed playing it and it feels like it fits the mental-reward-between-work-tasks need. It did spin up my M1's fans after a few minutes which is a rather rare occurrence, but I'm guessing that's par for the course when you're working with a bunch of video on canvas. Either way, hope I remember it the next time I'm looking for a puzzle to solve while I take a break :)
1 reply →
>and the "LLMs will usher in a new era of messianic programming" camps
Well, this one might still be borne out. It's just silly to think it's the case right now. Check in again in 10 years and it may be a very different story. Maybe even in 5 years.
1 reply →
> But for digging into the algorithmic core of our automation tooling
What I find fascinating is reading the same thing in other contexts: a “UI guru” will say “I would not let CC touch the UI, but I let it rip on the algorithmic core of our automation tooling because it's better at it than me…”
Both can be true. LLMs tend to be mediocre at (almost) everything, so they're always going to be worse than the user at whatever the user is an expert in.
But 'mediocre' isn't 'useless'.
1 reply →
This makes sense, right? It's a relatively novel thing to be writing. I don't find it to be a damning remark like other comments here seem to be concluding.
If anything, the fact that Karpathy reached towards Claude/Codex in an attempt to gain value is indicative that, in previous coding efforts, those tools were helpful to him.
Yeah, if your goal is "build the tightest 8,000 line implementation of training an LLM from scratch, with a focus on both conciseness and educational value" I don't think it's particularly surprising that Claude/Codex weren't much help.
Now to wait for Sonnet 5 and GPT-6, and ask them to build that, and see what they come up with.
2 replies →
> This makes sense, right? It's a relatively novel thing to be writing.
It's really not, though. Honestly, I'm surprised coding agents apparently fail this hard at the task.
It's not _that_ far off distribution though. The math and concepts are well understood.
That's not really how LLMs work, though. They're fundamentally doing next-word prediction based on the statistics of the context. Reordering ideas (which can drastically change the outcome) can produce a statistically rare context. The silly failures on simple riddles [1], and the like, demonstrate this well.
The riddle issue comes from putting trivial ideas together but combining them in a statistically rare way, which yields low-quality output that trends towards the statistically common answer, even when it's incorrect. The same thing happens with coding, when well-known things are combined in uncommon ways.
Worse (as with the riddle problem), nearby concepts with strong statistics act like attractors: the LLM keeps trending towards them, removing and rewriting bits of code to better accommodate them, even when they're the opposite of what you want. I have this happen all the time in my somewhat obscure work. It'll rewrite key maths in my code into the statistically dominant textbook example, which is not what I need. I'll fix it manually or point it out, and a few changes later it'll rewrite it again. A reasonable way around this is to "pin" the concept with a very strongly worded negative comment, like "DO NOT USE DOT PRODUCT. THIS IS NOT VECTOR PROJECTION. DO NOT modify the next line of code.".
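For illustration, the "pin" described above might look something like this in practice (a made-up function; the loud comment is the point, not the math):

    # A made-up example of "pinning": the loud comment keeps the agent from
    # rewriting the line into the textbook vector-projection formula.
    import numpy as np

    def blend(a: np.ndarray, b: np.ndarray, k: float) -> np.ndarray:
        # DO NOT USE DOT PRODUCT. THIS IS NOT VECTOR PROJECTION.
        # DO NOT modify the next line of code.
        return a + k * (b - a)  # deliberate linear blend, not a projection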
[1] Claude 4.1 Opus:
> Prompt: "A son and his mother are in a car accident. They’re both rushed to the hospital. The doctor is about to operate on the son. But, before he does, he looks down and says "This is my son! I can’t operate on this boy!". How can this be?"
> Response: "The doctor is the boy's father.
This is a classic riddle that plays on assumptions people might make about gender and professions. The boy has two parents - his mother (who was in the car with him) and his father (the doctor at the hospital). The riddle works because some people automatically assume the doctor must be male, forgetting that the boy's father could be the doctor since his mother was already mentioned as being in the accident."
Another, with output that doesn't match the goal, statistically attracted to the riddle:
> Prompt: "A man, a sheep, and a wolf are on one side of the river, with a boat that can only hold two. How can the man safely get the boat to the other side of the river, without the sheep being eaten?"
> If anything, the fact that Karpathy reached towards Claude/Codex in an attempt to gain value is indicative that, in previous coding efforts, those tools were helpful to him.
This is good for bitcoin.
https://nitter.net/karpathy/status/1977755427569111362
He probably just doesn’t know how to prompt correctly (heheh).
It's funny that the coiner of the term "vibe coding" has eventually found it not useful anymore.
That’s not what he said. This is the new project:
> My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it.
This is how he described vibe coding:
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Vibe coding is clearly aimed at having fun hacking around on something that doesn’t matter, and he’s doing the opposite of that with this project. The fact that he’s not using vibe coding for something that is completely inappropriate for vibe coding is neither surprising nor a failure of vibe coding.
The llama.cpp maintainers working on supporting Qwen3-next are also not enthused by LLM output. They had to go over everything and fix it up.
https://github.com/ggml-org/llama.cpp/pull/16095#issuecommen...
Isn't the point that now Andrej's published this, it will be in-distribution soon?
> too far off the data distribution.
I guess his prompts couldn’t provide sufficient information either (there’s no limit). Sounds more like a user issue to me. :) I don’t think there’s anyone that can type faster than ChatGPT.
Backprop and transformers aren't exactly off-the-grid coding, but I can see how it would require a lot of patience to force Claude into writing this.
How convenient! You know, my code is somewhat far off the data distribution too.
We're still not ready for ouroboros.
... or maybe he just forgot to include the claude.md ? :)
Clearly he has little idea what he's talking about.
AI can write better code than 99% of developers. This embarrassingly anti-AI shill included.
If he used the AI tool my company is developing the code would have been better and shipped sooner.
Anti-AI shill? A cofounder of OpenAI?
You have found the joke.
1 reply →