Comment by indutny

8 hours ago

Setting aside the question of whether this would be a useful addition to Node.js core or not, it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter, which in my opinion is against the spirit of the project and directly violates the terms of the Developer's Certificate of Origin set out in the project's CONTRIBUTING.md.

61 comments


conartist6 3 hours ago

Pain is a signal. Even if the trick is not minding, it's still inadvisable to burn your hand on an open flame. The pain is there to help you not get hurt.

I do not think it is wise to brag that your solution to a problem is extremely painful but that you were impervious to all the pain. Others will still feel it. This code takes bandwidth to host and space on devices, and for maintainers it permanently doubles the work associated with evolving the filesystem APIs. If someone else comes along with the same kind of thinking, they might just double those doubled costs, and someone else might 8x them, all because nobody could feel the pain they were passing on to others.

nine_k 2 hours ago

I don't see it as such a pain.
> Bundle a full application into a Single Executable.
Embed a zip file into the executable, or something. Node has sort of supported this since v25 (see --build-sea). Bun and Deno have supported it for longer.
> Run tests without touching the disk.
This must be left to the host system to decide. Maybe I want them to touch the disk and leave traces useful for debugging. I'd go with tmpfile / tmpdir; whoever cares, knows to mount them as tmpfs, which sits in RAM. (Or a ramdisk under Windows.)
> Sandbox a tenant’s file access. In a multi-tenant platform, you need to confine each tenant to a directory without them escaping
This looks like the wrong tool, again. Run your Node app in a container (like you are already doing) and mount every tenant's directory as a separate mount point into your container. (Similar with BSD jails.) This seems like the only problem that is not trivial to solve without a "VFS", but I'm not very certain that such a VFS would be as well-audited as Docker, or nsenter and unshare. The amount of work necessary to implement it is too much for the niche benefit it would provide.
> Load code generated at runtime.
See tmpfs for a trivial answer. For a less trivial answer, I don't see how Node's code loader is bound to a filesystem. If it can import via https, just use ESM loader hooks and register() your loader, assuming you're running Node ≥ 20.6.
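The tmpdir suggestion for tests can be sketched roughly like this in TypeScript for Node.js. `withScratchDir` and `roundTrip` are hypothetical helper names, not part of any proposal; whether `os.tmpdir()` is backed by disk or by a tmpfs is left to the host, as the comment says, and the `keep` flag is there for when you want the traces left behind for debugging.

```typescript
import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: run `fn` inside a fresh scratch directory under the OS temp dir.
// Whether that directory lives on disk or in RAM (tmpfs) is the host's decision.
export async function withScratchDir<T>(
  fn: (dir: string) => Promise<T>,
  keep = false, // set to true to leave the files behind for debugging
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "node-test-"));
  try {
    return await fn(dir);
  } finally {
    if (!keep) {
      // Clean up even if the test body threw.
      await rm(dir, { recursive: true, force: true });
    }
  }
}

// Hypothetical test body: write a file and read it back from the scratch dir.
export function roundTrip(text: string): Promise<string> {
  return withScratchDir(async (dir) => {
    const file = join(dir, "out.txt");
    await writeFile(file, text);
    return readFile(file, "utf8");
  });
}
```

Nothing here needs a virtual filesystem: the test touches whatever the host mounted at the temp location.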

mixologic 7 hours ago

Worth noting that mcollina is a member of the Node.js Technical Steering Committee

kartaka83838 2 hours ago

Yes, this.
If there's anyone I would trust to explore these avenues, it's him and the maintainers doing god's work in the nodejs repo these past few years.

everlier 6 hours ago
We call it slip slop at work: it's OK to slip some slop in if it's "our" slop :-)
- giancarlostoro 4 hours ago
  
  > I pointed the AI at the tedious parts, the stuff that makes a 14k-line PR possible but no human wants to hand-write: implementing every fs method variant (sync, callback, promises), wiring up test coverage, and generating docs.
  Is it slop if it is carefully calculated? I tire of hearing people use "slop" to mean anything AI-generated, even when it is carefully reviewed.
  

digikata 7 hours ago

Large PRs could follow the practices that the Linux kernel dev lists follow. Sometimes large subsystem changes could be carried separately for a while by the submitter for testing and maintenance before being accepted in theory, reviewed, and if ready, then merged.

While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.

With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how they would want to systematically review them in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.

dakiol 6 hours ago
Nobody wants to review AI-generated code (unless we are paid for doing so). Open source is fun, that's why people do it for free... adding AI to the mix is just insulting to some, and boring to others.
Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase; I simply don't enjoy the imbalance.
- mpyne 3 minutes ago
  
  > Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write?
  If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?
- IgorPartola 2 hours ago
  
  In theory because the code being added is introducing a feature so compelling that it is worth it. In practice, that’s rarely the case.
  My personal approach to open source is more or less that when I need a piece of software to exist that does not, and there is no good reason to keep it private, it becomes open source. I don’t do it for fun; I do it because I need it and might as well share it. If someone sends me a patch that enhances my use case, I will work with them to incorporate it. If they send me a patch that only benefits them, it becomes a calculus of how much effort it would take for me to review it. If the effort is high, my advice is to fork the project or make it easier for me to review. Granted, I don’t maintain huge or vital projects, but that’s precisely why: I don’t need yet another programming language or runtime to exist and I wouldn’t want to work on one for fun.
- hackemmy 1 hour ago
  
  I get the frustration but I think this take only holds if you assume AI generated code is inherently worse. If someone uses Claude to scaffold the boilerplate and then actually goes through it properly, the end result is the same code you would have written by hand, just faster. The real problem is when people submit 14k lines they clearly did not read through. But that is a review process problem, not an AI problem. Bad PRs existed long before AI.
  
- tyre 4 hours ago
  
  Why do you care how much effort it took the engineer to make it? If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?
  Not everyone has the same motivations. I’ve done open source for fun, I’ve done it to unblock something at work, I’ve done it to fix something that annoys me.
  If your project is gaining useful functionality, that seems like a win.
  
goalieca 7 hours ago
> With AI blowing up the line counts on PRs,
Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.
- digikata 7 hours ago
  
  Go slow to go fast. Breaking up the PR this way also allows later humans and AI alike to understand the codebase. Slowing down the PR process with standards lets the project move faster overall.
  If some bug slips past review, having the PR broken down semantically allows quicker analysis and recovery later, for one. Even if you have AI review new Node.js releases to decide whether to take in the new version, the commit log will be easier for the AI to analyze with semantic commits.
  Treating the code as throwaway is valid in a few small contexts, but that is not the case for PRs going into maintained projects like Node.js.
- tracker1 6 hours ago
  
  TBF, most of the AI code I've reviewed isn't significantly different than code I've seen from people... in fact, I've seen significantly worse from real people.
  The fact is, it's useful as a tool, but you still should review what's going on/in. That isn't always easy, though, and I get that. I've been working on a TS/JS driver for MS-SQL so I can use some features not in other libraries, mostly bridging a Rust driver (first Tiberius, then mssql-client). The clean abstraction made the switch pretty quick, and a fairly thorough test suite for Deno/Node/Bun kept my sanity in check. It's a Rust library with a C-style interface, accessed via FFI from the TS/JS server environment.
  The hardest part for me is actually having to set up a Windows Server to test the passwordless auth path (basically a connection string with integrated Windows auth). I've put about 80 hours of real time into this project so far, and I'll probably be doing two follow-ups: one will be a generic ODBC adapter with a similar set of interfaces, and a third adapter will provide the same methods but use the native SQLite underneath while smoothing over the differences.
  I'm leveraging using/dispose (async) instead of explicit close/rollback patterns, similar to .NET, as well as Dapper-like methods for "typed" results, though with no actual type validation. I'd considered trying to adapt Zod to check at least the first record or all records, and may still add the option.
  All said though, I wouldn't have been able to do so much with so relatively little time without the use of AI. You don't have to sacrifice quality to gain efficiency with AI, but you do need to take the time to do it.
- dotancohen 6 hours ago
  
  > Everyone talks about going faster and no one believes it is about higher quality.
  Go Fast And Break Things was considered a virtue in the JavaScript community long before LLMs became widely available.
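The using/dispose approach tracker1 mentions (in place of explicit close/rollback calls) can be sketched with TypeScript's `await using` declarations. This assumes TypeScript 5.2+ and a runtime that defines `Symbol.asyncDispose` (recent Node.js versions do); `Transaction` and `runInTransaction` are hypothetical names for illustration, not part of any MS-SQL driver.

```typescript
// Hypothetical transaction handle: rolls back on disposal unless commit() ran.
class Transaction {
  private committed = false;
  private readonly log: string[];

  constructor(log: string[]) {
    this.log = log;
    log.push("begin");
  }

  async commit(): Promise<void> {
    this.committed = true;
    this.log.push("commit");
  }

  // Called automatically when an `await using` binding leaves scope,
  // whether the scope exits normally or by throwing.
  async [Symbol.asyncDispose](): Promise<void> {
    if (!this.committed) {
      this.log.push("rollback");
    }
  }
}

// Hypothetical unit of work: no explicit try/finally or rollback call needed.
export async function runInTransaction(log: string[], fail: boolean): Promise<void> {
  await using tx = new Transaction(log);
  if (fail) {
    throw new Error("simulated failure");
  }
  await tx.commit();
}
```

On the success path the log ends up as begin, commit; when the body throws, disposal records a rollback before the error propagates to the caller.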

madeofpalk 1 hour ago

> it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter

Who reviewed and approved the PR?

athorax 8 hours ago

How exactly does it violate the Developer's Certificate of Origin clause?

indutny 8 hours ago
The submitted code must adhere to one of clauses (a), (b), or (c), and separately to clause (d) of: https://github.com/nodejs/node/blob/main/CONTRIBUTING.md#dev...
If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), the code was taken from another place with clear license terms compatible with the project's license. If (c), the contribution was written by someone else who asserted (a) or (b) and is submitted without changes.
Since LLM-generated output is based on public code but lacks attribution and the license of the original, it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter's disclaimer in the PR body.
- athorax 5 hours ago
  
  Not sure if you are intentionally misrepresenting (a), but here is the full text
  (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
  
- Dylan16807 5 hours ago
  
  If there's a "the original" the LLM is copying then there's a problem.
  If there isn't, then (b) works fine, the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) is a problem; almost any (b) code will need some (a) code to adapt it.
- benatkin 5 hours ago
  
  To many, it qualifies under either A or B, and therefore C as well. Under A, you can think of the LLM as augmenting your own intelligence. Under B, the license terms of LLM output are essentially that you can do whatever you want with it. The alternative is avoiding use of AI because of copyright or plagiarism concerns.
- charcircuit 7 hours ago
  
  It would be considered (a) since the author would own the copyright on the code.
  

epolanski 8 hours ago

Do as I say, not as I do.

On a more serious note, I think that this will be thoroughly reviewed before it gets merged, and Node has an entire security team that oversees these.

indutny 8 hours ago
As someone who was a part of the aforementioned security team, I'm not sure I'd be interested in reviewing such a volume of machine-generated code, expecting a trap at every corner. The implicit assumption I observed in many OSS projects I've been involved with is that first-time contributions are rarely accepted if they are too large, and the "core contributor" designation exists to signal "I put effort into this code, stand by it, and respect everyone's time in reviewing it". The PR in the post violates this social contract.
- epolanski 7 hours ago
  
  When you do it for free, you can decide to do what you want; if it's your job, it's a bit different and you may have to, especially considering that Collina is one of the largest contributors to the project and a member of the technical committee.
  

syrusakbary 5 hours ago

Fully disagree with this take. Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.

As an aside, the OpenJS Foundation's executive director said it's OK to use AI assistance on Node.js contributions [1]:

  I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented.

[1]: https://github.com/nodejs/node/pull/61478#issuecomment-40772...

indutny 5 hours ago

I appreciate hearing your point of view on this. In my opinion the future of Open Source and AI assisted coding is a much bigger issue, and different people have different levels of confidence in both positive and negative outcomes of LLM impact on our industry.
It is great to have a legal perspective on compliance of LLM-generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However, it doesn't address the well-known, unresolved ethical concerns over the sourcing of the code produced by LLM tooling.
KronisLV 17 minutes ago

> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
That sort of statement might also be sarcasm in another context: I personally use AI a lot, but also recognize that there are a lot of projects out there that are suffering from low quality slop pull requests, devs that kinda sign out and don't care much about the actual code as long as it appears to be running, alongside most LLMs struggling a lot with longer term maintenance if not carefully managed. So I guess it depends a lot on how AI is used and how much ideological opposition to that there is. In a really testable codebase it could actually work out pretty well, though.
jaredklewis 3 hours ago
AI coding is great, but iteration speed is absolutely not a desirable trait for a runtime. Stability is everything.
Speed code all your SaaS apps, but slow iteration speeds are better for a runtime because once you add something, you can basically never remove it. You can't iterate. You get literally one shot, and if you add an awkward or trappy API, everyone is now stuck with it forever. And what if this "must have" feature turns out to be kind of a dud, because everyone converged on a much more elegant solution a few years later? Congratulations, we now have to maintain this legacy feature forever and everyone has to migrate their codebase to some new solution.
Much better to let dependencies and competing platforms like bun or deno do all the innovating. Once everyone has tried and refined all the different ways of solving this particular problem, and all the kinks have been worked out, and all the different ways to structure the API have been tried, you can take just the best of the best ideas and add it into the runtime. It was late, but because of that it will be stable and not a train wreck.
But I know what you're thinking. "You can't do that. Just look at what happens to platforms that iterate slowly, like C or C++ or Java. They're toast." Oh wait, never mind, they're among the most popular platforms out there.
- syrusakbary 3 hours ago
  
  Since when have we accepted that we can’t go fast and offer stability at the same time?
  Time is highly correlated with expertise. When you don’t have expertise, you may go fast at the expense of stability because you lack the experience to make the good decisions that actually save time. This doesn’t hold true for projects where you rely on experts, good processes and tight timelines (e.g., the Apollo missions).
  
oystersareyum 3 hours ago
Allowing AI contributions results in lower quality contributions and allows wild things to come in and disrupt it, making it an unreliable dependency. We have seen big tech experience constant outages due to AI contributions as is...
- UqWBcuFx6NV4r 1 hour ago
  
  Your comment is why advertisers say that you should repeat your core call to action at least a few times to make it stick.
  You’ve read people saying the same thing hundreds of times and have somehow taken that as meaning that it’s credible.
  Neither you nor I nor anyone else here knows what the “effects” are, because this is brand new tech, and it’s constantly changing. Yet you’re speaking with absolute confidence.
  “Big tech” has downtime all the time, and LLMs did not change that fact. The only difference is that the peanut gallery that is already worked up about AI for philosophical / cultural reasons is suddenly ready to blame AI for every issue under the sun.
  You think that you’re making a technical argument, but you’re just repeating the same talking points I see teenagers regurgitating on TikTok. There’s nothing intelligent or credible about it.
szmarczak 5 hours ago

> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
It's not an AI issue. Node.js itself is lots of legacy code and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.
Yes, you can speed up the development a bit but it will never reach the quality of newer runtimes.
It's like comparing C to C++. Those languages are from different eras (relative to each other).