NPM to implement staged publishing after turbulent shift off classic tokens

1 day ago (socket.dev)

In all of this, people forget that NPM packages are largely maintained by volunteers. If you are going to put up hurdles and give us extra jobs, you need to start paying us. Open source licenses explicitly state some variation of "use at your own risk". A big motivation for most maintainers is that we can create without being told what to do.

I had 25 million downloads on NPM last year. Not a huge amount compared to the big libs, but OTOH, people actually use my stuff. For this I have received exactly $0 (if they were Spotify or YouTube streams I would realistically be looking at ~$100,000).

I propose that we have two NPMs. A non-commercial NPM that is 100% use at your own risk, and a commercial NPM that has various guarantees that authors and maintainers are paid to uphold.

  • NPM has to decide between either being a friendly place for hobbyists to explore their passions or being the backbone for a significant slice of the IT industry.

    Every time someone pulls/messes with/uploads malware to NPM, people complain and blame NPM.

    Every time NPM takes steps to prevent pulling/messing with/uploading malware to NPM, people complain and blame NPM.

    I don't think splitting NPM will change that. Current NPM is already the "100% use at your own risk" NPM and still people complain when a piece of protestware breaks their build.

  • If you are going to put up hurdles and give us extra jobs, you need to start paying us.

    Alternatively, we can accept that there will be fewer libraries because some volunteers won't do the extra work for free. Arguably there are too many libraries already so maybe a contraction in the size of the ecosystem would be a net positive.

    • Note: the bad guys are incentivized to work for free, so this would make the problem considerably worse.

  • It's a bit more complicated than that. The ecosystem around node is just weird. It's not clear what role NPM wants to have.

    Lots of people chase downloads on NPM. It's their validation, their youtube subscribers, or their github stars if you will. That's how they get job offers. Or at least they think they do, I don't know if it actually works. There's tons of good software there, but the signal to noise ratio is still rather low.

    Given that, I'd rather get paid for including your software as a dependency to my software, boosting your downloads for a long time.

    Just kidding, of course. On that last part. But it wouldn't surprise me in the least if something like it actually happened. After all, you can buy stars on GitHub just like on any other social media. And that does strange things to the social dynamics.

  • I agree with you here. It feels like management said "well, we have to do SOMETHING!" and this is what they chose: push more of the burden onto the developers giving stuff away for free, when the burden should be on the developers and companies consuming that stuff for free.

    • But the management who decided that gets rewarded for pushing work to someone else.

  • Not looking forward to the mandatory doxxing that would probably come along if this was introduced today.

    • This makes no sense, maintainers are not exactly operating under a cloak of anonymity. Quite the opposite in fact.

  • Yes! I despise how the open source and free software culture turns into just free labour for freeloading million-dollar and billion-dollar companies.

    The culture made sense in the early days when it was a bunch of random nerds helping each other out and having fun. Now the freeloaders have managed to hijack it and inject themselves into it.

    They also weaponise the culture against the devs by shaming them for wanting money for their software.

    Many companies spend thousands of dollars every month on all sorts of things without much thought. But good luck getting a one-time $100 license fee out of them for some critical library that their whole product depends on.

    Personally I'd like to see the "give stuff to them for free then beg and pray for donations" culture end.

    We need to establish a balance based on the commercial value that is being provided.

    For example I want licensing to be based on the size and scale of the user (non-commercial user, tiny commercial user, small business, medium business, massive enterprise).

    It's absurd for a multi-million company to leech off a random dev for free.

    • I have no idea how much of this stuff is volunteer written, and how much is paid work that is open-sourced.

      No one is forced to use these licences. Even some FOSS licences such as the AGPL will not be used by many companies (even the GPL, where the software is distributed to users). You could use a FOSS license and add an exemption for non-commercial use, or use a non-FOSS license that is free for non-commercial use or small businesses.

      On the other hand a lot of people choose permissive licenses. I assume they are happy to do so.

> In its current form, however, trusted publishing applies to a limited set of use cases. Support is restricted to a small number of CI providers, it cannot be used for the first publish of a new package, and it does not yet offer enforcement mechanisms such as mandatory 2FA at publish time. Those constraints have led maintainer groups to caution against treating trusted publishing as a universal upgrade, particularly for high-impact or critical packages.

This isn't strictly accurate: when we designed Trusted Publishing for PyPI, we designed it to be generic across OIDC IdPs (typically CI providers), and explicitly included an accommodation for creating new projects via Trusted Publishing (we called it "pending" publishers[1]). The latter is something that not all subsequent adopters of the Trusted Publishing technique have adopted, which is IMO both unfortunate and understandable (since it's a complication over the data model/assumptions around package existence).

I think a lot of the pain here is self-inflicted on GitHub's part: deciding to remove normal API credentials entirely strikes me as extremely aggressive, and is completely unrelated to implementing Trusted Publishing. Combining the two in the same campaign has made things unnecessarily confusing for users and integrators, it seems.

[1]: https://docs.pypi.org/trusted-publishers/creating-a-project-...

  • Yet in practice, only the big boys are allowed to become "Trusted Publishers":

    > In the interest of making the best use of PyPI's finite resources, we only plan to support platforms that have a reasonable level of usage among PyPI users for publishing. Additionally, we have high standards for overall reliability and security in the operation of a supported Identity Provider: in practice, this means that a home-grown or personal use IdP will not be eligible.

    How long until everyone is forced to launder their artifacts using Microsoft (TM) GitHub (R) to be "trusted"?

    [1] https://docs.pypi.org/trusted-publishers/internals/#how-do-i...

    • I wrote a good chunk of those docs, and I can assure you that the goal is always to add more identity providers, and not to enforce support for any particular provider. GitHub was only the first because it’s popular; there’s no grand evil theory beyond that.

      11 replies →

  • I'm not familiar with the npm ecosystem, so maybe I'm misunderstanding this, but it sounds like they removed support for local publishes (via a token) in favor of CI publishing using Trusted Publishing.

    If that is correct: I thought it was discussed, when Trusted Publishing was proposed for Rust, that it was not meant to replace local publishing, only to harden CI publishing.

    • > If that is correct: I thought it was discussed, when Trusted Publishing was proposed for Rust, that it was not meant to replace local publishing, only to harden CI publishing.

      Yes, that's right, and that's how it was implemented for both Rust and Python. NPM seems to have decided to do their own thing here.

      (More precisely, I think NPM still allows local publishing with an API token, they just won't grant long-lived ones anymore.)

      21 replies →

  • > I think a lot of the pain here is self-inflicted on GitHub's part

    It is spelled 'Microsoft'.

    What did you think would happen long term? I remember when that acquisition happened and there were parties thrown all around: MS finally 'got' open source.

    And never mind feeding all of the GitHub contents to their AI.

    • My point was that these are political and logistical problems latent to GitHub/Microsoft/whatever, not to Trusted Publishing as a design. I don't think I materially disagree with you about Microsoft not having a sterling reputation.

      2 replies →

  • "In its current form" is in context of NPM, where I think it's accurate.

    Great to see PyPI taking a more reasonable path.

I maintain some very highly used npm packages, and this situation just has me on edge. In our last release of dozens of packages, I was manually reading through our package-lock and package.json changes and reviewing every dependency change. Luckily our core libraries have no external dependencies, but our tooling has a ton.

We were left with a tough choice of moving to Trusted Publishers or allowing a few team members to publish locally with 2FA. We decided on Trusted Publishers because we've had an automated process with review steps for years, but we understand there's still a chance of a hack, so we're just extremely cautious with any PRs right now. Turning on Trusted Publishers was a huge pain with so many packages.

The real thing we want for publishing is to be able to continue using our CI-based publishing setup, with Trusted Publishers, but with a human-in-the-loop 2FA step.

But that's only part of a complete solution. HITL is only guaranteed to slow down malicious code propagating. It doesn't actually protect our project against compromised dependencies, and doesn't really help prevent us from spreading them. All of that is still a manual responsibility of the humans. We need tools to lock down and analyze our dependencies better, and tools to analyze our own packages before publishing. I also want better tools for analyzing and sandboxing 3rd-party PRs before running CI. Right now we have HITL there, but we have to manually investigate each PR before running tests.

The shift wouldn't have been so turbulent if npm had simply updated their CLI in tandem. I still can't use 2FA to publish because their CLI simply cannot handle it.

I really think that the main issue is that NPM itself will execute any script that is in the "postinstall" section of a package, without asking the user for permission. This is a solved problem in other package managers: e.g. PNPM will only run scripts if the user allows them, and stores the allowlist in the package.json file for future reference.

In this scenario, if a dependency were to add a "postinstall" script because it was compromised, it would not execute, and the user can review whether it should, greatly reducing the attack surface.
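
To make that concrete, a minimal sketch of what such an allowlist looks like with pnpm (the `pnpm.onlyBuiltDependencies` field and the package names here are illustrative, not a prescription for npm):

```json
{
  "pnpm": {
    "onlyBuiltDependencies": ["esbuild", "sharp"]
  }
}
```

Only the listed packages are allowed to run their install/build scripts; everything else has its scripts skipped until someone explicitly approves them.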

  • Wouldn't this just make the number of packages that can be targeted smaller? E.g. I publish a testrunner that needs to install Headless Chrome if not present via postinstall. People trust me and put the package on their allowlist. My account gets compromised and a malicious update is published. People execute malicious code they have never vetted.

    I do understand this is still better than npm right now, but it's still broken.

    • Security is usually full of incremental improvements like that, however. Reducing the scope from all of NPM to the handful of things like test runners would be an enormous benefit for auditors and would encourage consolidation (e.g. most testing frameworks could consolidate on a single headless chrome package), and in the future this could be further improved by things like restricting the scope of those scripts using the operating system sandbox features.

    • Security is layered: no layer will conclusively keep you safe, but each one makes it harder to pierce through to the core. For example, the impact of the recent SHA1-Hulud attack would have been much smaller, as compromised packages that previously did not have any scripts executing at install time would not suddenly start executing them, since they would not be allowlisted.

  • There is a large subset of security problems that are solved by simply eliminating compilation steps typically included in "postinstall". If you want a more secure, more debuggable, more extensible lib, then you should definitely publish it in pure JS (rather than, say, TypeScript), so that there is no postinstall attack surface.

    • With type stripping in Node LTS now, there's no reason at all to have a postinstall for TypeScript code either. There are also fewer reasons you can't publish a "pure TS" library.
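
      For the curious, a rough sketch (the flag and default behaviour depend on the Node version; the entry file name is just an example):

      ```sh
      # Older Node 22.x: strip types on the fly and run the TypeScript entry point directly
      node --experimental-strip-types src/index.ts

      # Recent Node releases enable type stripping by default
      node src/index.ts
      ```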

The comment about maintainers not getting paid resonates. I'm a solo founder and these security changes, while necessary, add real friction to shipping.

The irony is that trusted publishing pushes everyone toward GitHub Actions, which centralizes risk. If GHA gets compromised, the blast radius is enormous. Meanwhile solo devs who publish from their local machine with 2FA are arguably more secure (smaller attack surface, human in the loop) but are being pushed toward automation.

What I'd like to see: a middle ground where trusted publishing works but requires a 2FA confirmation before the publish actually goes through. Keep the automation, keep the human gate. Best of both worlds.

The staged publishing approach mentioned in the article is a good step - at least you'd catch malicious code before it hits everyone's node_modules.

I really don't think this should be a registry-level issue. As in, the friction shouldn't be introduced into _publishing_ workflows; it should be introduced into _subscription_ workflows, where there is an easy fix. Just stop supporting auto-update (through wildcard patch or minor versions) by default... Make the default behaviour to install exactly the versions recorded at install time (like `npm ci` does with the lockfile).
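
A sketch of what that looks like with knobs npm already has today (opt-in rather than the default):

```sh
# Install exactly what package-lock.json records, and fail if it has drifted
npm ci

# Record exact versions instead of ^-ranges when adding new dependencies
npm config set save-exact true   # same as putting save-exact=true in .npmrc
```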

Just to be clear, "trusted publishing" means a type of reverse vendor lock-in? Only some CI systems are allowed to be used for it.

  • "Trusted Publishing" is just a term of art for OIDC. NPM can and should support federating with CI/CD platforms other than GitHub Actions, to avoid even the appearance of impropriety.

    (It makes sense that they'd target GHA first, since that's where the majority of their users probably are. But the technique itself is fundamentally platform agnostic and interoperable.)
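
    For illustration, a rough sketch of the workflow side once a package has been configured on npmjs.com to trust a specific repository and workflow (versions and details here are assumptions; a recent npm CLI is needed for the OIDC exchange):

    ```yaml
    # Hypothetical release job using OIDC-based trusted publishing
    name: release
    on:
      push:
        tags: ['v*']
    jobs:
      publish:
        runs-on: ubuntu-latest
        permissions:
          id-token: write   # lets the job request an OIDC token from GitHub
          contents: read
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: 22
              registry-url: 'https://registry.npmjs.org'
          - run: npm ci
          - run: npm publish   # the CLI exchanges the OIDC token for a short-lived credential
    ```

    No long-lived secret ever sits in the repository settings, which is the point of the technique.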

I think this turbulent shift is going to push a lot of node devs elsewhere.

I understand things need to be safe, but this is a recklessly fast transition.

2FA publishing still doesn't work for me. I just use legacy tokens at this point; I gave up trying to figure out what's wrong.

Seems like requiring 2FA to publish or trusted publishing should prevent the vast majority of these issues.

The only tricky bit would be to disallow approving your own pull request when using trusted publishing. That should fall back to requiring 2FA.

  • It also makes it impossible to publish using CI, which is problematic for projects with frequent releases. And trusted publishing doesn't solve that if you use self-hosted CI.

    • > trusted publishing doesn't solve that if you use self-hosted CI

      Is there any particular reason for the whitelist approach? Standing on the sidelines it appears wholly unnecessary to me. Authentication that an artifact came from a given CI system seems orthogonal to the question of how much trust you place in a given CI system.

      4 replies →

I try to make this point when the subject comes up, but IMHO this is a lipstick-on-a-pig solution. The problem with npm security isn't stability or attestation, it's the impossibility of auditing all that garbage.

The software is too fine-grained. Too many (way too many) packages from small projects or obscure single authors doing way too many things that are being picked up for one trivial feature. That's just never going to work. If you don't know who's writing your software the answer will always end up being "Your Enemies" at some point.

And the solution is to stop the madness. Conglomerate the development. No more tiny things. Use big packages[1] from projects with recognized governance. Audit their releases and inclusion in the repository from a separate project with its own validation and testing. No more letting the bad guys push a button to publish.

Which is to say: this needs to be Debian. Or some other Linux distro. But really the best thing is for the JS community (PyPI and Cargo are dancing on the edge of madness too) to abandon its mistake and move everything into a bunch of Debian packages. Won't happen, but it's the solution nonetheless.

[1] c.f. the stuff done under the Apache banner, or C++ Boost, etc...

  • Agreed on the "this needs to be Debian" part. If some of the most popular JS packages were available through the system package manager as normal *.deb packages, I think people would be more likely to build on top of stable versions/releases.

    Stability (in the "it doesn't change" sense) is underrated.

  • The fact is that being Debian is boring, and JS (python/rust/...) is *cool*.

    Give it a few more decades, hopefully it'll be boring by then, the same way, say, making a house is boring.

    • > the same way, say, making a house is boring.

      The government will mostly ban it to keep prices high?

I've been thinking: Java doesn't have many supply chain issues, and their model is based on namespacing with the DNS system. If I want a library from vendor.com, the library to import is somewhere under com.vendor.*

Simple enough. Things like npm and pip reinvent a naming authority with no cost associated (so it's weak to Sybil attacks), and all for not much. What do you get in exchange? You create equality by letting everyone contribute their wonderful packages, even those that don't have $15/yr? I'm sorry, was the previous leading internet mechanism not good and decentralized enough for you?

Java's package naming system is great in design. The biggest vuln in dependencies that I can think of in Java was not a supply-chain-specific vuln, but rather a general weakness of a library (log4j). But maybe someone with more Java experience can point to some disadvantage of the Java system that explains why we are not all copying it.

  • I think Java’s DNS namespacing is, at best, only a weak benefit to the supply chain security posture of Java packaging as a whole. I think it’s more that Java is (1) a batteries-included language, (2) lacks the same pervasive open source packaging culture that Python, Rust, JS, etc. have, (3) is much more conservative around dependency updates as a community, and (4) lacks a (well-known?) build time code execution vector similar to JS’s install scripts or Python’s setup.py.

    (Most of these are good things, to be clear!)

    • > lacks a (well-known?) build time code execution vector similar to JS’s install scripts or Python’s setup.py

      How is that leveraged by attackers in practice? Naively I would expect the actual issue to be insufficient sandboxing (network access in particular).

      8 replies →

    • Thanks for this insight-dense comment — and for all the efforts you have put into Trusted Publishing.

    • There being a compile/runtime difference at all seems quite impactful to dependency management as a whole, apparently; I've seen impacts on backwards compatibility, build times, and now security.

  • The primary way supply chain issues in Java are addressed is the very simple way: You don't have a large supply chain.

    You have one or two megalibraries that are like 20 years old and battle tested and haven't really changed in forever.

    Then you have a couple specific libraries for your very specific problem.

    Then, you pin those versions. You probably even run your own internal repo for artifacts so that you have full control over what code you pull into your CI.

    But none of this actually prevents supply chain attacks. What it does is drastically lower their profitability and success.

    Let's say you magically gain access to the Spring Boot framework's signing keys. You put out a malicious version that drops persistent threats and backdoors everywhere it can and pulls out any credit card numbers or whatever it can find. The team behind Spring Boot takes like two weeks to figure it out, disclose the breach, and take down the malicious code.

    How many actual systems have even pulled that code in? Very few. Even a significant supply chain attack still requires significant luck to breach targets. In NPM land, this is not the case, and tons of things are pulling in the "latest" version of frameworks. You are much more likely to get someone to actually run your malicious code.

Can we finally declare this (and other incomplete language-specific package managers) to be a failed experiment and go back to a robust and secure distro-based package management workflow, with maintainers separate from upstream developers?

  • It's a false belief that distro-based package management workflows are, or ever were, more secure. It's the same problem, maybe one step removed. Look at all the exploits with things like xz.

    There was also the Python 2.7 problem for a long time: thanks to this model, it couldn't be updated quickly, and developers, including the OS developers, became dependent on it being there by default and built things around it.

    Then when it EOL'd, it left a lot of people exposed to vulnerabilities and was quite the mess to update.

  • The robust and secure distro-based package management workflow that shipped the xz backdoor to everyone, and broke OpenSSH key generation, and most of the functionality of KeePassXC?

    • > workflow that shipped the xz backdoor to everyone

      Isn't it the case that it didn't ship the backdoor? Precisely because of the thorough testing and vetting process?

      4 replies →

  • where do you get all these trusted people to review your dependencies from?

    it can't be just anyone, because you're essentially delegating trust.

    no way there's enough trustworthy volunteers (and how do you vet them all?)

    and who's going to pay them if they're not volunteers?

  • Language-specific package managers are a natural outgrowth of wanting portability across platforms.

  • When distros figure out how I can test my software with a dep at version A and the same dep at version B in a straightforward way, then we can talk.

    NPM forcing a human to click a button on release would have solved a lot of this stuff. So would have many other mitigations.

  • I run them inside a sandbox.

    The npm community is too big for one to ever discard it for frontend development.

  • Never in a million years.

    Rust's Cargo is sublime. System apt / yum / pacman / brew could never replace it.

    Cargo handles so much responsibility outside of system packages that they couldn't even come close to replicating the utility.

    Checking language versions, editions, compiling macros and sources, cross-compiling for foreign architectures, linking, handling upgrades, transitive dependency versioning, handling conflicts, feature gating, optional compilation, custom linting and strictness, installing sidecars and CLI utilities, etc. etc.

    Once it's hermetic and namespaced, cargo will be better than apt / yum / etc. They're not really performing the same tasks, but cargo is just so damned good at such important things that it's hard to imagine a better tool.

    • It's got all the same issues as npm though. The fact that it's so cool makes it a magnet for adding deps. Rust's own docs generator pulls in > 700 deps

Why is JS, in particular, so deeply afflicted with these issues? Why hasn’t there been an effort to create a more robust standard library? Or at least first party libraries maintained by the JavaScript team? That way folks can pull in trusted deps instead of all the hodgepodge.

Go did a lot wrong. It was just awful before they added Go modules. But it's puzzling to me why, as a community and ecosystem, its 3rd-party dependencies seem so much less bloated. Part of it I think is because the standard library is pretty expansive. Part of it is because of things like golang.org/x. But there are also a lot of corporate maintainers, and I feel like part of that is because packages are namespaced to the repository, which itself is namespaced to ownership. Technically that isn't even a requirement, but the community adopted it pretty evenly, and it makes me wonder why others haven't.

  • Mainly I think because of the scale it's used at, the things it's used for, and the people who use it.

    Technicalities are secondary to those factors.

    If Ruby was the only language that ran in the browser, you'd be writing the same rant about Ruby, no matter the stdlib.

  • JavaScript is a standard with many implementations. Any addition to the "standard library" (such as it is) has to go through a long process to get approved by a committee, then in turn implemented by at least the major implementations (V8, SpiderMonkey, JavaScriptCore).

    > Or at least first party libraries maintained by the JavaScript team?

    There is no "JavaScript team".