Comment by woodruffw

2 months ago

> why is the distributed artifact not cryptographically authenticated?

With what key? That’s the layer that “attestations” add on top, but with Trusted Publishing there’s no user/package—associated signature.

> Maybe I'm misunderstanding but I thought the whole point of the exercise was to avoid token compromise. Framed another way that means the goal is authentication of the CI/CD pipeline itself, right? Wouldn't signing a fingerprint be the default solution for that?

Yes, the goal is to authenticate the CI/CD pipeline (what we’d call a “machine identity”). And there is a signature involved, but it only verifies the identity of the pipeline, not the package being uploaded by that pipeline. That’s why we layer attestations on top.

(The reasons for this are unfortunately nuanced but ultimately boil down to it being hard to directly sign arbitrary inputs with just OIDC in a meaningful way. I have some slides from talks I gave in the past that might help clarify Trusted Publishing, the relationship with signatures/attestations, etc.[1][2])

> I suppose "authenticated publishing" could also work for the same reason.

I think this would imply that normal API token publishing is somehow not authenticated, which would be really confusing as well. It’s really not easy to come up with a name that doesn’t have some amount of overlap with existing concepts, unfortunately.

[1]: https://yossarian.net/res/pub/packagingcon-2023.pdf

[2]: https://yossarian.net/res/pub/scored-2023.pdf

5 comments

woodruffw

fc417fc802 2 months ago

> imply that normal API token publishing is somehow not authenticated

Fair enough, although the same reasoning would imply that API token publishing isn't trusted ... well after the recent npm attacks I suppose it might not be at that.

> With what key?

> And there is a signature involved,

So there's already a key involved. I realize its lifetime might not be suitable but presumably the pipeline itself either already possesses or could generate a long lived key to be registered with the central service.

> but it only verifies the identity of the pipeline,

I thought verifying the identity of the pipeline was the entire point? The pipeline singing a fingerprint of the package would enable anyone to verify the provenance of the complete contents (either they'd need a way to look up the key or you could do TOFU but I digress). There's value in being able to verify the integrity of the artifacts in your local cache.

Also, the more independent layers of authentication there are the fewer options an attacker will have. A hypothetical artifact that carried signatures from the developer, the pipeline, and the registry would have a very clear chain of custody.

> it being hard to directly sign arbitrary inputs with just OIDC in a meaningful way

At the end of the day you just need to somehow end up in a situation where the pipeline holds a key that has been authenticated by the package registry. From that point on I'd think that the particular signature scheme would become a trivial implementation detail; you stuff the output into some json or something similar and get on with life.

Has some key complexity gone over my head here?

BTW please don't take this the wrong way. It's not my intent to imply that I know better. As long as the process works it isn't my intent to critique it. I was just honestly surprised to learn that the package content itself isn't signed by the pipeline to prove provenance for downstream consumers and from there I'm just responding to the reasoning you gave. But if the current process does what it set out to do then I've no grounds to object.

woodruffw 2 months ago
> So there's already a key involved. I realize its lifetime might not be suitable but presumably the pipeline itself either already possesses or could generate a long lived key to be registered with the central service.
The key involved is the OIDC IdP's key, which isn't controlled by the maintainer of the project. I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package. This would mean that any GitHub Actions workflow anywhere would be one signing bug away from impersonating signatures for every PyPI project, which would be exceedingly not good. It would also make the insider risk from a compromised CI/CD provider much larger.
(Again, I really recommend taking a look at the talks I linked. Both Trusted Publishing and attestations were multi-year projects that involved multiple companies, cryptographers, and implementation engineers, and most of your - very reasonable! - questions came up for us as well while designing and planning this work.)
> I thought verifying the identity of the pipeline was the entire point? The pipeline singing a fingerprint of the package would enable anyone to verify the provenance of the complete contents (either they'd need a way to look up the key or you could do TOFU but I digress). There's value in being able to verify the integrity of the artifacts in your local cache.
There are two things here:
1. Trusted Publishing provides a verifiable link between a CI/CD provider (the "machine identity") and a packaging index. This verifiable link is used to issue short-lived, self-scoping credentials. Under the hood, Trusted Publishing relies on a signature from the CI/CD provider (which is an OIDC IdP) to verify that link, but that signature is only over a set of claims about the machine identity, not the package identity.
2. Attestations are a separate digital signing scheme that can use a machine identity. In PyPI's case, we bootstrap trust in a given machine identity by seeing if a project is already enrolled against a Trusted Publisher that matches that identity. But other packaging ecosystems may do other things; I don't know how NPM's attestations work, for example. This digital signing scheme uses a different key, one that's short-lived and isn't managed by the IdP, so that signing events can be made transparent (in the "transparency log" sense) and are associated more meaningfully with the machine identity, not the IdP that originally asserted the machine identity.
> At the end of the day you just need to somehow end up in a situation where the pipeline holds a key that has been authenticated by the package registry. From that point on I'd think that the particular signature scheme would become a trivial implementation detail; you stuff the output into some json or something similar and get on with life.
Yep, this is what attestations do. But a key piece of nuance: the pipeline doesn't "hold" a key per se, it generates a new short-lived key on each run and binds that key to the verified identity sourced from the IdP. This achieves the best of both worlds: users don't need to maintain a long-lived key, and the IdP itself is only trusted as an identity source (and is made auditable for issuance behavior via transparency logging). The end result is that clients that verify attestations don't verify using a specific key; the verify using an identity, and ensure that any particular key matches that identity as chained through an X.509 CA. That entire process is called Sigstore[1].
And no offense taken, these are good questions. It's a very complicated system!
[1]: https://www.sigstore.dev
- fc417fc802 2 months ago
  
  > I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package.
  There must be some misunderstanding. For trusted publishing a short lived API token is issued that can be used to upload the finished product. You could instead imagine negotiating a key (ephemeral or otherwise) and then verifying the signature on upload.
  Obviously the signing key can't be shared between projects any more than the API token is. I think I see where the misunderstanding arose now. Because I said "just verify the pipeline identity" and you interpreted that as "let end users get things signed by a single global provider key" or something to that effect, right?
  The only difference I had intended to communicate was the ability of the downstream consumer to verify the same claim (via signature) that the registry currently verifies via token. But it sounds like that's more or less what attestation is? (Hopefully I understood correctly.) But that leaves me wondering why Trusted Publishing exists at all. By the time you've done the OIDC dance why not just sign the package fingerprint and be done with it? ("We didn't feel like it" is of course a perfectly valid answer here. I'm just curious.)
  I did see that attestation has some other stuff about sigstore and countersignatures and etc. I'm not saying that additional stuff is bad, I'm asking if Trusted Publishing wouldn't be improved by offering a signature so that downstream could verify for itself. Was there some technical blocker to doing that?
  > the IdP itself is only trusted as an identity source
  "Only"? Doesn't being an identity source mean it can do pretty much anything if it goes rogue? (We "only" trust AD as an identity source.)
  
  2 replies →