Comment by woodruffw

2 months ago

> So there's already a key involved. I realize its lifetime might not be suitable but presumably the pipeline itself either already possesses or could generate a long lived key to be registered with the central service.

The key involved is the OIDC IdP's key, which isn't controlled by the maintainer of the project. I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package. This would mean that any GitHub Actions workflow anywhere would be one signing bug away from impersonating signatures for every PyPI project, which would be exceedingly not good. It would also make the insider risk from a compromised CI/CD provider much larger.

(Again, I really recommend taking a look at the talks I linked. Both Trusted Publishing and attestations were multi-year projects that involved multiple companies, cryptographers, and implementation engineers, and most of your - very reasonable! - questions came up for us as well while designing and planning this work.)

> I thought verifying the identity of the pipeline was the entire point? The pipeline singing a fingerprint of the package would enable anyone to verify the provenance of the complete contents (either they'd need a way to look up the key or you could do TOFU but I digress). There's value in being able to verify the integrity of the artifacts in your local cache.

There are two things here:

1. Trusted Publishing provides a verifiable link between a CI/CD provider (the "machine identity") and a packaging index. This verifiable link is used to issue short-lived, self-scoping credentials. Under the hood, Trusted Publishing relies on a signature from the CI/CD provider (which is an OIDC IdP) to verify that link, but that signature is only over a set of claims about the machine identity, not the package identity.

2. Attestations are a separate digital signing scheme that can use a machine identity. In PyPI's case, we bootstrap trust in a given machine identity by seeing if a project is already enrolled against a Trusted Publisher that matches that identity. But other packaging ecosystems may do other things; I don't know how NPM's attestations work, for example. This digital signing scheme uses a different key, one that's short-lived and isn't managed by the IdP, so that signing events can be made transparent (in the "transparency log" sense) and are associated more meaningfully with the machine identity, not the IdP that originally asserted the machine identity.

> At the end of the day you just need to somehow end up in a situation where the pipeline holds a key that has been authenticated by the package registry. From that point on I'd think that the particular signature scheme would become a trivial implementation detail; you stuff the output into some json or something similar and get on with life.

Yep, this is what attestations do. But a key piece of nuance: the pipeline doesn't "hold" a key per se, it generates a new short-lived key on each run and binds that key to the verified identity sourced from the IdP. This achieves the best of both worlds: users don't need to maintain a long-lived key, and the IdP itself is only trusted as an identity source (and is made auditable for issuance behavior via transparency logging). The end result is that clients that verify attestations don't verify using a specific key; the verify using an identity, and ensure that any particular key matches that identity as chained through an X.509 CA. That entire process is called Sigstore[1].

And no offense taken, these are good questions. It's a very complicated system!

[1]: https://www.sigstore.dev

3 comments

woodruffw

fc417fc802 2 months ago

> I think it would be pretty risky to allow this key to directly sign for packages, because this would imply that any party that can use that key for signing can sign for any package.

There must be some misunderstanding. For trusted publishing a short lived API token is issued that can be used to upload the finished product. You could instead imagine negotiating a key (ephemeral or otherwise) and then verifying the signature on upload.

Obviously the signing key can't be shared between projects any more than the API token is. I think I see where the misunderstanding arose now. Because I said "just verify the pipeline identity" and you interpreted that as "let end users get things signed by a single global provider key" or something to that effect, right?

The only difference I had intended to communicate was the ability of the downstream consumer to verify the same claim (via signature) that the registry currently verifies via token. But it sounds like that's more or less what attestation is? (Hopefully I understood correctly.) But that leaves me wondering why Trusted Publishing exists at all. By the time you've done the OIDC dance why not just sign the package fingerprint and be done with it? ("We didn't feel like it" is of course a perfectly valid answer here. I'm just curious.)

I did see that attestation has some other stuff about sigstore and countersignatures and etc. I'm not saying that additional stuff is bad, I'm asking if Trusted Publishing wouldn't be improved by offering a signature so that downstream could verify for itself. Was there some technical blocker to doing that?

> the IdP itself is only trusted as an identity source

"Only"? Doesn't being an identity source mean it can do pretty much anything if it goes rogue? (We "only" trust AD as an identity source.)

woodruffw 2 months ago
> There must be some misunderstanding. For trusted publishing a short lived API token is issued that can be used to upload the finished product. You could instead imagine negotiating a key (ephemeral or otherwise) and then verifying the signature on upload.
From what authority? Where does that key come from, and why would a verifying party have any reason to trust it?
(I'm not trying to be tendentious, so sorry if it comes across that way. But I think you're asking good questions that lead to the design that we arrived at with attestations.)
> I did see that attestation has some other stuff about sigstore and countersignatures and etc. I'm not saying that additional stuff is bad, I'm asking if Trusted Publishing wouldn't be improved by offering a signature so that downstream could verify for itself. Was there some technical blocker to doing that?
The technical blocker is that there's no obvious way to create a user-originated key that's verifiably associated with a machine identity, as originally verified from the IdP's OIDC credential. You could do something like mash a digest into the audience claim, but this wouldn't be very auditable in practice (since there's no easy way to shoehorn transparency atop that). But some people have done some interesting exploration in that space with OpenPubKey[1], and maybe future changes to OIDC will make something like that more tractable.
> "Only"? Doesn't being an identity source mean it can do pretty much anything if it goes rogue? (We "only" trust AD as an identity source.)
Yes, but that's why PyPI (and everyone else who uses Sigstore) mediates its use of OIDC IdPs through a transparency logging mechanism. This is in effect similar to the situation with CAs on the web: a CA can always go rogue, but doing so would (1) be detectable in transparency logs, and (2) would get them immediately evicted from trust roots. If we observed rogue activity from GitHub's IdP in terms of identity issuance, the response would be similar.
[1]: https://github.com/openpubkey/openpubkey
- fc417fc802 2 months ago
  
  Okay. I see the lack of benefit now but regardless I'll go ahead and respond to clear up some points of misunderstanding (and because the topic is worthwhile I think).
  > From what authority?
  The registry. Same as the API token right now.
  > The technical blocker is that there's no obvious way to create a user-originated key
  I'm not entirely clear on your meaning of "user originated" there. Essentially I was thinking something equivalent to the security of - pipeline generates ephemeral key and signs { key digest, package name, artifact digest }, registry auth server signs the digest of that signature (this is what replaces the API token), registry bulk data server publishes this alongside the package artifact.
  But now I'm realizing that the only scenario where this offers additional benefit is in the event that the bulk data server for the registry is compromised but the auth server is not. I do think there's some value in that but the much simpler alternative is for the registry to tie all artifacts back to a single global key. So I guess the benefit is quite minimal. With both schemes downstream assumes that the registry auth server hasn't been compromised. So that's not great (but we already knew that).
  That said, you mention IdP transparency logging. Could you not add an arbitrary residue into the log entry? An auth server compromise would still be game over but at least that way any rogue package artifacts would conspicuously be missing a matching entry in the transparency log. But that would require the IdP service to do its own transparency logging as well ... yeah this is quickly adding complexity for only very minimal gain.
  Anyway. Hypothetical architectures aside, thanks for taking the time for the detailed explanations. Though it wasn't initially clear to me the rather minimal benefit is more than enough to explain why this general direction wasn't pursued.
  If anything I'm left feeling like maybe the ecosystems should all just switch directly to attested publishing.