
Comment by amluto

7 days ago

> The researchers identified four fundamental security properties that CI/CD systems need: admittance control, execution control, code control, and access to secrets.

Why do CI/CD systems need access to secrets? I would argue they need access to APIs and privileges to perform specific API calls. But there is absolutely nothing about calling an API that fundamentally requires that the caller know a secret.

I would argue that a good CI/CD system should not support secrets as a first-class object at all. Instead steps may have privileges assigned. At most there should be an adapter, secure enclave style, that may hold a secret and give CI/CD steps the ability to do something with that secret, to be used for APIs that don’t support OIDC or some other mechanism to avoid secrets entirely.

> I would argue that a good CI/CD system should not support secrets as a first-class object at all. Instead steps may have privileges assigned. At most there should be an adapter, secure enclave style, that may hold a secret and give CI/CD steps the ability to do something with that secret, to be used for APIs that don’t support OIDC or some other mechanism to avoid secrets entirely.

This all seems right, but the reality is that people will put secrets into CI/CD, and so the platform should provide an at least passably secure mechanism for them.

(A key example being open source: people want to publish from CI, and they’re not going to set up additional infrastructure when the point of using third-party CI is to avoid that setup.)

  • Fine. Then let people set up a little WASM app that gets access to the secret and has an API callable by the workflow. And make that app be configured as part of the secret’s configuration, not as a file in the repository.

“Good CI systems shouldn’t support secrets, at most there should be [the most complicated secret support ever]”

Let’s just call it secret support.

I agree with your suggestion that capabilities-based APIs are better, but CI/CD needs to meet customers where they’re at currently, not where they should be. Most customers need secrets.

We use proprietary tools (QNX compiler, Coverity static analysis, ...) and those require access to a license server which requires some secret.

I don't really understand what you mean by "secure enclave style"? How would that be different?

  • With a secure enclave or an HSM, there's a secret, but the users do not have access to the secret. So, if you have a workflow that needs to, say, sign with a given private key, you would get an API that signs for you. If you need to open a TLS connection with a client certificate, you get a proxy that authenticates for you.

    I suppose I would make an exception for license keys. Those have minimal blast radii if they leak.
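
    As a minimal sketch of that kind of adapter (everything here is illustrative: a hypothetical in-runner signing service, not an existing GitHub feature), a step could POST bytes to be signed and only ever get the signature back:

        # Hypothetical signing adapter: workflow steps can ask for signatures,
        # but the private key never leaves this process (ideally an HSM behind it).
        from flask import Flask, request, jsonify
        from cryptography.hazmat.primitives.asymmetric import ed25519

        app = Flask(__name__)

        # Illustrative only: a real adapter would load the key from an enclave/HSM,
        # not generate it at startup.
        _PRIVATE_KEY = ed25519.Ed25519PrivateKey.generate()

        @app.route("/sign", methods=["POST"])
        def sign():
            payload = request.get_data()              # raw bytes supplied by the step
            signature = _PRIVATE_KEY.sign(payload)
            return jsonify({"signature": signature.hex()})

        # A step would use it like:  curl --data-binary @app-release.apk localhost:8099/sign
        if __name__ == "__main__":
            app.run(port=8099)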

    • And how is that exposed to the CI/CD? An environment variable? Some bespoke tool that the CI runs to read the secret from the Secure Enclave?

      Your approach boils down to “let’s give each step its own access to its own hardware-protected secrets, but developers shouldn’t otherwise have access”

      Which is a great way to “support secrets,” just like the article says.

> I would argue that a good CI/CD system should not support secrets as a first-class object at all. Instead steps may have privileges assigned. At most there should be an adapter, secure enclave style, that may hold a secret and give CI/CD steps the ability to do something with that secret, to be used for APIs that don’t support OIDC or some other mechanism to avoid secrets entirely.

CI/CD does not exist in a vacuum. If CI/CD were entirely integrated with the rest of the infrastructure, it might be possible to do, say, an app deploy without passing creds to user code (the platform could expose APIs it can call to do the deployment, instead of the typical "install the client, get the creds, run k8s/ssh/whatever else is needed for the deploy").

But that's a high level of integration that's very environment-specific, without all that many positives (so what if you don't need creds, the workflow still has permission to make a lot of mess if it gets hijacked), and a lot, lot more code to write versus "run a container and pass it some env vars", which has become the standard.

  • You seem to be talking mostly about the CD part. Some thoughts:

    On the one hand, CD workflows are less exposed than CI workflows. You only deploy code that has made it through your review and CI processes. In a non-continuous deployment model, you only deploy code when you decide to. You are not running your CD workflow on a third-party pull request.

    On the other hand, the actual CD permission is a big deal. If you leak a credential that can deploy to your k8s cluster, you are very, very pwned. Possibly in a manner that is extremely complex to recover from.

    I also admit that I find it rather surprising that so many workflows have a push model of deployment like this. My intuition for how to design a CD-style system would be:

    1. A release is tagged in source control.

    2. Something consumes that release tag and produces a production artifact. This might be some sort of runner that checks out the tagged release, builds it, and produces a ghcr image. Bonus points if that process is cleanly reproducible and more bonus points if there's also an attestation that the release artifact matches the specified tag and all the build environment inputs. (I think that GitHub Actions can do this, other than the bonus points, without any secrets.)

    3. Something tells production to update to the new artifact. Ideally this would trigger some kind of staged deployment. Maybe it's continuous, maybe it needs manual triggering. I think that, in many production systems, this could be a message from the earlier stages that tells an agent with production privileges to download and update. It really shouldn't be that hard to make a little agent in k8s or whatever that listens to an API call from a system like GitHub Actions, authenticates it using OIDC, and follows its deployment instructions.
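
    A rough sketch of what that step-3 agent could look like, assuming Python with Flask and PyJWT (the audience, repository, and rollout logic are placeholders):

        # Sketch of a tiny deployment agent: it verifies the caller's GitHub Actions
        # OIDC token and only then acts on the deployment request.
        import jwt  # PyJWT
        from flask import Flask, request, abort

        app = Flask(__name__)

        # GitHub Actions publishes its token-signing keys at a well-known JWKS endpoint.
        JWKS = jwt.PyJWKClient("https://token.actions.githubusercontent.com/.well-known/jwks")

        @app.route("/deploy", methods=["POST"])
        def deploy():
            token = request.headers.get("Authorization", "").removeprefix("Bearer ")
            try:
                key = JWKS.get_signing_key_from_jwt(token)
                claims = jwt.decode(
                    token,
                    key.key,
                    algorithms=["RS256"],
                    audience="my-deploy-agent",   # placeholder audience
                    issuer="https://token.actions.githubusercontent.com",
                )
            except jwt.PyJWTError:
                abort(401)

            # Only accept deployments from the expected repository and branch.
            if claims.get("repository") != "my-org/my-app" or claims.get("ref") != "refs/heads/main":
                abort(403)

            image = request.get_json()["image"]
            # ...kick off the staged rollout of `image` here (k8s API call, etc.)...
            return {"status": "rolling out", "image": image}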

    P.S. An attested-reproducible CD build system might be an interesting startup idea.

    • Well, in my mind the build system should build an artifact (a container, or a .deb package), and then a separate system should deploy it (with a smaller set of permitted people) and have the option to roll it back. So in principle I agree on that.

      ...but I've seen the anti-pattern of "just add a step that does the deploy after CI in the same pipeline" often enough that I think it might be the most common way to do it.

  • CI shouldn't do deployments; deployment pipelines should run separately when a new release passes CI.

    Of course, the general-purpose task runner that both run on does need to support secrets.

    • Hmm, I have long assumed that a perfectly executed CI/CD setup would be based on a generic task runner... But maybe not?

      Only the CI part needs to build; it needs little else and it's the only part of a coherent setup that needs to build.

    • We're iterating towards GHA for CI, AWS CodeBuild for the CD. At least on AWS projects. Mainly because managing IAM permissions to permit the github runner to do everything the deployment wants is an astonishingly large waste of time. But you need a secret to trigger one from the other.


How do you e.g. validate that a database product works with all the different cloud databases? Every time you change up SQL generation you're going to want to make sure the SQL parses and evaluates as expected on all supported platforms.

Those tests will need creds to access third party database endpoints.

You're missing that the D in CI/CD means deployment; be that packaging and publishing to a registry when tags are pushed, building images, or packaging GitHub releases.

CI is arguable, but how do you intend to do deployments with no secrets?

  • AWS is great for this. IAM policies can allow IP addresses or, more safely, just named EC2 instances. Our deploy server requires nothing.

    • CircleCI and, I believe, GHA support injecting signed JWTs you can use to bootstrap identity, be it an IAM role or some other platform where you can trust an OIDC issuer.


  • The secret is held by the metadata server that the CI instance has access to

    Or: the deployment service knows the identity of the instance, so its secret is its private key

    Or, how PyPI does it: the deployment service coordinates with the trusted CI/CD service to learn the identity of the machine (like its IP address, or a trusted assertion of which repository it's running on), so the secret lives wherever that out-of-band verification step happens. (PyPI communicates with GitHub Actions about which pipeline from which repository is doing the deployment, for example.)
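
    Roughly, that amounts to comparing claims from the (signature-verified) OIDC token against the trusted-publisher record the maintainer registered; a sketch, where the claim names follow GitHub's token format and the record fields are illustrative:

        # Rough illustration of a PyPI-style trusted-publisher check.
        def is_trusted_publisher(claims: dict, registered: dict) -> bool:
            expected_repo = f'{registered["owner"]}/{registered["repo"]}'
            expected_workflow = f'{expected_repo}/.github/workflows/{registered["workflow"]}@'
            return (
                claims.get("repository") == expected_repo
                and claims.get("job_workflow_ref", "").startswith(expected_workflow)
            )

        # e.g. a maintainer registers: my-org/my-lib publishing via release.yml (illustrative)
        registered = {"owner": "my-org", "repo": "my-lib", "workflow": "release.yml"}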

    It’s still just secrets all the way down

    • > The secret is held by the metadata server that the CI instance has access to

      But how does the metadata server know that the CI instance is allowed to access the secret? Especially when the CI/CD system is hosted at a third party. It needs to present some form of credentials. The CI system may also need permissions or credentials for a private repository of packages or artifacts needed in the build process.

      For me, a CI/CD system needs two things: Secret management and the ability to run Bash.


> But there is absolutely nothing about calling an API that fundamentally requires that the caller know a secret.

There is if you pay for API access, surely?

While good in theory, in practice secrets are used to validate those privileges have been assigned. Even in schemes like metadata servers, you still use a secret.

Pedantically, I'd say maybe it's more fair to say they shouldn't have access to long-lived secrets and should only use short-lived values.

The "I" stands for Integration so it's inevitable CI needs to talk to multiple things--at the very least a git repo which most cases requires a secret to pull.

You might want (or _need_) to sign your binary, for example. Or you might want to trigger a deployment.

GitHub actually is doing something right here. You can set it up as a trusted identity provider in AWS, and then use GitHub to assume a role in your AWS account. And from there, you can get access to credentials stored in Secrets Manager or SSM.
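
A rough sketch of that exchange in Python (assuming the job has `id-token: write` permission; the role ARN, audience, and secret name are placeholders, and in practice the aws-actions/configure-aws-credentials action handles the token exchange for you):

    # Sketch: exchange the GitHub Actions OIDC token for temporary AWS credentials,
    # then read a secret from Secrets Manager. ARNs and names are placeholders.
    import os
    import boto3
    import requests

    # GitHub exposes these variables to jobs that request `id-token: write`.
    token_url = os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"]
    request_token = os.environ["ACTIONS_ID_TOKEN_REQUEST_TOKEN"]

    oidc_token = requests.get(
        f"{token_url}&audience=sts.amazonaws.com",
        headers={"Authorization": f"bearer {request_token}"},
    ).json()["value"]

    # STS validates the token against the GitHub OIDC provider configured in IAM.
    creds = boto3.client("sts").assume_role_with_web_identity(
        RoleArn="arn:aws:iam::123456789012:role/ci-deploy",  # placeholder
        RoleSessionName="github-actions",
        WebIdentityToken=oidc_token,
    )["Credentials"]

    secrets = boto3.client(
        "secretsmanager",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    value = secrets.get_secret_value(SecretId="prod/deploy-key")  # placeholder name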

  • Yes, their OIDC setup was probably their last good feature, back when they were actually delivering features in 2020ish. Everyone else copied it within a few months, though.

  • Yeah, I sign my project APKs so people can install them from the action's artefact:

      - name: Retrieve keystore for apk signing
        env:
          KEYSTORE: ${{ secrets.KEYSTORE }}
        run: echo "$KEYSTORE" | base64 --decode > /home/runner/work/keystore.pfk

    • Exactly. This workflow step takes a rather important secret and sticks it on a VM where any insufficiently sandboxed step before or after it can exfiltrate it.

      GitHub should instead let you store that key as a different type of secret such that a specific workflow step can sign with it. Then a compromised runner VM could possibly sign something that shouldn’t be signed but could not exfiltrate it.

      Even better would be to be able to have a policy that the only thing that can be signed is something with a version that matches the immutable release that’s being built.

> Why do CI/CD systems need access to secrets?

Because you need to be able to sign/notarize with private keys and deploy to cloud environments. Both of these require secrets known to the runner.

Because for some reason they use the same system to do releases and sign them and publish them.