LetsEncrypt Outage

15 hours ago (letsencrypt.status.io)

Good time to note that Buypass offers free certificates over ACME. I have a few of my domains configured to use them instead of LetsEncrypt, just for redundancy and to ensure I have a working non-LE cert source in case LE suffers problems like this over a longer time period.

Example OpenBSD /etc/acme-client.conf:

  authority buypass {
   api url "https://api.buypass.com/acme/directory"
   account key "/etc/acme/buypass-privkey.pem"
   contact "mailto:youremail@example.com"
  }
  domain example.com {
   domain key "/etc/ssl/private/example.com.key"
   domain full chain certificate "/etc/ssl/example.com.pem"
   sign with buypass
  }

  • This is neat. Does cert-manager have facilities to automatically use a fallback ACME provider, so I could automate using this? I'd also accept a pool of ACME providers, but a priority ordering seems ideal. I don't see the functionality listed anywhere, maybe there's some security argument that this is a bad idea?

  • Cheers! They look like decent chaps and also outside the US for some additional certificate diversity. Are there other trustworthy ACME issuers out there?

    A pity that acme-client(1) does not allow for fallbacks, but I will add a mental note about it being an easy enough patch to contribute if I ever find the time.
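The priority-ordered fallback asked about above could be approximated outside the ACME client with a small wrapper. The following is only a sketch under stated assumptions: the `AUTHORITIES` list, `fetch_directory`, and `pick_authority` are hypothetical names, and "usable" is judged solely by whether the ACME directory loads and advertises a `newOrder` endpoint, which does not guarantee that issuance will succeed.

```python
import json
import urllib.request

# Priority-ordered ACME directories (hypothetical list for illustration).
# The Buypass URL is taken from the acme-client.conf example above; the
# Let's Encrypt URL is its production ACME v2 directory.
AUTHORITIES = [
    ("letsencrypt", "https://acme-v02.api.letsencrypt.org/directory"),
    ("buypass", "https://api.buypass.com/acme/directory"),
]


def fetch_directory(url, timeout=10):
    """Fetch and parse an ACME directory object (a JSON document)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_authority(authorities, fetch=fetch_directory):
    """Return the name of the first authority whose directory looks usable.

    Returns None if every authority is unreachable or returns something
    that does not look like an ACME directory.
    """
    for name, url in authorities:
        try:
            directory = fetch(url)
        except (OSError, ValueError):
            # Network failure, HTTP error, timeout, or invalid JSON:
            # fall through to the next authority in priority order.
            continue
        if isinstance(directory, dict) and "newOrder" in directory:
            return name
    return None
```

A wrapper script could call `pick_authority(AUTHORITIES)` before each renewal and hand the ACME client whichever configuration signs with the returned authority. The `fetch` parameter exists only so the selection logic can be exercised without network access.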

Let's Encrypt stopped its certificate expiration email notification service a while ago, and I hadn't found a replacement yet. As a result, I didn't receive an expiration notice this time and failed to renew my certificate in advance. The certificate expired today, making my website inaccessible. I logged into my VPS to renew it manually, but the process failed every time. I then checked my cloud provider's platform and saw a notification at the top, which made me realize the problem was with the certificate provider. A quick look at Hacker News confirmed it: Let's Encrypt was having an outage. I want to post this news on my website, but I can't, because my site is down due to the expired certificate.

  • > Let's Encrypt stopped its certificate expiration email notification service a while ago, and I hadn't found a replacement yet.

    This sounds like an easy problem to identify root cause for.

    I think I received about 15 'we're disabling email notifications soon' emails over the past several months - one of which was interesting, but none were needed, as I'd originally set this up, per documentation, to auto-renew every 30 days.

    Perhaps create a calendar reminder for the short term?

  • Haven't they always, from day one, insisted that their primary goal was to encourage (force) automation of certificate maintenance, as a mechanism to make tls ubiquitous (mandatory everywhere)?

    • > Haven't they always, from day one, insisted that their primary goal was to encourage (force) automation of certificate maintenance, as a mechanism to make tls ubiquitous (mandatory everywhere)?

      And?

      Automation sometimes breaks, either for internal reasons (OS patching) or external ones. For the latter, LE at some point in the past changed CDNs, and this caused JWS headers to be sent back differently, which broke different clients, e.g.:

      * https://community.letsencrypt.org/t/jws-has-no-anti-replay-n...

      * https://github.com/dehydrated-io/dehydrated/issues/684

      Being able to get e-mails was an extra level of monitoring that was handy, even if you had automation.

    • Yes, we had lengthy discussions in itops (I had an admin role when LE was launched) about it.

      The team lead couldn't get over the slogan "devops, automating downtimes since 2010" whenever someone wanted to add a new nonessential automation that does things on prod servers.

      I mean, he wasn't completely wrong: it was a nonessential automation with high risk and very little reward (<1h saved every 2 yrs), which is why we never switched to LE for our main site; only internal tooling was allowed to use it.


  • Oof, you're right, that's rough that it's so soon after they discontinued their email service!

    I wrote this blog post a few weeks ago: "Minimal, cron-ready scripts in Bash, Python, Ruby, Node.js (JavaScript), Go, and Powershell to check when your website's SSL certificate expires." https://heiioncall.com/blog/barebone-scripts-to-check-ssl-ce... which may be helpful if you want to roll your own.

    (Disclosure: at Heii On-Call we also offer free SSL certificate expiration monitoring, among other things.)
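For anyone rolling their own check as suggested above, a minimal Python sketch using only the standard library might look like the following. It is not taken from the linked post; the function names and the 14-day threshold are assumptions chosen for illustration.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=10):
    """Return the server certificate's notAfter string.

    The default context verifies the chain, so a certificate that has
    already expired raises ssl.SSLCertVerificationError here instead of
    returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after, now=None):
    """Days between `now` (epoch seconds, default: current time) and expiry."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def expiring_soon(host, warn_days=14):
    """True if the certificate is invalid or expires within `warn_days`."""
    try:
        return days_left(fetch_not_after(host)) < warn_days
    except ssl.SSLCertVerificationError:
        return True
```

A cron job could call `expiring_soon("example.com")` daily and send a notification when it returns `True`. Connection errors are deliberately not caught, so an unreachable host surfaces as a failure rather than being reported as a healthy certificate.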

  • Because you're not supposed to rely on emails. You should have an automated certificate renewal in place. I'm under the impression that Let's Encrypt wants to reduce certificate validity even further from the current 90 days.

  • > As a result, I didn't receive an expiration notice this time and failed to renew my certificate in advance.

    Shouldn't that happen automatically a bit beforehand?

    • Due to some legacy reasons, my service runs using a docker + nginx setup. However, certbot was initially used in its native nginx mode to generate the certificate, which prevented it from auto-renewing. I later switched it to standalone mode, but I'm not sure if I configured the auto-renewal correctly. In any case, the certificate happened to expire today, and it didn't renew automatically. On a side note, I was actually planning to see what an expired website certificate looked like first and then deal with the auto-renewal issue. After all, it's just a small hobby website, so it's not that big of a deal.


  • If it's a personal website you should consider HTTP+HTTPS. It offers the best of both worlds and your website would always be accessible even if some third party CA is not (or if there's some local issue, or if the HTTP client connecting has cert issues). MITM attacks on personal websites are extremely, extremely rare.

It's DNS, we're working on it. Sorry, thank you for bearing with us.

  • It's not DNS

    There's no way it's DNS

    It was DNS

    • Five stages of DNS outage:

      1. Denial: It’s not DNS.

      2. Anger: What the fuck is it!

      3. Bargaining: Maybe it’s a firewall, or Cloudflare!

      4. Depression: We’ve checked everything…

      5. Acceptance: It’s DNS.


  • It's always either DNS or MTU.

    (Or, as I recently encountered, it can also be a McAfee corporate firewall trying to be helpful by showing a download progress bar in place of an HTTP SSE stream. I was sure that was being caused by MTU, but alas no.)

  • whoa whoa whoa.. slow down! you don't just leap to "It's DNS"... you have to try to blame everything else first before you get to DNS. it's like foreplay!

    • when all of the interns have jumped around the corner before the blame hammer was wielded, you have to move to the next item on the list

Mostly this should be a non-event, since renewal happens long before expiration? Although it's a huge deal, I suppose, for services that need to issue new certificates constantly; Let's Encrypt would be a major failure mode for them.

As they move to shorter-lifetime certs (6 days now https://letsencrypt.org/2025/01/16/6-day-and-ip-certs/?utm_s...) this puts it in the realm of possibility that an incident could impact long-running services.

  • I encountered this while trying to issue a new certificate for a service. As a temporary fix, I started using ZeroSSL, which conveniently also supports the ACME protocol. While not a big problem, if you have something like `cert-manager` being used on Kubernetes, then it requires quite a bit of reconfiguration, and you may spend a couple of hours trying to figure out why a certificate hasn't been issued yet.

    That said, I'm unbelievably grateful for the great product (and work!) LetsEncrypt has provided for free. Hope they're able to get their infrastructure back up soon.

  • From the announcement:

    Subscribers will be able to opt in to short-lived certificates via a certificate profile mechanism being added to our ACME API.

    We hope to make short-lived certificates generally available by the end of 2025.

    The earliest short-lived certificates we issue may not support IP addresses, but we intend to enable IP address support by the time short-lived certificates reach general availability.

We're seeing a lot of downstream effects of this at StatusGator. Of course any provider that relies on LetsEncrypt to issue certs (such as Heroku) is affected.

One notable exception is Cloudflare: They famously no longer rely solely on LetsEncrypt.

Shall we have some way of freely encrypting the web that is relying on one authority?

Especially something that needs to be renewed every 90 days, or is it 40 days now? How about issuing 100-year certificates by default?

  • Long expiration times = compromised certs that hang around longer than they should. It's bad.

    Note that you can make your own self-signed CA certificate, create any server and client certificates you want signed with that CA cert, and deploy them whenever and wherever you want. Of course you want the root CA private key securely put somewhere and all that stuff.

    The only reason it won't work at large without a bit of friction is because your CA cert isn't in the default trusted root store of major browsers (phone and PC). It's easy enough to add it - it does pop up warnings and such on Windows, Android, iOS and hopefully Mac OS X, but they're necessary here.

    No, it's not going to let the whole world do TLS with you warning-free without doing some sort of work, but for small scales (the type that Let's Encrypt is often used for anyway) it's fine.

  • Many of the cloud providers give free certs via ACME.

    https://cloud.google.com/certificate-manager/docs/public-ca-... (EDIT: Google is their own CA, with https://pki.goog/ )

    The browsers and security people have been pushing towards shorter certs, not longer ones. Knowing how to rotate a cert every year, if not shorter, helps when your certificate or any of your parent certs are compromised and require an emergency rotation.

  • > Shall we have some way of freely encrypting the web that is relying on one authority?

    Caddy uses ZeroSSL as a fallback if Let’s Encrypt fails!

  • This is largely not an issue thanks to ACME which they spearheaded. You can use multiple providers as backup options.

    Also, you have days to weeks of slack time for renewals. The only real impact is trying to issue new certs if you are solely dependent on LE.
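The "days to weeks of slack" claim above can be made concrete with a little arithmetic. This sketch assumes a client that renews once two-thirds of the certificate lifetime has elapsed; that fraction and the function name are assumptions for illustration, not a documented policy of any particular client.

```python
def slack_days(lifetime_days, renew_fraction=2 / 3):
    """Days an issuance outage can last before a cert actually expires.

    Assumes renewal is first attempted after `renew_fraction` of the
    lifetime has elapsed and keeps being retried until expiry.
    """
    if not 0 < renew_fraction < 1:
        raise ValueError("renew_fraction must be between 0 and 1")
    return lifetime_days * (1 - renew_fraction)
```

Under that assumption, a 90-day certificate leaves roughly 30 days of slack, while a 6-day certificate leaves roughly 2, which is why shorter lifetimes make an outage like this one more consequential for already-running services.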

  • Revocation doesn't work well, so we're simplifying and relying on expiration for that. So no to the super long certs.

  • The bigger question that's going unasked: what the hell is the point of an expiration date if it keeps getting shorter? At some point we will refresh the cert every second.

    The whole point of the expiration is in case a hacker gets the private key to the cert and can then MITM, they can keep MITMing successfully until the cert the hacker gives to the clients expires (or was revoked by something like OCSP, assuming the client verifies OCSP). A very long expiration is very bad because it means the hacker could keep MITMing for years.

    The way things like this work with modern security is ephemeral security tokens. Your program starts and it requests a security token, and it refreshes the token over X time (within 24 hrs). If a hacker gets the token, they can attack using it until 1) you notice and revoke the existing tokens AND sessions, or 2) the token expires (and we assume they for some reason don't have an advanced persistent threat in place).

    Nobody puts any emphasis on the fact that 1) you have to NOTICE THE ATTACK AND REVOKE SHIT for any of these expirations to have any impact on security whatsoever, and 2) if they got the private key once, they can probably get it again after it expires, UNLESS YOU NOTICE AND PLUG THE HOLE. If you have nothing in place to notice a hacker has your private key, and if revocation isn't effective, the impact is exactly the same whether expiration is 1 second or 1 year.

    How many people are running security scans on their whole stack every day? How many are patching security holes within a week? How many have advanced software designed to find rootkits and other exploits? Or any other measure to detect active attacks? My guess is maybe 0.0001% of you do. So you will never know when they gain access to your certs, so the fast expiration is mostly pointless.

    We should be completely reinventing the whole protocol to be a token-based authorization service, because that's where it's headed. And we should be focusing more on mitigating active exploits rather than just hoping nobody ever exploits anything. But that would scare people, or require additional work. So instead we let like 3 companies slowly do whatever they want with the entire web in an uncoordinated way. And because we let them do whatever they want with the web, they keep introducing more failure modes and things get shittier. We are enabling the enshittification happening in front of our eyes.

    • The other benefit of expiration dates in a PKI is in case the subject information is no longer accurate.

      In old-school X.509 PKI this might be "in case this person is no longer affiliated with the issuer" (for organizational PKI) or "in case this contact information for this person is otherwise no longer accurate".

      In web PKI this might be "in case this person no longer controls this domain name" or "in case this person no longer controls this IP address".

      The key-compromise issue you mention was more urgent for the web PKI before TLS routinely used ciphersuites providing forward secrecy. In that case, a private key compromise would allow the attacker to passively decrypt all TLS sessions during the lifetime of that private key. With more modern ciphersuites, a private key compromise allows the attacker to actively impersonate an endpoint for future sessions during the lifetime of that private key. This is comparatively much less catastrophic.


    • > The whole point of the expiration is in case a hacker gets the private key to the cert and can then MITM

      Nope. So all that happened here is that you were wrong.

  • You've always been able to do this. Whether it's useful to your clients has always been the problem.

    In a practical sense you likely wouldn't like the alternatives, because for most people's usage of the internet there's exactly one authority which matters: the local government, and its legal system - i.e. most of my necessary use of TLS is for ecommerce. Which means the ultimate authority is "are you a trusted business entity in the local jurisdiction?"

    Very few people would have any reason to ever expand the definition beyond this, and fewer would have the knowledge to do so safely even if we provided the interfaces - i.e. no one knows what safety numbers in Signal mean, if I can even get them to use Signal.

    • Maybe I'm misinterpreting this, but local government's legal system is not the "one authority which matters." What local government is able to keep up to date on TLS certificates?

      Your users that visit your website and get a TLS warning are the authority to worry about, if you're running a business that needs security. Depending on what you're selling, that one user could be a gigantic chunk of your business. Showing your local government that you have a process in place to renew your TLS certificates, and your provider was down is most likely going to be more than enough to indemnify you for any kind of maliciousness or ignorance (ignorantia juris non excusat). Obviously, different countries/locations have varying laws, but I highly doubt you'd be held liable for such a major outage for a company that is in such heavy use. Honestly, if you were held liable, or think you would be for this type of event, I'd think twice about operating from that location.

Hopefully the thundering herd when service is restored doesn't knock things offline again. I know LE designs for huge throughput (something like 3X total outstanding certificates in 24 hours, at one point) and the automated client recommendations for backoff are pretty good, but there will be a lot of manual applications/renewals I'm sure.
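One generic way clients avoid contributing to a thundering herd is exponential backoff with random jitter. The sketch below shows the "full jitter" variant; it is not Let's Encrypt's published recommendation, and the base delay, cap, and function name are assumptions for illustration.

```python
import random


def backoff_delays(attempts, base=60.0, cap=3600.0, rng=random):
    """Yield one randomized delay (in seconds) per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**n)], so
    clients that failed at the same moment spread their retries out
    instead of all returning together.
    """
    for n in range(attempts):
        yield rng.uniform(0, min(cap, base * 2 ** n))
```

A renewal loop would sleep for each yielded delay between attempts. Passing a seeded `random.Random` instance as `rng` makes the sequence reproducible for testing.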

I want DANE!

  • That ship has sailed. DNSSEC is not liked even a little bit. Given that control over DNS is how domain-validated certs are handed out, it would make a lot of sense to cut out the middle man.

    But DNS does not have a good reliable authenticated transport mechanism. I wonder if there was a way to build this that would have worked.

    • My biggest problem is how centralized issuance is.

      Half the year I live on an island that is reliant on submarine cables and has historically had weeks- and months-long outages, and with a changing world I suspect that might become reality once again. Locally this wasn't much of an issue: the ccTLD continues to function, and most services (but now about 35%) are locally hosted. Then HTTPS comes along. Zero certificates could be (re-)issued during an outage. A locally run CA isn't really an option (standalone simply isn't feasible, and getting into root stores takes time and money), so you are left with teaching users to ignore certificate errors a few weeks into an extended outage.

      I could see someone like LE working with TLD registrars to enable local issuance (with delegated/sub-CA certificates restricted to the TLD), that could also mitigate problems like today (decentralize issuance) and the registrars are already the primary source of truth for DV validation.


Did the LLM delete this as well?

  • Either that or someone took Ambien last night; that seems to make people make crazy mistakes. ;)

  • The response he received had a correction to the code that the user did not expect.

  • apparently don't insult the golden goose of llms or the company that gives most of its products away for free :P