
Comment by toast0

5 years ago

If we make a simplified network map, FB looks more or less like this: a bunch of PoPs (points of presence) at major peering points around the world, a backbone network that connects those PoPs to the FB datacenters, and the FB-operated datacenters themselves. (The datacenters are generally located a bit farther away from population centers, and therefore from peering points, so it's sensible to communicate with the outside world only through the PoPs.)

The DNS servers run at the PoPs, but a PoP only advertises the DNS addresses over BGP when it determines that it's healthy. If there's no connectivity back to a FB datacenter (or perhaps no connectivity to the preferred datacenter), the PoP is unhealthy and won't advertise the DNS addresses over BGP.

Since the BGP change that was pushed eliminated the backbone connectivity, none of the PoPs were able to connect to datacenters, and so they all, independently, stopped advertising the DNS addresses.
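A minimal Python sketch of the health-gated advertisement described above, assuming a simple rule (a PoP is healthy if it can reach at least one datacenter). The PoP names other than seattle-1, the datacenter names, and the use of a single /24 as "the DNS prefix" are hypothetical, chosen only to show how losing the backbone makes every PoP withdraw on its own.

```python
from dataclasses import dataclass, field

# Illustrative only: one /24 from the FB DNS subnet mentioned in the thread.
DNS_ANYCAST_PREFIX = "129.134.30.0/24"


@dataclass
class PoP:
    name: str
    # Datacenters this PoP can currently reach over the backbone.
    reachable_datacenters: set[str] = field(default_factory=set)

    def is_healthy(self) -> bool:
        # Assumed rule: healthy if at least one datacenter is reachable.
        # A stricter variant could require the *preferred* datacenter.
        return len(self.reachable_datacenters) > 0

    def advertised_prefixes(self) -> list[str]:
        # Announce the DNS prefix over BGP only while healthy.
        return [DNS_ANYCAST_PREFIX] if self.is_healthy() else []


def lose_backbone(pops: list[PoP]) -> None:
    # With the backbone gone, no PoP can reach any datacenter.
    for pop in pops:
        pop.reachable_datacenters.clear()


# "seattle-1" is from the thread; the other names are hypothetical.
pops = [
    PoP("seattle-1", {"dc-a", "dc-b"}),
    PoP("pop-x", {"dc-a"}),
    PoP("pop-y", {"dc-b"}),
]
print({p.name: p.advertised_prefixes() for p in pops})
lose_backbone(pops)
# Each PoP withdraws independently; nothing anywhere still announces DNS.
print({p.name: p.advertised_prefixes() for p in pops})
```

No PoP needs to know about the others for this to happen: each one applies the same local check, so a single shared failure (the backbone) produces a global withdrawal.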

So that's why DNS went down. Of course, since client access goes through load balancers at the PoPs, and the PoPs couldn't reach the datacenters where requests are actually processed, DNS being down wasn't a meaningful impediment to accessing the services; even with working DNS, requests would have failed. Apparently, it was an issue with management (among other issues).

Disclosure: I worked at WhatsApp until 2019, and saw some of the network diagrams. Network design may have changed a bit in the last 2 years, but probably not too much.

Just observing the data presented in the Cloudflare article (https://blog.cloudflare.com/october-2021-facebook-outage/) and disagreeing with the conclusion :-)

While 129.134.30.0/23 (the subnet where the a and b nameservers reside) was indeed withdrawn (according to the FB postmortem, by the DNS automation tooling), 129.134.0.0/17, the shorter prefix (perhaps a summary at the edge), was still present. However, FB didn't have the longer prefixes (e.g. 129.134.30.0/24 and 129.134.31.0/24, which we normally see anycast externally) internally. In other words, routing towards the FB DNS subnet (I haven't looked into 185.89.218.0/23, which is where the 2 other authoritative nameservers reside) still worked up to the FB border, but the traffic was dropped (routed to Null) by the FB edge, since it didn't have the more-specifics internally.

This, combined with a TTL of 60 seconds, led to almost immediate global DNS failure and all the other stuff you have been reading about.
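The drop at the edge comes down to longest-prefix matching. Below is a small Python sketch using the standard `ipaddress` module. The two routing tables are hypothetical and follow the reading above: the /17 summary points at a discard (Null) route, and the /24s normally come from inside FB. The destination is just an example address inside 129.134.30.0/24.

```python
import ipaddress


def longest_prefix_match(table: dict[str, str], dst: str):
    """Return the next hop of the most specific prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in table if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]


# Hypothetical FB edge tables (assumed, not taken from FB's configuration).
EDGE_NORMAL = {
    "129.134.0.0/17": "discard (Null)",           # summary / aggregate route
    "129.134.30.0/24": "backbone -> DNS servers",  # learned from inside FB
    "129.134.31.0/24": "backbone -> DNS servers",
}
EDGE_DURING_OUTAGE = {
    "129.134.0.0/17": "discard (Null)",  # summary stays, more-specifics gone
}

dst = "129.134.30.12"  # example address inside 129.134.30.0/24
print(longest_prefix_match(EDGE_NORMAL, dst))         # backbone -> DNS servers
print(longest_prefix_match(EDGE_DURING_OUTAGE, dst))  # discard (Null)
```

Normally the /24 wins over the /17 because it is more specific. Once the /24s disappear internally, the only match left is the summary, so packets still arrive at the edge but go nowhere.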

  • That particular subnet has a covering prefix, but I don't think the other two DNS subnets do. I also checked the WhatsApp authoritative subnets, because I have greater affinity for WhatsApp; they don't usually have a covering prefix (I checked a looking glass during the outage, and there were no announcements visible, at least at that point).

    For those with a covering prefix, the diagnosis is a little different, as you said: traffic would still flow to whichever FB PoPs advertise the covering prefix, but then it loops inside FB, because the PoP doesn't know where to send it, since nothing was advertising the specific /24. That's as opposed to the addresses with zero announcements, where the traffic never makes it to FB but gets dropped somewhere else.
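The two failure modes can be told apart with a rough Python sketch. It asks two questions in order: does anything on the internet cover the destination, and if so, does FB have a specific route to forward it on? The outage-time prefix lists are assumptions based on this exchange (only the covering /17 remained, and 185.89.218.0/23 is assumed to have no covering prefix); the example addresses are arbitrary picks inside each subnet.

```python
import ipaddress


def covered(dst: str, prefixes: list[str]) -> bool:
    addr = ipaddress.ip_address(dst)
    return any(addr in ipaddress.ip_network(p) for p in prefixes)


def outcome(dst: str, announced_to_internet: list[str], specific_routes_in_fb: list[str]) -> str:
    if not covered(dst, announced_to_internet):
        # Nobody announces anything covering dst: dropped outside FB.
        return "dropped before reaching FB"
    if not covered(dst, specific_routes_in_fb):
        # A covering prefix pulls traffic into an FB PoP, which has no
        # specific route to forward it on (so it loops or is discarded).
        return "reaches an FB PoP, then has nowhere to go"
    return "delivered"


# Assumed outage-time view.
outage_internet = ["129.134.0.0/17"]
outage_specifics: list[str] = []

print(outcome("129.134.30.12", outage_internet, outage_specifics))
print(outcome("185.89.218.12", outage_internet, outage_specifics))
```

Either way the resolver gets no answer. The difference is only where the packets die, which matters if you are trying to diagnose the outage from the outside with traceroutes or a looking glass.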

Ok, so the DNS servers at the PoPs, outside of the backbone, did not go down.

Does that mean they can only respond with public IPs that are meaningful for the local PoP, and can't respond with IPs pointing to other PoPs or FB's main DCs? That would mean different public IPs are handed out at different PoPs, right?

  • I'm not quite sure I understand the question exactly, but let me give it a try.

    So, first off, each PoP has a /24; for example, the seattle-1 PoP, which is near me, has 157.240.3.X addresses. For me, whatsapp.net currently resolves to 157.240.3.54 in the seattle-1 PoP. These addresses are used as unicast, meaning they go to one place only, and they're dedicated to seattle-1 (until FB moves them around). But there are also anycast /24s, like 69.171.250.0/24, where 69.171.250.60 is a load balancer IP that does the same job as 157.240.3.54, but multiple PoPs advertise 69.171.250.0/24. It's served from seattle-1 for me, but probably from somewhere else for you unless you're nearby.
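A toy Python model of the unicast/anycast difference. The 157.240.3.0/24 and 69.171.250.0/24 prefixes come from the comment. "pop-x", its 198.51.100.0/24 documentation-range prefix, and the per-client costs are hypothetical. The single cost number stands in for BGP path selection, which in reality depends on each network's routing policy.

```python
import ipaddress
from typing import Optional

# What each PoP announces over BGP (pop-x and its prefix are hypothetical).
ANNOUNCEMENTS = {
    "seattle-1": ["157.240.3.0/24", "69.171.250.0/24"],
    "pop-x": ["198.51.100.0/24", "69.171.250.0/24"],
}


def serving_pop(dst: str, client_cost: dict[str, int]) -> Optional[str]:
    """Pick the PoP a client's packets to dst land on (lowest cost wins)."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        pop for pop, prefixes in ANNOUNCEMENTS.items()
        if any(addr in ipaddress.ip_network(p) for p in prefixes)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda pop: client_cost[pop])


near_seattle = {"seattle-1": 1, "pop-x": 5}  # hypothetical
elsewhere = {"seattle-1": 9, "pop-x": 2}     # hypothetical

# Unicast: only seattle-1 announces it, so both clients land there.
print(serving_pop("157.240.3.54", near_seattle), serving_pop("157.240.3.54", elsewhere))
# Anycast: each client lands on whichever announcing PoP is "closest".
print(serving_pop("69.171.250.60", near_seattle), serving_pop("69.171.250.60", elsewhere))
```

The same anycast address resolves to a different physical PoP per client. A unicast address always pins traffic to one PoP, wherever the client is.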

    The DNS server IPs are also anycast, so if a PoP is healthy, it will BGP-advertise the DNS server IPs (or at least some of them: if I ping {a-d}.ns.whatsapp.net, I see 4 different ping times, so I can tell seattle-1 is only advertising d.ns.whatsapp.net right now, and if I worked a little harder, I could probably figure out the other PoPs).
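The ping trick can be written as a small heuristic in Python: compare each nameserver's round-trip time against the RTT to a known unicast address of the local PoP (for example 157.240.3.54 for seattle-1), and flag the ones that match within a tolerance. The RTT values and the 2 ms tolerance below are made up for illustration. Similar RTTs suggest the same PoP but don't prove it.

```python
def locally_served(rtts_ms: dict[str, float], local_pop_rtt_ms: float,
                   tolerance_ms: float = 2.0) -> list[str]:
    """Nameservers whose RTT is within tolerance of the local PoP's RTT."""
    return sorted(
        name for name, rtt in rtts_ms.items()
        if abs(rtt - local_pop_rtt_ms) <= tolerance_ms
    )


# Hypothetical measurements (ms); real values would come from ping.
measured = {
    "a.ns.whatsapp.net": 27.9,
    "b.ns.whatsapp.net": 61.4,
    "c.ns.whatsapp.net": 19.2,
    "d.ns.whatsapp.net": 3.4,
}
print(locally_served(measured, local_pop_rtt_ms=3.1))
```

Guessing *which* remote PoP serves a, b, and c would take more work, e.g. traceroutes or comparing against unicast RTTs to other PoPs.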

    Ok, so then I think your question is: if my DNS request for whatsapp.net makes it to the seattle-1 PoP, will it only respond with a seattle-1 IP? That's one way to do it, but it's not necessarily the best way. Since my DNS requests could make it to any PoP, sending back an answer that points at that PoP may not be the best place to send me.

    Ideally, you want to send back an answer that is network-local to the requester and also not a PoP that is overloaded. Every fancy DNS server does it a little differently, but more or less you're integrating a bunch of information that links resolver IP to network location, along with capacity information, and doing the best you can. Sometimes that means sending users to anycast, which should end up network-local (but doesn't always); sometimes it's sending them to a specific PoP you think is local; sometimes it's sending them to another PoP because the usual best PoP has some issue (overloaded on CPU, network congestion to the datacenters, network congestion on peering/transit, utility power issue, incoming weather event, fiber cut or upcoming fiber maintenance, etc).
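A minimal Python sketch of one possible answer-selection policy of the kind described above. It walks the PoPs in order of network proximity to the resolver and returns the unicast IP of the first one that is healthy and under a load threshold. If none qualifies, it falls back to the anycast address. The ranking input (normally derived from a resolver-IP-to-location mapping), the 0.8 load threshold, "pop-x", and its IP are all assumptions. Real GSLB systems weigh many more signals.

```python
from dataclasses import dataclass

ANYCAST_FALLBACK = "69.171.250.60"  # anycast load balancer IP from the thread


@dataclass
class PopStatus:
    unicast_ip: str
    healthy: bool  # e.g. backbone, power, and peering checks all passing
    load: float    # fraction of capacity in use, 0.0 to 1.0


def choose_answer(ranked_pops: list[str], status: dict[str, PopStatus],
                  max_load: float = 0.8) -> str:
    """ranked_pops: PoP names ordered best-first for this resolver's location."""
    for name in ranked_pops:
        s = status.get(name)
        if s is not None and s.healthy and s.load < max_load:
            return s.unicast_ip
    # Nothing suitable: hand out anycast and let BGP pick a PoP.
    return ANYCAST_FALLBACK


status = {
    "seattle-1": PopStatus("157.240.3.54", healthy=True, load=0.95),  # overloaded
    "pop-x": PopStatus("198.51.100.10", healthy=True, load=0.40),     # hypothetical
}
print(choose_answer(["seattle-1", "pop-x"], status))
```

Here the usually-best PoP (seattle-1) is skipped because it is over the threshold, so the resolver is sent to the next-closest PoP instead.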

    But in short, different DNS requests will get different answers. If you've got a few minutes, run these commands to see the range of answers you could get for the same query:

        host whatsapp.net # using your system resolver settings
        host whatsapp.net a.ns.whatsapp.net # directly to authoritative server A
        host whatsapp.net b.ns.whatsapp.net # directly to authoritative server B
        host whatsapp.net 8.8.8.8 # Google Public DNS
        host whatsapp.net 1.1.1.1 # Cloudflare public DNS
        host whatsapp.net 4.2.2.1 # Level 3, not entirely public DNS
        host whatsapp.net 208.67.222.222 # OpenDNS
        host whatsapp.net 9.9.9.9 # Quad9
    

    You should see a bunch of different addresses for the same service. FB hostnames do similar things of course.

    Adding on: the BGP announcements for the unicast /24s of the PoPs didn't go down during yesterday's outage. If you had any of the PoP-specific IPs for whatsapp.net, you could still use http://whatsapp.net (or https://whatsapp.net), because the configuration for that hostname is so simple that it's served from the PoPs without going to the datacenters. It just sets some HSTS headers and redirects to www.whatsapp.com, which, perhaps despite appearances, is a page served from the datacenters and so would not have worked during the outage.
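A rough Python sketch of the split described above: a PoP that answers whatsapp.net itself and needs a datacenter for everything else. The 301 status, the HSTS `max-age` value, appending the request path to the redirect target, and modeling the outage-time failure as a 503 are all assumptions, not WhatsApp's actual configuration. The HSTS header is only added over HTTPS, since RFC 6797 has browsers ignore it on plain HTTP.

```python
from typing import Optional

REDIRECT_TARGET = "https://www.whatsapp.com"


def handle_at_edge(scheme: str, host: str, path: str) -> Optional[tuple[int, dict[str, str]]]:
    """Answer whatsapp.net at the PoP; None means 'proxy to a datacenter'."""
    if host.lower() != "whatsapp.net":
        return None
    headers = {"Location": REDIRECT_TARGET + path}
    if scheme == "https":
        # HSTS only counts over HTTPS; the max-age value is assumed.
        headers["Strict-Transport-Security"] = "max-age=31536000"
    return 301, headers


def serve(scheme: str, host: str, path: str, datacenters_reachable: bool) -> int:
    edge = handle_at_edge(scheme, host, path)
    if edge is not None:
        return edge[0]
    # Everything else (e.g. www.whatsapp.com) needs a datacenter;
    # failure is modeled here as a 503.
    return 200 if datacenters_reachable else 503


# During the outage (no datacenter reachability):
print(serve("https", "whatsapp.net", "/", datacenters_reachable=False))      # 301
print(serve("https", "www.whatsapp.com", "/", datacenters_reachable=False))  # 503
```

The redirect itself succeeds. The browser then follows it to a hostname whose content lives in the datacenters, and that second request fails.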

    •   Ideally, you want to send back an answer that is network-local to the requester and also not a PoP that is overloaded. [...]

      Right, I was hoping FB's DNS servers would be smarter than usual: say, when the DNS server at seattle-1 cannot reach the backbone, it would respond with the IP of a PoP in perhaps NYC or SF before it starts the BGP withdrawal.

      Thanks for the write-up; I enjoyed it.
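The failover idea in that reply can be sketched in Python as follows. A PoP whose own backbone link is down keeps advertising DNS for a short grace period and answers with the unicast IP of another PoP that still reports healthy. If no such PoP exists, it withdraws. This rests on a big assumption: that the local DNS server can learn other PoPs' health without its own backbone link. It would not have helped in this outage, because every PoP lost the backbone at the same time. The NYC/SF names, their documentation-range IPs, and the 30 s grace period are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PopState:
    unicast_ip: str
    backbone_ok: bool


def respond(local: str, pops: dict[str, PopState], seconds_unhealthy: float,
            grace_s: float = 30.0) -> tuple[Optional[str], bool]:
    """Return (answer IP or None, keep advertising DNS over BGP?)."""
    me = pops[local]
    if me.backbone_ok:
        return me.unicast_ip, True
    healthy_others = [p.unicast_ip for name, p in pops.items()
                      if name != local and p.backbone_ok]
    # Only worth staying up if there is somewhere useful to send people.
    keep_advertising = bool(healthy_others) and seconds_unhealthy < grace_s
    return (healthy_others[0] if healthy_others else None), keep_advertising


pops = {
    "seattle-1": PopState("157.240.3.54", backbone_ok=False),
    "nyc": PopState("198.51.100.20", backbone_ok=True),  # hypothetical
    "sf": PopState("203.0.113.30", backbone_ok=True),    # hypothetical
}
print(respond("seattle-1", pops, seconds_unhealthy=10))  # ('198.51.100.20', True)

# October 2021: the backbone was down for every PoP at once.
for p in pops.values():
    p.backbone_ok = False
print(respond("seattle-1", pops, seconds_unhealthy=10))  # (None, False)
```

The second call shows the limit of the idea. When the failure is shared by every PoP, there is no healthy peer to point at, so withdrawing is the only remaining option.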
