Comment by pm2222

5 years ago

OK, so the DNS servers at the PoPs, outside of the backbone, did not go down.

Does that mean they can only respond with public IPs that are meaningful for the local PoP, and cannot respond with IPs that direct you to other PoPs or FB's main DCs? That would have to mean different public IPs are handed out at different PoPs, right?


toast0 5 years ago

I'm not quite sure I understand the question exactly, but let me give it a try.

So, first off, each PoP has a /24. For example, the seattle-1 PoP, which is near me, has 157.240.3.X addresses; for me, whatsapp.net currently resolves to 157.240.3.54 in the seattle-1 PoP. These addresses are used as unicast, meaning they go to one place only, and they're dedicated to seattle-1 (until FB moves them around). But there are also anycast /24s, like 69.171.250.0/24, where 69.171.250.60 is a load balancer IP that does the same job as 157.240.3.54, but multiple PoPs advertise 69.171.250.0/24; it's served from seattle-1 for me, but probably from somewhere else for you unless you're nearby.
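
To make the unicast vs. anycast distinction concrete, here's a minimal sketch that checks which of the two /24s mentioned above an address falls in. The prefixes come from this comment (157.240.3.0/24 is my reading of "157.240.3.X"); the labels and the `classify` helper are just illustrative names, not anything FB publishes.

```python
import ipaddress

# Prefixes taken from the comment above; the labels are only for illustration.
PREFIXES = {
    "seattle-1 unicast": ipaddress.ip_network("157.240.3.0/24"),
    "anycast": ipaddress.ip_network("69.171.250.0/24"),
}


def classify(address):
    """Return the label of the known /24 that contains `address`, or "unknown"."""
    ip = ipaddress.ip_address(address)
    for label, network in PREFIXES.items():
        if ip in network:
            return label
    return "unknown"
```

Calling `classify("157.240.3.54")` returns the unicast label and `classify("69.171.250.60")` returns the anycast one. Note that nothing in the address itself says "anycast"; that only depends on how many PoPs announce the prefix over BGP.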

The DNS server IPs are also anycast, so if a PoP is healthy, it will advertise the DNS server IPs over BGP (or at least some of them). If I ping {a-d}.ns.whatsapp.net, I see 4 different ping times, so I can tell seattle-1 is only advertising d.ns.whatsapp.net right now, and if I worked a little harder, I could probably figure out the other PoPs.

Ok, so then I think your question is, if my DNS request for whatsapp.net makes it to the seattle-1 PoP, will it only respond with a seattle-1 IP? That's one way to do it, but it's not necessarily the best way. Since my DNS requests could make it to any PoP, an answer that points at that same PoP may not send me to the best place.

Ideally, you want to send back an answer that is network local to the requester and also not a PoP that is overloaded. Every fancy DNS server does it a little differently, but more or less you're integrating a bunch of information that links resolver IP to network location as well as capacity information and doing the best you can. Sometimes that would be sending users to anycast, which should end up network local (but doesn't always); sometimes it's sending them to a specific PoP you think is local; sometimes it's sending them to another PoP because the usual best PoP has some issue (overloaded on CPU, network congestion to the datacenters, network congestion on peering/transit, utility power issue, incoming weather event, fiber cut or upcoming fiber maintenance, etc).
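
As a rough illustration of that kind of decision (not FB's actual logic, which I'm not describing here), here's a minimal sketch. Everything in it is an assumption: the `Pop` fields, the `distance` score, the 0.8 load threshold, and the rule of falling back to the anycast IP when no specific PoP looks usable. The second PoP and its 192.0.2.10 address are placeholders from the documentation range.

```python
from dataclasses import dataclass

ANYCAST_IP = "69.171.250.60"  # anycast load balancer IP mentioned earlier


@dataclass
class Pop:
    name: str
    ip: str
    distance: int          # hypothetical network-distance score from the resolver (lower is closer)
    load: float            # hypothetical utilisation, 0.0 to 1.0
    healthy: bool = True   # hypothetical health flag


def pick_answer(pops, max_load=0.8):
    """Pick the closest healthy PoP under the load threshold, else fall back to anycast."""
    usable = [p for p in pops if p.healthy and p.load < max_load]
    if not usable:
        return ANYCAST_IP
    return min(usable, key=lambda p: p.distance).ip
```

With `Pop("seattle-1", "157.240.3.54", distance=1, load=0.5)` and a farther placeholder PoP, the answer is the seattle-1 address; push seattle-1's load over the threshold and the same query gets the other PoP instead. A real server would weigh far more signals than this.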

But in short, different DNS requests will get different answers. If you've got a few minutes, run these commands to see the range of answers you could get for the same query:

    host whatsapp.net # using your system resolver settings
    host whatsapp.net a.ns.whatsapp.net # direct to authoritative A
    host whatsapp.net b.ns.whatsapp.net # direct to B
    host whatsapp.net 8.8.8.8 # google public DNS
    host whatsapp.net 1.1.1.1 # cloudflare public DNS
    host whatsapp.net 4.2.2.1 # level 3 not entirely public DNS
    host whatsapp.net 208.67.222.222 # OpenDNS
    host whatsapp.net 9.9.9.9 # Quad9

You should see a bunch of different addresses for the same service. FB hostnames do similar things of course.
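
If you'd rather collect those answers programmatically, here's a minimal sketch that shells out to `host` and pulls the IPv4 addresses out of its output. It assumes `host` is installed and prints lines of the form `name has address IP`; `parse_host_output` and `resolve_via` are names I made up for this example.

```python
import re
import subprocess

# Matches lines such as "whatsapp.net has address 157.240.3.54" (IPv4 answers only).
ADDRESS_LINE = re.compile(r"^(\S+) has address (\S+)$", re.MULTILINE)


def parse_host_output(text):
    """Return the sorted, de-duplicated IPv4 addresses found in `host` output."""
    return sorted({match.group(2) for match in ADDRESS_LINE.finditer(text)})


def resolve_via(name, server=None):
    """Run `host name [server]` and return the IPv4 addresses it reports."""
    command = ["host", name] + ([server] if server else [])
    result = subprocess.run(command, capture_output=True, text=True, timeout=10)
    return parse_host_output(result.stdout)
```

For example, looping `resolve_via("whatsapp.net", server)` over `None`, `"a.ns.whatsapp.net"`, `"8.8.8.8"` and `"1.1.1.1"` should show the spread of answers; what you get back depends on where you are and when you ask.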

Adding on, the BGP announcements for the unicast /24s of the PoPs didn't go down during yesterday's outage. If you had any of the PoP-specific IPs for whatsapp.net, you could still use http://whatsapp.net (or https://whatsapp.net ), because the configuration for that hostname is so simple that it's served from the PoPs without going to the datacenters. It just sets some HSTS headers and redirects to www.whatsapp.com, which, perhaps despite appearances, is a page served from the datacenters and so would not have worked during the outage.

pm2222 5 years ago
> Ok, so then I think your question is, if my DNS request for whatsapp.net makes it to the seattle-1 PoP, will it only respond with a seattle-1 IP? That's one way to do it, but it's not necessarily the best way. Since my DNS requests could make it to any PoP, an answer that points at that same PoP may not send me to the best place. Ideally, you want to send back an answer that is network local to the requester and also not a PoP that is overloaded. Every fancy DNS server does it a little differently, but more or less you're integrating a bunch of information that links resolver IP to network location as well as capacity information and doing the best you can. Sometimes that would be sending users to anycast, which should end up network local (but doesn't always); sometimes it's sending them to a specific PoP you think is local; sometimes it's sending them to another PoP because the usual best PoP has some issue (overloaded on CPU, network congestion to the datacenters, network congestion on peering/transit, utility power issue, incoming weather event, fiber cut or upcoming fiber maintenance, etc).
Right, I was hoping FB's DNS servers would be smarter than usual: say, when the DNS server at Seattle-1 cannot reach the backbone, it would respond with an IP for perhaps NYC/SF before it starts the BGP withdrawal.
Thanks for the write-up, I enjoyed it.
- toast0 5 years ago
  
  > Right, I was hoping FB's DNS servers would be smarter than usual: say, when the DNS server at Seattle-1 cannot reach the backbone, it would respond with an IP for perhaps NYC/SF before it starts the BGP withdrawal.
  
  The problem there is coordination. The PoPs don't generally communicate amongst themselves (and may not have been able to after the FB backbone broke; technically they could have gone through transit connectivity, but it may not be configured to work that way), so when a PoP loses its connection to the FB datacenters, it also loses its source of information about which PoPs are available and healthy. I think this is likely a classic distributed systems problem: the desired behavior when an individual node becomes unhealthy is different from when all nodes become unhealthy, but the nature of distributed systems is that a node can't tell if it's the only unhealthy node or if all nodes became unhealthy together. Each individual PoP did the right thing by dropping out of the anycast, but because they all did it, it was the wrong thing.
  
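
The coordination problem described above can be sketched in a few lines. This is a toy model, not FB's health-check logic: the rule "withdraw when the backbone is unreachable" is my simplification of the behavior described in the thread, and the PoP names other than seattle-1 are placeholders.

```python
def should_withdraw(can_reach_backbone):
    """Local rule: a PoP only knows its own state, so it withdraws when it is cut off."""
    return not can_reach_backbone


def still_advertising(backbone_reachable):
    """Return the PoPs that keep announcing the anycast prefix, given each PoP's own view."""
    return sorted(
        pop for pop, reachable in backbone_reachable.items()
        if not should_withdraw(reachable)
    )


# PoP names other than seattle-1 are placeholders.
one_pop_down = {"seattle-1": False, "pop-b": True, "pop-c": True}
all_pops_down = {"seattle-1": False, "pop-b": False, "pop-c": False}

print(still_advertising(one_pop_down))   # ['pop-b', 'pop-c']: traffic shifts to healthy PoPs
print(still_advertising(all_pops_down))  # []: nobody answers DNS at all
```

The same local rule gives a good outcome in the first case and a total outage in the second, and `should_withdraw` has no input that would let a PoP tell the two cases apart.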