Comment by seiferteric
14 hours ago
Now that I have seemingly taken on managing DNS at my current company, I have seen several inadequacies of DNS that I was not aware of before. The main one: if an upstream DNS server returns SERVFAIL, there is really no way to distinguish whether the server you are querying has failed or the authoritative server further upstream is broken (I am aware of EDEs, but they don't really solve this). So clients querying a broken domain will retry each of their configured DNS servers, our caching layer (Unbound) will also retry each of its upstreams, etc. The result is a bunch of pointless upstream queries, like an amplification attack. I also have an issue with the search path generating pointless NXDOMAIN queries like badname.company.com, badname.company.othername.com, etc.
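For the search-path issue, the usual mitigation (assuming a glibc-style stub resolver; option names vary by resolver, and the domains below are placeholders) is to keep the search list short and use fully-qualified names with a trailing dot:

```
# Hypothetical /etc/resolv.conf sketch -- domain names are placeholders.
search company.com company.othername.com
nameserver 10.0.0.53

# With the default ndots:1, a bare lookup for "badname" is tried as
#   badname.company.com, badname.company.othername.com, then badname.
# Querying "badname.example.net." (note the trailing dot) bypasses the
# search list entirely, avoiding the extra NXDOMAIN round trips.
options ndots:1
```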
re: your SERVFAIL observation, oh man did I run into this exact issue about a year or so ago when it came up for a particular zone. All I was doing was troubleshooting it on the caching server. It took me a day or two to actually look at the auth server and discover that the issue actually originated there.
> So clients querying a broken domain will retry each of their configured DNS servers, our caching layer (Unbound) will also retry each of their upstreams etc...
I expect this is why BIND 9 has the 'servfail-ttl' option. [0]
Turns out that there's a standards-track RFC from 1998 that explicitly permits caching SERVFAIL responses. [1] Section 8 of that document suggests that this behavior was permitted by RFC 1034 (published back in 1987).
[0] <https://bind9.readthedocs.io/en/v9.18.42/reference.html#name...>
[1] <https://www.rfc-editor.org/rfc/rfc2308#section-7.1>
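For reference, a minimal sketch of what that looks like in named.conf (`servfail-ttl` is a real BIND 9 option; the specific value here is illustrative):

```
// Sketch: cache SERVFAIL responses so a broken zone is answered from
// cache instead of re-querying the failing auth server on every client
// retry. Per the BIND 9 docs, the value is capped at 30 seconds and
// 0 disables the cache.
options {
    servfail-ttl 5;   // seconds
};
```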