Comment by mdavid626
20 days ago
I would expect that DNS servers like 1.1.1.1, at this scale, have integration tests running real resolvers, like the one in glibc. How come this issue was only discovered in production?
This case would only happen if a CNAME chain first expired from the cache in the wrong order and was then queried via glibc. Their tests may check both that glibc resolution works and that re-querying expired records works, but not the combination of the two.
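To see why the order matters, here is a toy model of a stub resolver that consumes the answer section strictly top to bottom and only follows records owned by the name it is currently chasing. This is a hypothetical simplification (the function `resolve_in_order` and the example names are made up for illustration), not glibc's actual code. It just assumes the order-sensitive behavior discussed in this thread.

```python
# Hypothetical, simplified model of an order-sensitive stub resolver.
# NOT glibc's implementation; records are (owner, rtype, data) tuples.

def resolve_in_order(qname, answers):
    current = qname          # the name we are currently following
    addresses = []
    for owner, rtype, data in answers:
        if owner != current:
            continue         # record is for a name we haven't reached yet: ignored
        if rtype == "CNAME":
            current = data   # follow the alias to its target
        elif rtype == "A":
            addresses.append(data)
    return addresses

good = [("www.example.com", "CNAME", "cdn.example.net"),
        ("cdn.example.net", "A", "192.0.2.1")]
bad = list(reversed(good))   # same records, A record before the CNAME

print(resolve_in_order("www.example.com", good))  # ['192.0.2.1']
print(resolve_in_order("www.example.com", bad))   # []
```

Same records, different order, and the second answer yields no address at all. A test suite that never produces the reordered answer would never exercise this path.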
I’d test such scenarios as well: run many real glibc resolvers for a while, and sooner or later the caching issue would surface.
Agreed. It seems like a pretty risky optimization that fundamentally changed behavior; like it or not, the ordering of vectors is often part of the data structure.
They could have just used a prepend to preserve the behavior instead of going down the rabbit hole of reinterpreting the RFC (which is a cop-out IMO; it worked before, and a change broke it).
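A rough sketch of the prepend-vs-append point, assuming the CNAME expired and was re-fetched while the A record was still cached. The helper names `merge_answer` and `cnames_first` and the (owner, rtype, data) record layout are hypothetical and not Cloudflare's actual cache code.

```python
# Hypothetical record layout (owner, rtype, data); not Cloudflare's real cache format.

def merge_answer(fresh_cnames, cached_records, prepend=True):
    """Combine freshly re-fetched CNAME records with records still in cache."""
    if prepend:
        return list(fresh_cnames) + list(cached_records)  # alias stays at the head
    return list(cached_records) + list(fresh_cnames)      # alias lands after its target

def cnames_first(records):
    """True if no CNAME record appears after a non-CNAME record."""
    seen_other = False
    for _owner, rtype, _data in records:
        if rtype != "CNAME":
            seen_other = True
        elif seen_other:
            return False
    return True

# Scenario: the CNAME expired and was re-fetched; the A record is still cached.
fresh = [("www.example.com", "CNAME", "cdn.example.net")]
cached = [("cdn.example.net", "A", "192.0.2.1")]

print(cnames_first(merge_answer(fresh, cached, prepend=True)))   # True
print(cnames_first(merge_answer(fresh, cached, prepend=False)))  # False
```

`cnames_first` is a coarse check. With a multi-hop chain, the CNAMEs themselves also need to stay in chain order, which prepending the re-fetched head of the chain preserves.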