Comment by supriyo-biswas

20 days ago

My reading of that statement is their test, assuming they had one, looked something like this:

    rrs = resolver.resolve('www.example.test')
    assert Record("cname1.example.test", type="CNAME") in rrs
    assert Record("192.168.0.1", type="A") in rrs

Which wouldn't have caught the ordering problem.

It's implied that they intentionally tested it that way, without any assertions on the order. Not by oversight of incompetence, but because they didn't want to bake the requirement in due to uncertainty.

  • That would be silly to stick that tightly to a 40 year old standard. They can easily observe the behavior of every other public DNS resolver (they are Cloudflare, so gathering data on such a scale should be easy) and see how they return results.

    Honestly, though, I’d be surprised if they actually even considered it. Everything about the article says to me that the engineer(s) who caused this problem are desperately trying to deflect blame for not having a comprehensive test suite. Sorry, but you don’t go tweaking order of results for such a long-standing, high volume, and crucial protocol just because the 40 year old spec isn’t clear about it.

  • That approach only makes sense if tests are immutable though. If you are unsure if the order matters you should still test for it so you get a reminder to re-check your assumptions when the order changes.