Comment by rspeele

17 hours ago

Suppose the image resize service has some caching, and due to a bug in the caching, under certain circumstances it will respond with an already-cached resized version of a different source image.

Let's say for example it caches on something stupid like the CRC32 of the input image -- good enough that the couple dozen images in your test dataset don't collide, you don't see it in smoke testing your app, but real world data has collisions on a daily basis.
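To make that failure mode concrete, here is a minimal sketch of the kind of bug I mean. Everything in it is hypothetical: `fake_image`, `fake_resize`, `Crc32KeyedCache` and `DigestKeyedCache` are stand-ins I made up for illustration, not anyone's real service. The "images" are just deterministic 32-byte blobs, and the search brute-forces two different blobs with the same CRC32.

```python
import hashlib
import zlib


def fake_image(i: int) -> bytes:
    # Hypothetical stand-in for an uploaded image: 32 deterministic bytes.
    return hashlib.sha256(str(i).encode()).digest()


def fake_resize(image: bytes) -> bytes:
    # Hypothetical stand-in for the real resize work.
    return b"thumb:" + image


class Crc32KeyedCache:
    """The buggy design: the cache key is only the 32-bit CRC of the input."""

    def __init__(self) -> None:
        self._store: dict[int, bytes] = {}

    def thumbnail(self, image: bytes) -> bytes:
        key = zlib.crc32(image)
        if key not in self._store:
            self._store[key] = fake_resize(image)
        # On a CRC collision this returns another image's thumbnail.
        return self._store[key]


class DigestKeyedCache:
    """One possible fix: key on a cryptographic digest of the input."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def thumbnail(self, image: bytes) -> bytes:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._store:
            self._store[key] = fake_resize(image)
        return self._store[key]


def find_crc32_collision(limit: int = 1_000_000) -> tuple[bytes, bytes]:
    # Birthday search: remember every CRC seen until two inputs share one.
    seen: dict[int, bytes] = {}
    for i in range(limit):
        image = fake_image(i)
        key = zlib.crc32(image)
        if key in seen:
            return seen[key], image
        seen[key] = image
    raise RuntimeError("no collision found within limit")


if __name__ == "__main__":
    a, b = find_crc32_collision()
    buggy = Crc32KeyedCache()
    buggy.thumbnail(a)  # customer B uploads first
    leaked = buggy.thumbnail(b) == fake_resize(a)  # customer A gets B's thumbnail
    print("distinct inputs:", a != b, "| wrong thumbnail served:", leaked)
```

Assuming CRC32 values are roughly uniform, the birthday bound puts a 50% chance of at least one collision at around 77,000 distinct inputs, while a couple dozen test images will essentially never collide. That is why the bug sails through smoke testing and still shows up on real traffic.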

This gets into production and customer A sees a resized version of customer B's document as a thumbnail. Now customer A is wondering how many other customers are seeing resized versions of their own private documents in thumbnail images. They are very, very mad.

If the image resize service was built by "another team" then that other team is responsible for the bug and will take most of the heat for it. If it was built by an "agent swarm" or "gas town" or whatever under my direction then I'm 100% responsible for it and rightly deserve the heat.

That is why I cannot understand any approach that doesn't involve reading the code at all. Testing alone is not sufficient. MTTR is not sufficient because you can't make a customer less mad about a data privacy bug by fixing it.

Practically, this comes down to how confident you are in the code, how big the anticipated blast radius is, and how you balance testing against review overhead.