← Back to context

Comment by don-code

4 years ago

This happened to me once at an old job. I discovered a race condition between the process that sanitized data out of the staging environment as it was (continuously) refreshed in from a production fork, and the process that sent notification e-mails during mass events.

Our customers were all rather large corporate types, so a great many "Was this sent to my stores?" type messages from VPs and C-staff ensued.

I luckily wasn't involved in the communications fallout from this, but it did initiate a sweeping change inside of engineering to go proactively find and correct for these sorts of issues. We had the same pattern in use throughout our platform, and really all that stood in the way of this race condition being triggered had been QA not testing at the exact same time a critical part of that refresh process was running.