← Back to context

Comment by sunrunner

4 hours ago

> We paged for every single 500.

Assuming the existence of some kind of network (with zero guarantee of 100% reliability), how does this work in practice? Is each 500 treated as an event that needs investigation, even if the result of that would end up as 'a router dropped something from an internal buffer but the transaction as a whole was re-tried by a parent so the service itself recovered'?

Client network timeout shouldn't result in 500. With 408 and retry you should, dependent on the business criteria, get either an upsert (transaction is retried) or 422 (validation that given entry already exists).

Even if it's "DB in datacenter I tried to save to was hit by meteor" event, you can cater for this not to result in 500 (ie - DB unreachable, retry in a couple of minutes); the question is if you want to.