
Comment by bloaf

5 days ago

> The API failed silently because the database connection pool was exhausted downstream.

I work with a team that does stuff like this: returning a 200 and a body containing "error: I didn't do what you said because _insert error here_"

The problem is that you returned OK instead of ERROR when things were not OK and there was an ERROR.

It's a design that smells of teams trying to hit some kind of internal metrics by slightly deceptive means.

I had to explain so many times to infrastructure guys why it was not okay that the software they use to manage outages still returns 200s.
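For contrast, here is a minimal sketch of the alternative in Python. It assumes a hypothetical `PoolExhaustedError` raised by the data layer, a hypothetical `handle_request` wrapper, and a hypothetical `query_orders` call. The point is only that the failure surfaces as a 5xx status (503 here) while the JSON body still carries the human-readable reason, so anything that counts status codes actually sees the outage.

```python
import json


class PoolExhaustedError(Exception):
    """Hypothetical error raised when the downstream connection pool has no free connections."""


def handle_request(do_work):
    """Run do_work() and map its outcome to an HTTP (status, JSON body) pair.

    do_work is a hypothetical callable standing in for the real handler logic.
    """
    try:
        result = do_work()
    except PoolExhaustedError as exc:
        # A server-side dependency is unavailable: say so with a 5xx
        # instead of 200 + "error: ..." in the body.
        return 503, json.dumps({"error": "service_unavailable", "detail": str(exc)})
    except Exception as exc:
        # Any other unhandled failure while serving the request is still a server error.
        return 500, json.dumps({"error": "internal_error", "detail": str(exc)})
    return 200, json.dumps({"result": result})


def query_orders():
    # Hypothetical data-layer call that fails the way the quoted incident did.
    raise PoolExhaustedError("connection pool exhausted")


status, body = handle_request(query_orders)
print(status, body)
# 503 {"error": "service_unavailable", "detail": "connection pool exhausted"}
```

The body is still useful for humans and clients; it just stops being the only place the failure is recorded.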

>returning a 200 and a body containing "error: I didn't do what you said because _insert error here_"

I've seen this approach before. It mostly follows from using the status code to separate application errors (200 + ok/error in the body) from other kinds of errors that might arise.

  • HTTP error codes are divided between server (5xx) and client (4xx).

    Where do these "application errors" occur if neither on a server nor a client?

    I think the reality is that management sees "5xx means server error, so our team's KPI is now server error rate, the lower the better!" Then the team just stops using 500 errors as much as possible. They probably justify it with things like "well, such and such problem isn't our fault, so it's not really a server error." This kind of thinking perverts the intent of 5xx responses. They are supposed to indicate any failure to handle the request that happens on the server, NOT measure whether the dev team is making a good application.

    • It can happen out of necessity: if the failure is in an AJAX request and you need to send back a message or additional data in JSON, Apache can eat the body of error responses. So a success response is all that's guaranteed to get through.

      I don't know about others, but I know about this one because I had to dig into a bug where something on live looked like it had succeeded but hadn't, while the error worked fine on dev. I ended up downloading the Apache source and finding where it was happening, then settled on just using a 200 response.
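At the protocol level, an error status and a JSON body are not mutually exclusive. HTTP lets 4xx/5xx responses carry a body, so the behaviour described above comes from that particular server setup, not from HTTP itself. The sketch below uses Python's standard-library `http.server` (not Apache), so it says nothing about why the body was dropped in that deployment. It only shows a client-caused application error returned as a 400 and a server-side failure as a 503, each with a JSON body the client can still read. The paths `/bad-input` and `/orders` and the payloads are hypothetical.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical routing: /bad-input is a client mistake (4xx),
        # anything else simulates a downstream failure on the server (5xx).
        if self.path == "/bad-input":
            status, payload = 400, {"error": "bad_input", "detail": "missing field 'id'"}
        else:
            status, payload = 503, {"error": "db_unavailable", "detail": "connection pool exhausted"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def serve_in_background():
    # Port 0 asks the OS for any free port; server.server_port reports which one.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Return (status, parsed JSON body), including for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the exception still exposes the body.
        return err.code, json.loads(err.read())


server = serve_in_background()
base = f"http://127.0.0.1:{server.server_port}"
print(fetch(base + "/bad-input"))
print(fetch(base + "/orders"))
server.shutdown()
server.server_close()
```

If something between the application and the client (a proxy, or a web server configured with custom error pages) replaces error bodies, that is the layer to look at before falling back to 200 for everything.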