Comment by Joker_vD

2 days ago

> - The team constantly bikesheds over correct status codes and at least a few are used contrary to the HTTP spec

I really wish people just used 200 status code and put encoded errors in the payloads themselves instead of trying to fuse the transport layer's (which HTTP serves as, in this case) concerns with the application's concerns. Seriously, HTTP does not mandate that e.g. "HTTP/1.1 503 Ooops\r\n\r\n" should be stuffed into the TCP's RST packet, or into whatever TLS uses to signal severe errors, for bloody obvious reasons: it doesn't belong there.

Like, when you get a 403/404 error, it's very bloody difficult to tell apart the "the reverse proxy before the server is misconfigured and somebody forgot to expose the endpoint" and "the server executed your request to look up an item perfectly fine: the DB is functional, and the item you asked for is not in there" scenarios. And yeah, of course I could (and should) look at and try to parse the response's body but why? This "let's split off the 'error code' part of the message from the message and stuff it somewhere into the metadata, that'll be fine, those never get messed up or used for anything else, so no chance of confusion" approach just complicates things for everyone for no benefit whatsoever.

The point of status codes is to have a standard that any client can understand. If you have a load balancer, the load balancer can unhealthy backends based on the status code. Similarly if you have some job scheduler or workflow engine that's calling your API, they can execute an appropriate retry strategy based on the status code. The client in most cases does not care about why something failed, only whether it has failed. Being able to tell apart if the failure was due to reverse proxy or database or whatever is the server's concern and the server can always do that with its own custom error codes.

  • > The client in most cases does not care about why something failed, only whether it has failed.

    "...and therefore using different status codes in the responses is mostly pointless. Therefore, use 200 and put "s":"error" in the response".

    > Being able to tell apart if the failure was due to reverse proxy or database or whatever is the server's concern.

    One of the very common failures is for the request to simply never reach "the server". In my experience, one of the very first steps in improving the error handling quality (on the client's side) is to start distinguishing between the low-level errors of "the user has literally no connection Internet" and "the user has connected somewhere, but that thing didn't really speak the server protocol", and the high-level errors "the client has talked with the application server (using the custom application protocol and everything), and there was an error on the application server's side". Using HTTP-status codes for both low- and high-level errors makes such distinctions harder to figure out.

    • I did say most cases, not all cases. There are some concerns that are considered cross cutting, to have made it into the standard. For instance, many clients will handle a 401 by redirecting to an auth flow, or handle a 429 rate limited by backing off before making a request, handle 426 by upgrading the protocol etc. Not all statuses may be relevant for a given system, you can club several scenarios under a 400 or a 500 and that's perfectly fine for many use cases. But when you have cross cutting concerns, it's beneficial to follow fine grained status codes. It gives you a lot of flexibility in how you can connect different parts of your architecture and reduces integration headaches.

      I think a more precise term for what you're describing is transport errors vs business errors. You're right that you don't want to model all your business errors as HTTP status codes. Your business scenarios are most certainly numerous and need to be much more fine grained than what the standard offers. But the important thing is all errors business or transport eventually need to map to a HTTP status code because that's the protocol you're ultimately speaking.

      1 reply →

  • what is a unhealthy request? is searching for a user which was _not found_ by the server unhealthy? was the request successful? thats where different opinions exist.

    • Sure, there's some nuance to it that depends on your application, but it's the server's responsibility to do so, not the client's. The status code exists for this reason and the standard also classifies status codes under client error and server error so that clients can determine whether a server is unhealthy simply by looking at the status code.

Eh, if you're doing RPC where the whole request/response are already in another layer on top of HTTP, then sure, 200 everything.

But to me, "REST" means "use the HTTP verbs to talk about resources". The whole point is that for resource-oriented APIs, you don't need another layer. In which case serving 404s for things that don't exist, or 409s when you try to put things into a weird state makes perfect sense.