Comment by alexwennerberg
3 days ago
This was an excellent explanation of a complex business problem, which would be made far more complex by splitting these out into separate services. Every single 'if' branch you describe could either be a line of code, or a service boundary, which has all the complexity you describe, in addition to the added complexity of:
a. managing an external API+schema for each service
b. managing changes to each service, for example, smooth rollout of a change that impacts behavior across two services
c. error handling on the client side
d. error handling on the server side
e. added latency+compute because a step is crossing a network, being serialized/de-serialized on both ends
f. presuming the services use different databases, performance is now completely shot if you have a new business problem that crosses service boundaries. In practice, this will mean doing a "join" by making some API call to one service and then another API call to another service
In your description of the problem, there is nothing that I would want to split out into a separate service. And to get back to the original problem, it makes it far easier to get all the logging context for a single problem in a single place (attach a request ID to all the logs and see immediately everything that happened as part of that request).
That's a good summary of the immediate drawbacks of putting network calls between different parts of the system. You're also right to point out that I gave no good reason why you might want to incur this overhead.
So what's the point?
I think the missing ingredient is scale: how much are you doing, and maybe also how quickly you got where you are.
The system does a lot. Even once it's in place, there's enough depth and surface to your business and operational concerns that something is always changing. You're going to need people to build, extend and maintain it. You will have multiple teams specializing in different parts of the system. Your monolith is carved into team territories, which are subdivided into quasi-autonomous regions with well-defined boundaries and interfaces.
Having separate services for different regions buys you flexibility in the choice of implementation language. This makes it easier to hire competent people, especially initially, when you need seasoned domain experts to get things started. It also matters later, when staffing the glue-code parts of the system is easier if you can be more relaxed about language choice.
Being able to deploy and scale parts of your service separately can also be a benefit. As I said, things are busy, people check in a lot of code. Not having to redeploy and reinitialize the whole world every few minutes just because some minor thing changed somewhere is good. Not bringing everything down when something inevitably breaks is also nice. You need some critical parts to be there; but a lot of your system can be gone for a while, no problem. Don't let those expendables take down your critical stuff. (Yes, failure modes shift; but there's a difference between having a priority 1 outage every day and having one much less frequently. That difference is also measured in developer health.)
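To illustrate the expendable-vs-critical split, here's a minimal sketch of a caller degrading gracefully when a non-critical service is down; the service name, endpoint and fallback are all hypothetical.

```python
import logging

import requests  # third-party HTTP client: pip install requests

log = logging.getLogger(__name__)

def fetch_recommendations(user_id: str) -> list[dict]:
    """Call a (hypothetical) recommendations service.

    Recommendations are an expendable: if the service is down or slow,
    degrade to an empty list instead of failing the whole request that
    a critical path depends on.
    """
    try:
        resp = requests.get(
            f"http://recommendations.internal/users/{user_id}/items",
            timeout=0.2,  # tight timeout so an expendable can't stall us
        )
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException as exc:
        log.warning("recommendations unavailable, degrading: %s", exc)
        return []  # the caller still serves its response, minus this section
```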
About the databases: some of your data is big enough that you don't want joins anyway; they have a way of suddenly killing database performance. The parts that absolutely need that scale are on DynamoDB. Some others are still fine on big Postgres instances, where the large tables are a little denormalized. (BI wants to do tons of joins, but they sit on their own separate lake of data.) There's a lot of small fry that is locally very connected and has some passing knowledge of the existence of some big, important business object, but crucially not of its insides. When a new business concern comes up, hopefully you cut your services and data along natural business domains; if not, you'll have to do more engineering now. Just like in your monolith, you don't want any code to be able to join any two tables, because that would mean things are too messy to reason about anymore. Mind your foreign keys! In any case, if you need DynamoDB, you'll face similar problems in your monolith.
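To make that cross-service "join" concrete, here's a minimal sketch; the base URLs and response shapes are hypothetical, but the two rounds of calls and the stitching in application code are the essence of it.

```python
import requests  # third-party HTTP client: pip install requests

def orders_with_user_names(order_ids: list[str]) -> list[dict]:
    """The cross-service 'join': rounds of API calls instead of one SQL JOIN.

    Base URLs and response shapes are hypothetical.
    """
    # First round trip(s): fetch the orders from the orders service.
    orders = [
        requests.get(f"http://orders.internal/orders/{oid}", timeout=1.0).json()
        for oid in order_ids
    ]
    # Second round trip: batch-fetch the users those orders reference.
    user_ids = sorted({o["user_id"] for o in orders})
    users = requests.get(
        "http://users.internal/users",
        params={"id": ",".join(user_ids)},
        timeout=1.0,
    ).json()
    name_by_id = {u["id"]: u["name"] for u in users}
    # Stitch the two result sets together in application code.
    return [{**o, "user_name": name_by_id.get(o["user_id"])} for o in orders]
```

Serialized network hops where a single query used to be: that's exactly the cost from your point f, which is why you want the cuts to follow natural domain boundaries, so that this stays rare.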
A nice side effect of separate services is that they resist an intermingling of concerns that must be actively prevented in monoliths. People love reaching into things they shouldn't. But that's a small upside against the many disadvantages.
Another small mitigating factor is that a lot of your services will be IO-bound and making network requests anyway to do their jobs, which makes the added latency of an internal network hop much less of a trade-off.
It's all a trade-off. Don't spin off a service until you know why, and until you have a pretty good idea where to make a cut that's a good balance of contained complexity vs surface area.
Now, do you really need 15 different services? Probably not. But I could see how they could work together well, each of them taking care of some well-defined part of your business domain. There's enough meat there that I would not call things a mistake without a closer look.
This is by no means the only way to do things. All I wanted was to show that it can be a reasonable way. I hope the reasons are clearer now.
As for the logging problem: it's not hard to have a standard way to hand a request id around from your gateway onward, to be put in structured logs.
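For what it's worth, here's a minimal sketch of that in Python; the X-Request-Id header and the contextvar-plus-logging-filter wiring are one common convention, not the only way to do it.

```python
import logging
import uuid
from contextvars import ContextVar

# Carried implicitly for the duration of a request, including across async code.
request_id: ContextVar[str] = ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Stamp every log record with the current request id."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True

def configure_logging() -> None:
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(request_id)s %(name)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])

def handle_request(headers: dict) -> None:
    # Trust the gateway's id if present; otherwise mint one at the edge.
    request_id.set(headers.get("X-Request-Id", str(uuid.uuid4())))
    logging.getLogger("app").info("handling request")
    # Every log line from here on carries the same id, so filtering on it
    # shows everything that happened as part of this request.
```

Forward the same header on any outgoing calls to other services, and the id follows the request across the whole system.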