Comment by dxdm

3 days ago

> If a user request is hitting that many things, in my view, that is a deeply broken architecture.

Things can add up quickly. I wouldn't be surprised if some requests touch a lot of bases.

Here's an example: a user wants to start renting a bike from your public bike sharing service, using the app on their phone.

This could be an app developed by the bike sharing company itself, or a 3rd party app that bundles mobility options like ride sharing and public transport tickets in one place.

You need to authenticate the request and figure out which customer account is making it. Is the account allowed to start a ride? They might be blocked. They might need to confirm the rules first. Is this ride part of a group ride, and is the customer allowed to start multiple rides at once? Let's also take a small deposit by putting a hold on their credit card. Or are they a reliable customer? Then let's not bother them. Or is there a fraud risk? And do we need to trigger special code paths to work around known problems with payment authorization for cards issued by this bank?
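Roughly, in Python-ish pseudocode, that pre-ride check phase might look something like this. Every name (`accounts`, `rides`, `payments`, `fraud`, the thresholds and amounts) is made up for illustration, and whether each collaborator is an in-process module or a remote service is exactly the question at hand:

```python
from dataclasses import dataclass


@dataclass
class PreRideDecision:
    allowed: bool
    reason: str | None = None
    deposit_hold_id: str | None = None


def run_pre_ride_checks(request, accounts, rides, payments, fraud) -> PreRideDecision:
    # Who is making this request, and may they start a ride at all?
    account = accounts.authenticate(request.token)
    if account.blocked:
        return PreRideDecision(False, reason="account_blocked")
    if not account.accepted_current_rules:
        return PreRideDecision(False, reason="rules_confirmation_required")

    # Group rides: is a second concurrent ride allowed for this customer?
    if rides.has_active_ride(account.id) and not account.can_start_group_rides:
        return PreRideDecision(False, reason="concurrent_ride_not_allowed")

    # Deposit: skip it for reliable customers, otherwise place a hold,
    # possibly via a bank-specific workaround code path.
    hold_id = None
    if not account.is_trusted:
        if fraud.risk_score(account.id, request.card_fingerprint) > 0.8:
            return PreRideDecision(False, reason="fraud_risk")
        hold_id = payments.place_hold(
            account.id,
            amount_cents=500,
            issuer_workaround=payments.workaround_for(request.card_issuer),
        )

    return PreRideDecision(True, deposit_hold_id=hold_id)
```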

Everything good so far? Then let's start the ride.

First, let's lock in the necessary data. Which rental pricing did the customer agree to? Is it actually available to this customer, in this geographical zone, for this bike, at this time, or do we need to abort with an error? Otherwise, let's remember it, so we can calculate the correct rental fee at the end.

We normally charge an unlock fee in addition to the per-minute price. Are we doing that in this case? If yes, does the customer have any free unlock credit that we need to consume or reserve now, so that the app can correctly show unlock costs if the user wants to start another group ride before this one ends?
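As a sketch, the "lock in the terms" step could look roughly like this; `pricing`, `credits` and all the field names are hypothetical:

```python
from dataclasses import dataclass


class PricingUnavailableError(Exception):
    pass


@dataclass
class RideTerms:
    plan_id: str
    per_minute_cents: int
    unlock_fee_cents: int
    unlock_credit_reservation: str | None = None


def lock_in_ride_terms(customer, bike, zone, now, pricing, credits) -> RideTerms:
    # Which pricing did the customer agree to, and is it valid right here, right now?
    plan = pricing.plan_agreed_by(customer.id)
    if not pricing.is_available(plan, customer=customer, zone=zone, bike=bike, at=now):
        raise PricingUnavailableError(plan.id)

    unlock_fee_cents = plan.unlock_fee_cents
    reservation = None
    if unlock_fee_cents and credits.free_unlocks_remaining(customer.id) > 0:
        # Reserve (don't consume) the credit now, so a second group ride
        # started before this one ends already sees the correct unlock cost.
        reservation = credits.reserve_free_unlock(customer.id)
        unlock_fee_cents = 0

    # Freeze the agreed terms so the final fee is calculated from them later.
    return RideTerms(
        plan_id=plan.id,
        per_minute_cents=plan.per_minute_cents,
        unlock_fee_cents=unlock_fee_cents,
        unlock_credit_reservation=reservation,
    )
```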

Ok, let's unlock the bike and turn on the electric motor. We need to make sure it's ready to be used and talk to the IoT box on the bike, taking into account the kind of bike, the kind of box, and the software version. Maybe this is a multistep process, because this particular lock needs manual action by the customer. The IoT box might also have to know that we're in a zone where we throttle the max speed more than usual.
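Something along these lines, where the commands, capability checks and speed-cap fields are all invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class UnlockResult:
    state: str  # "unlocked" or "waiting_for_customer_action"


def unlock_bike(bike, zone, iot) -> UnlockResult:
    box = iot.box_for(bike.id)

    # Zone-specific speed cap has to be pushed before the motor comes on.
    max_kmh = zone.throttled_max_speed_kmh or bike.default_max_speed_kmh
    iot.send(box, command="set_speed_limit", kmh=max_kmh)

    if not box.supports("single_step_unlock"):
        # Some bike/box/firmware combinations need a manual step by the customer.
        iot.send(box, command="arm_lock")
        return UnlockResult(state="waiting_for_customer_action")

    iot.send(box, command="unlock")
    iot.send(box, command="motor_on")
    return UnlockResult(state="unlocked")
```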

Now let's inform some downstream data aggregators that a ride started successfully. BI (business intelligence) will want to know, and the city might also require us to report this to them. The customer was referred by a friend, and this is their first ride, so now the friend gets his referral bonus in the form of app credit.
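The fan-out to downstream consumers could be as simple as publishing a few events; the topic names and payload fields here are made up:

```python
def publish_ride_started(ride, customer, bus):
    # The business facts are the same whether these end up as direct calls,
    # queue messages, or events on a bus.
    event = {
        "type": "ride_started",
        "ride_id": ride.id,
        "customer_id": customer.id,
        "started_at": ride.started_at.isoformat(),
        "zone_id": ride.zone_id,
    }
    bus.publish("rides.started", event)          # BI picks this up
    bus.publish("city_reporting.rides", event)   # the city requires a report

    # First ride of a referred customer: credit the referrer.
    if customer.referred_by and ride.is_first_ride:
        bus.publish("benefits.referral_completed", {
            "referrer_id": customer.referred_by,
            "referred_id": customer.id,
            "reward": "app_credit",
        })
```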

Did we charge a non-refundable unlock fee? We might want to invoice that already (for whatever reason; otherwise this will happen after the ride). Let's record the revenue, create the invoice data and the PDF, email it, and report it to the country's tax agency, because that's required in the country this ride is starting in.

Or did things go wrong? Is the vehicle broken? Gotta mark it for service to swing by, and let's undo any payment holds. Or did the deposit fail, because the credit card is marked as stolen? Maybe block the customer and see if we have other recent payments using the same card fingerprint that we might want to proactively refund.
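The unhappy paths have their own compensating actions; a sketch, with all error types and client names hypothetical:

```python
class VehicleBrokenError(Exception):
    pass


class CardStolenError(Exception):
    def __init__(self, card_fingerprint):
        super().__init__("card reported stolen")
        self.card_fingerprint = card_fingerprint


def handle_start_failure(error, bike, customer, hold_id, fleet, payments, accounts):
    if isinstance(error, VehicleBrokenError):
        fleet.flag_for_service(bike.id, reason=str(error))
        if hold_id:
            payments.release_hold(hold_id)  # undo the deposit hold
        return

    if isinstance(error, CardStolenError):
        accounts.block(customer.id, reason="stolen_card")
        # Proactively review other recent charges made with the same card.
        for payment in payments.recent_by_fingerprint(error.card_fingerprint):
            payments.queue_refund_review(payment.id)
```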

That's just off the top of my head; there may be more in a real-life case. Some of these may happen synchronously, others may hit a queue or event bus. The point is, they are all tied to a single request.

So, depending on how you cut things, you might need several services that you can deploy and develop independently.

- auth - core customer management, permissions, ToS agreement
- pricing
- geo zone definitions
- zone rules
- benefit programs
- payments and payment provider integration
- app credits
- fraud handling
- ride management
- vehicle management
- IoT integration
- invoicing
- emails
- BI integration
- city hall integration
- tax authority integration
- and an API gateway that fronts the app request

These do not have to be separate services, but they are separate enough to warrant it. They wouldn't be exactly micro either.

Not every product will be this complicated, but it's also not that out there, I think.

This was an excellent explanation of a complex business problem, which would be made far more complex by splitting these out into separate services. Every single 'if' branch you describe could either be a line of code, or a service boundary, which has all the complexity you describe, in addition to the added complexity of:

a. managing an external API+schema for each service

b. managing changes to each service, for example, smooth rollout of a change that impacts behavior across two services

c. error handling on the client side

d. error handling on the server side

e. added latency+compute because a step is crossing a network, being serialized/de-serialized on both ends

f. presuming the services use different databases, performance is now completely shot if you have a new business problem that crosses service boundaries. In practice, this will mean doing a "join" by making an API call to one service and then another API call to another service (see the sketch below)
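For example, a question that would have been one SQL join against a shared database turns into an N+1 chain of HTTP calls; the clients and endpoints here are hypothetical:

```python
def blocked_customers_with_active_rides(rides_api, accounts_api):
    # What used to be `SELECT ... JOIN ... WHERE blocked AND state = 'active'`
    # becomes one call to the rides service plus one call per ride to accounts.
    active_rides = rides_api.get("/rides?state=active").json()
    blocked = []
    for ride in active_rides:
        account = accounts_api.get(f"/accounts/{ride['customer_id']}").json()
        if account["blocked"]:
            blocked.append(account)
    return blocked
```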

In your description of the problem, there is nothing that I would want to split out into a separate service. And to get back to the original problem, keeping it together makes it far easier to get all the logging context for a single problem in a single place (attach a request ID to all the logs and see immediately everything that happened as part of that request).
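Concretely, in a single process that can be as little as a context variable plus a logging filter; a stdlib-only sketch, with all names arbitrary:

```python
import logging
import uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Stamp every record emitted anywhere in the process.
        record.request_id = request_id.get()
        return True


handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(name)s: %(message)s"))
handler.addFilter(RequestIdFilter())
logging.basicConfig(level=logging.INFO, handlers=[handler])


def handle_request(payload):
    request_id.set(str(uuid.uuid4()))  # set once at the entry point
    logging.getLogger("rides").info("starting ride on bike %s", payload["bike_id"])
    logging.getLogger("payments").info("placing deposit hold")
```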

  • That's a good summary of the immediate drawbacks of putting network calls between different parts of the system. You're also right to point out that I gave no good reason why you might want to incur this overhead.

    So what's the point?

    I think the missing ingredient is scale: how much are you doing, and maybe also how quickly you got where you are.

    The system does a lot. Even once it's in place, there's enough depth and surface to your business and operational concerns that something is always changing. You're going to need people to build, extend and maintain it. You will have multiple teams specializing in different parts of the system. Your monolith is carved into team territories, which are subdivided into quasi-autonomous regions with well-defined boundaries and interfaces.

    Having separate services for different regions buys you flexibility in the choice of implementation language. This makes it easier to hire competent people, especially initially, when you need seasoned domain experts to get things started. It also matters later, when you may find it easier to staff the glue-code parts of the system, where you can be more relaxed about language choice.

    Being able to deploy and scale parts of your service separately can also be a benefit. As I said, things are busy, people check in a lot of code. Not having to redeploy and reinitialize the whole world every few minutes, just because some minor thing changed somewhere, is good. Not bringing everything down when something inevitably breaks is also nice. You need some critical parts to be there, but a lot of your system can be gone for a while without much harm. Don't let those expendables take down your critical stuff. (Yes, failure modes shift; but there's a difference between having a priority 1 outage every day and having one much less frequently. That difference is also measured in developer health.)

    About the databases: some of your data is big enough that you don't want to use joins anyway. They have a way of suddenly killing DB performance. Those who absolutely need it are on DynamoDB. Some others are still okay with a big Postgres instance, where the large tables are a little bit denormalized. (BI wants to do tons of joins, but they sit on their separate lake of data.) There's a lot of small fry that's locally very connected and has some passing knowledge of the existence of some big, important business object, but crucially not its insides. If you get a new business concern, hopefully you cut your services and data along natural business domains; otherwise you will need to do more engineering now. Just like in your monolith, you don't want any code to be able to join any two tables, because that would mean things are too messy to reason about anymore. Mind your foreign keys! In any case, if you need DynamoDB, you'll be facing similar problems in your monolith.

    A nice side effect of separate services is that they resist an intermingling of concerns that has to be actively prevented in monoliths. People love reaching into things they shouldn't. But that's a small upside against the many disadvantages.

    Another small mitigating factor is that a lot of your services will be IO-bound and already make network requests to do their job, which makes the extra latency from an internal network hop much less of a trade-off.

    It's all a trade-off. Don't spin off a service until you know why, and until you have a pretty good idea where to make a cut that's a good balance of contained complexity vs surface area.

    Now, do you really need 15 different services? Probably not. But I could see how they could work together well, each of them taking care of some well-defined part of your business domain. There's enough meat there that I would not call things a mistake without a closer look.

    This is by no means the only way to do things. All I wanted was to show that it can be a reasonable way. I hope there's more of a reason now.

    As for the logging problem: it's not hard to have a standard way to hand around request IDs from your gateway, to be put in structured logs.
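    For illustration only (using Flask and requests purely as stand-ins, with a hypothetical internal pricing service): accept the gateway's X-Request-ID, or mint one, log it, and forward it downstream.

    ```python
    import logging
    import uuid

    import requests
    from flask import Flask, g, request

    app = Flask(__name__)
    log = logging.getLogger("rides-service")


    @app.before_request
    def pick_up_request_id():
        # Trust the gateway's ID if present, otherwise mint one here.
        g.request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))


    @app.route("/rides", methods=["POST"])
    def start_ride():
        log.info("ride start requested", extra={"request_id": g.request_id})
        # Forward the same ID so the next service's logs line up with ours.
        requests.post(
            "http://pricing.internal/quotes",  # hypothetical internal service
            json=request.get_json(),
            headers={"X-Request-ID": g.request_id},
        )
        return {"status": "started"}
    ```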

> These do not have to be separate services, but they are separate enough to warrant it.

All of this arises from your failure to question this basic assumption though, doesn't it?

  • > All of this arises from your failure to question this basic assumption though, doesn't it?

    Haha, no. "All of this" is a scenario I consider quite realistic in terms of what needs to happen. The question is, how should you split this up, if at all?

    Mind that these concerns will be involved in other ways with other requests, serving customers and internal users. There are enough different concerns at different levels of abstraction that you might need different domain experts to develop and maintain them, maybe using different programming languages, depending on who you can get. There will definitely be multiple teams. It may be beneficial to deploy and scale some functions independently; they have different load and availability requirements.

    Of course you can slice things differently. Which assumptions have you questioned recently? I think you've been given some material. No need to be rude.

    • I don't think I was rude. You're overcomplicating the architecture here for no good reason. It might be common to do so, but that doesn't make it good practice. And ultimately I think it's your job as a professional to question it, which makes not doing so a form of 'failure'. Sorry if that seems harsh; I'm sharing what I believe to be genuine and valuable wisdom.

      Happy to discuss why you think this is all necessary. Open to questioning assumptions of my own too, if you have specifics.

      As it is, you're just quoting microservices dogma. Your auth service doesn't need a different programming language from your invoicing system. Nor does it need to be scaled independently. Why would it?
