Comment by prhn

3 days ago

This is surprisingly basic knowledge for ending up on the front page.

It’s a good intro, but I’d love to read more about when to know it’s time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I’ve learned some answers to this question over time, but these guys are theoretically message queue experts. I’d love to learn about more things to look out for.

There are also different types of queues/exchanges, and this is critical depending on the type of consumer or consumers you have. Should I use direct, fanout, etc.?

The next interesting question is when I should use a stream instead of a queue, which RabbitMQ also supports.

My advice, having just migrated a set of message queues and streams from AWS (ActiveMQ) to RabbitMQ, is: think long and hard before you add one. They become a black box of sorts and are way harder to debug than simple HTTP requests.

Also, as others have pointed out, there are other important use cases for queues which come way before microservice comms. Async processing to free up servers is one. I’m surprised none of these were mentioned.

> This is surprisingly basic knowledge for ending up on the front page.

Nothing wrong with that! Hacker News has a large audience of all skill levels. Well-written explainers are always good to share, even for basic concepts.

  • In principle, I agree, but “a message queue is… a medium through which data flows from a source system to a destination system” feels like a truism.

    • For me, I've realized I often can't learn something unless I can first compare it to something I already know.

      In this case, as another user mentioned, the decoupling use case is a great one. Instead of two processes/APIs talking to each other directly, having an intermediate "buffer" process/API can save you headaches.

  • Agree! In fact, I would appreciate more well-written articles explaining basic concepts on the front page of Hacker News. It is always good to revisit basic concepts, but it is even better to relearn them. I am surprised by how often I realize that my definition of a concept is wrong or just superficial.

> when to know it’s time to replace my synchronous inter-service HTTP requests with a queue

I've found that once something takes inconveniently long for a synchronous client-side request, it's less about the performance or metrics and more about reasoning. Some things are queue-shaped, or async-job-shaped. The worker -> main app communication pattern can even remain sync HTTP calls or not (callback-based or something), but if you have something that has high variance in timing, or is a background thing, then just kick it off to workers.
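A minimal sketch of that "kick it off to workers" shape, using only the Python standard library (the in-process queue stands in for Rabbit/Redis, and names like `handle_request` are made up for illustration):

```python
import queue
import random
import threading
import time

jobs = queue.Queue()

def worker():
    # Pull jobs off the queue and do the high-variance work in the
    # background, keeping the request path fast and predictable.
    while True:
        job_id, payload = jobs.get()
        time.sleep(random.uniform(0.1, 2.0))  # simulate variable-duration work
        print(f"job {job_id} done: {payload}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(payload):
    # The synchronous handler just enqueues and returns immediately;
    # the worker reports back later (callback, status poll, etc.).
    job_id = random.getrandbits(32)
    jobs.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}

print(handle_request("resize image 42"))
jobs.join()  # wait for background work to finish before exiting
```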

I'd also say start simple, and only go to Kafka or some other high-dev-time-overhead solution when you start seeing Redis/Rabbit stop being sufficient. Odds are you can make the simple solution work.
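To give a sense of how simple the simple solution can be, here is a sketch of a Redis list used as a bare-bones work queue, assuming a Redis server on localhost and the redis-py client (the queue name `jobs` is arbitrary):

```python
import json
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

# Producer: push a job onto the left end of the "jobs" list.
r.lpush("jobs", json.dumps({"task": "send_email", "to": "user@example.com"}))

# Consumer: block until a job is available, popping from the right end,
# so the list behaves as a FIFO queue.
_key, raw = r.brpop("jobs")
job = json.loads(raw)
print("processing", job["task"])
```

The usual caveat is that a plain list gives you no acknowledgements or redelivery on consumer crash, which is where RabbitMQ starts to earn its keep.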

I think the article would be a little more useful to non-beginners if it included an update on the modern landscape of MQs. Are people still using Apache Kafka lol?

it is a fine enough article as it is though!

  • Kafka is a distributed log system. Yes, people use Kafka as a message queue, but it's often the wrong tool for the job; it wasn't designed for that.

> but I’d love to read more about when to know it’s time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I’ve learned some answers to this question over time, but these guys are theoretically message queue experts. I’d love to learn about more things to look out for.

Not OP but I have some background on this.

An Erlang loss system is like a set of phone lines. Imagine a special call center where you have N operators, each of whom takes calls, talks for some time (serving the customer), and hangs up. Unlike many call centers, however, they don't keep you waiting in line. Therefore, if all operators are busy the system hangs up and you have to explicitly call again. This is somewhat similar to a server with N threads.

Let's assume N=3.

Under common mathematical assumptions (a constant arrival rate, Poisson arrivals, i.e. exponentially distributed times between arrivals, and exponentially distributed service times) you can define:

1) the “traffic intensity” (rho) as the ratio between the arrival rate and the service rate (intuitively, how “heavy” arrivals are with respect to “departures”)

2) the blocking probability, which is given by the Erlang B formula (written out below) for parameters N (number of threads) and rho (traffic intensity). Basically, if traffic intensity = 1 (arrival rate = service rate), the blocking probability is 6.25%. If the service rate is twice the arrival rate, this drops to approximately 1%. If the service rate is 1/10 of the arrival rate, the blocking probability is about 73.2%.
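For completeness, this is the standard Erlang B formula, with N servers (threads) and traffic intensity rho:

```latex
% Erlang B: probability that an arriving request finds all N servers busy
% (and is dropped) in an M/M/N/N loss system with traffic intensity rho.
B(N, \rho) = \frac{\rho^N / N!}{\sum_{k=0}^{N} \rho^k / k!}

% Worked check against the numbers above, with N = 3 and rho = 1:
% B(3, 1) = (1/6) / (1 + 1 + 1/2 + 1/6) = (1/6) / (8/3) = 1/16 = 6.25\%
```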

I will try to write down part 2 when I find some time.

EDIT - Adding part 2

So, let's add a buffer. We said we have three threads, right? Let's say the system can handle up to 6 requests before dropping: 1 being processed by each thread, plus an additional 3 buffered requests. Under the same distribution assumptions, this is known as an M/M/3/6 queue.

Some math crunching under the previous service and arrival rate scenarios:

- if service rate = arrival rate, the blocking probability drops to about 0.2%. Of course, there is now a non-zero wait probability (close to 9%).

- if the service rate is twice the arrival rate, the blocking probability is 0.006% and there is a wait probability of about 1.5%.

- if the service rate is 1/10 of the arrival rate, the blocking probability is about 70% and the waiting probability is about 29%.
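If you want to reproduce these numbers, here is a small sketch that computes the steady-state probabilities of an M/M/c/K queue directly from the birth-death balance equations (c = 3 threads, K = 6 total capacity; rho is the offered load, arrival rate over service rate):

```python
from math import factorial

def mmck(rho, c, K):
    """Blocking and wait probabilities for an M/M/c/K queue.

    rho: offered load (arrival rate / service rate)
    c:   number of servers (threads)
    K:   total capacity (in service + buffered)
    """
    # Unnormalized steady-state probabilities p_n.
    p = [rho**n / factorial(n) for n in range(c + 1)]
    p += [p[c] * (rho / c) ** (n - c) for n in range(c + 1, K + 1)]
    total = sum(p)
    blocking = p[K] / total        # an arrival finds the system full
    waiting = sum(p[c:K]) / total  # all servers busy, but buffer has room
    return blocking, waiting

for rho in (1, 0.5, 10):
    b, w = mmck(rho, c=3, K=6)
    print(f"rho={rho}: blocking={b:.4%}, waiting={w:.4%}")
# rho=1:   blocking ≈ 0.22%,  waiting ≈ 8.8%
# rho=0.5: blocking ≈ 0.006%, waiting ≈ 1.5%
# rho=10:  blocking ≈ 70.1%,  waiting ≈ 29.2%
```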

This means that a buffer reduces request drops due to busy resources, but it also introduces a waiting probability. Pretty obvious. Another obvious thing is that you need additional memory for that queue length: assuming queue length = 3 and 1 KB messages, you need 3 KB of additional memory.

A less obvious thing is that you are adding a new component. Assuming "in series" behavior, i.e. requests cannot be processed when the buffer system is down, this decreases overall availability if the queue is not properly sized. What I mean is that if the system crashes when the process uses more than 4 KB of memory, but you allow queue sizes up to 3 (3 KB in service + 3 KB buffered = 6 KB), availability is not 100%, because in some cases the system accepts more requests than it can actually handle.

An even less obvious thing is that, in terms of availability, things change if you consider the server and the buffer as having distinct "size" (memory) thresholds. Things get even more complicated if the server and the buffer are connected by a link which itself doesn't have 100% availability, because then you also have to take the link's unavailability into account.
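As a rough illustration of the "in series" point (a standard reliability identity, assuming independent failures; not something from the article): the availabilities of components in series multiply, so the chain is always worse than its weakest part:

```latex
% A request path works only if every component on it works
% (assuming independent failures):
A_{\text{total}} = A_{\text{server}} \cdot A_{\text{link}} \cdot A_{\text{buffer}}

% e.g. three "four nines" components in series:
% 0.9999^3 \approx 0.9997  (the chain is worse than any single component)
```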