
Comment by LgWoodenBadger

1 day ago

The producer doesn’t care how many partitions there are; it doesn’t even know about them unless it wants to use its own partitioning algorithm. You can change the number of partitions on the topic after the fact.

In this case it would need to use its own partitioning algorithm because of some specific ordering guarantees we care about.
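A minimal sketch of the kind of key-based partitioning the comment above refers to. It assumes that messages needing relative ordering share a hypothetical ordering key. It hashes that key with FNV-1a from Go’s standard library and takes the result modulo the partition count. Kafka’s Java client hashes keys with murmur2 by default, so these assignments will not match it, and wiring this into a particular client library is not shown. The `main` function also shows the catch with changing the partition count after the fact: under hash-modulo-N partitioning, a key can move to a different partition, so per-key ordering only holds while the count stays fixed.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a hypothetical ordering key to a partition index using
// FNV-1a modulo the partition count. Messages that share a key land on the
// same partition, and so keep their relative order, only while numPartitions
// stays the same. numPartitions must be greater than zero.
func partitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, key := range []string{"customer-1", "customer-2", "customer-3", "customer-4"} {
		// Compare assignments before and after growing the topic from 8 to 12 partitions.
		before, after := partitionFor(key, 8), partitionFor(key, 12)
		fmt.Printf("%s: partition %d -> %d (moved: %v)\n", key, before, after, before != after)
	}
}
```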

  • Then rewrite them to another topic. Never mind, complex multithreading sounds like the better solution.

    • There’s more to it than that. We don’t care about total order even within partitions. Every so often we get a message that must not be sent downstream until some subset of messages have been sent.

      So most of the time we’re fine sending 100-200 parallel message batches, but sometimes we need to stop and wait for some batches to complete before sending any more.

      We also want to control how hard we hammer specific downstream resources, which don’t correlate with the partitions we’d need. Additionally, we want to scale the parallelism for each of those resources up and down depending on how fast messages for it are coming in, to maximize batch size while keeping latency low.

      There are, of course, ways to do this with multiple partitions by having the consumers communicate with each other. But then we’d have added another consumer and topic to the pipeline, plus an inter-consumer control system.

      Overall, it was easier to have one consumer read from the existing topic and spawn goroutines. That gives us more dynamic control, the ability to scale up and down immediately without worrying about rebalancing, and easy communication between threads.