← Back to context

Comment by vermon

8 months ago

Interesting, if partitioning is not a useful concept of Kafka, what are some of the better alternatives for controlling consumer concurrency?

It is useful, but it is not generally applicable.

Given an arbitrary causality graph between n messages, it would be ideal if you could consume your messages in topological order. And that you could do so in O(n log n).

No queuing system in the world does arbitrary causality graphs without O(n^2) costs. I dream of the day where this changes.

And because of this, we’ve adapted our message causality topologies to cope with the consuming mechanisms of Kafka et al

To make this less abstract, imagine you have two bank accounts, each with a stream. MoneyOut in Bob’s account should come BEFORE MoneyIn when he transfers to Alice’s account, despite each bank account having different partition keys.

  • Can you elaborate on how you have “adapted…message causality topologies to cope with consuming mechanisms” in relation to the example of a bank account? The causality topology being what here, couldn’t one day MoneyIn should come before else there can be now true MoneyOut?

    • Right on, great question. Some examples:

      Example Option 1

      You give up on the guarantees across partition keys (bank accounts), and you accept that balances will not reflect a causally consistent state of the past.

      E.g., Bob deposits 100, Bob sends 50 to Alice.

      Balances: Bob 0 Alice 50 # the source system was never in this state Bob 100 Alice 50 # the source system was never in this state Bob 50 Alice 50 # eventually consistent final state

      Example Option 2

      You give up on parallelism, and consume in total order (i.e., one single partition / unit of parallelism - e.g., in Kafka set a partitioner that always hashes to the same value).

      Example Option 3

      In the consumer you "wait" whenever you get a message that violates causal order.

      E.g., Bob deposits 100 Bob sends 50 to Alice (Bob-MoneyOut 50 -> Alice-MoneyIn 50).

      If we attempt to consume Alice-MoneyIn before Bob-MoneyOut, we exponentially back off from the partition containing Alice-MoneyIn.

      (Option 3 is terrible because of O(n^2) processing times in the worst case and the possibility for deadlocks (two partitions are waiting for one another))

      4 replies →