In the last post Broker-Visible vs Client-Local Parallelism we looked at two ways of scaling Kafka consumption. The final unit of parallelism can be visible to the broker, as consumers, or it can be local to the client, as threads, virtual threads, async tasks, or some other execution mechanism hidden behind a smaller number of consumers.
Broker-visible parallelism is simple to reason about: if each consumer processes records serially, we add more consumers to increase parallelism. But each consumer adds overhead to the brokers: broker-side protocol state, TCP connections, group membership, fetch state, and participation in the consumer or share group protocol. With long processing times and/or high throughput, the required number of parallel workers can easily exceed what is practical to model as broker-visible consumers.
That is where client-local parallelism becomes important. Instead of scaling by adding more consumers, each consumer application can poll records and process them concurrently inside the client. This allows a smaller number of Kafka consumers to drive a much larger amount of parallel work.
In this post, we’ll compare client-local parallelism with consumer groups and share groups using the Apache Kafka clients, by way of Dimster, the benchmarking tool used throughout this series. Dimster uses the official Apache Kafka clients under the hood. The main comparison is between two styles of client-local parallelism: blocking and continuous styles.