Testing Producer Deduplication in Apache Kafka and Apache Pulsar

Failures can induce message duplication on both the producer and the consumer side. In this post we’ll focus solely on producer-side duplication, looking at how the deduplication feature works in Apache Pulsar and Apache Kafka. I have run many hours of deduplication tests against both messaging systems, and we’ll see the results of those tests.

On the producer side, when a producer sends a message and an error occurs, such as a TCP connection failure, the producer has no way to know whether the message was persisted or not. We have two choices: send the message again to ensure it gets delivered and risk duplication, or don’t send it again and risk the message never being delivered.

With Apache Kafka and Apache Pulsar that decision has been made a lot easier: enable deduplication and always resend.

Both Apache Pulsar and Apache Kafka offer idempotent producers where messages can be retried and the broker is responsible for detecting and ignoring duplicate messages. This retry mechanism is hidden from the developer and no explicit retry code is required. Both systems also ensure that message ordering is preserved, even though messages may be received out-of-order due to retries.

We’ll run two types of test: TCP connection failure and broker fail-over. Both of these types of failure will introduce duplicates when retries are enabled and deduplication is disabled. Each test will be run with deduplication disabled (a control test to ensure duplicates are being created) and with deduplication enabled (to test that duplicates are not added). Each test will be run 50 times.

How Apache Pulsar Performs Deduplication

Apache Pulsar uses per producer/topic sequence numbers to detect duplicates. Each topic owner broker maintains an in-memory hashmap of the latest sequence number per topic/producer.

Suppose a producer sends messages with sequence numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, but the TCP connection fails before the producer receives an ack for message 6, so the producer retransmits messages 6-10. The broker had in fact persisted messages 6-10, but the TCP connection failed before it could acknowledge them. So the broker receives the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 6, 7, 8, 9, 10. When the broker sees the second 6-10, it ignores them as they are lower than or equal to the largest sequence number from that producer that has been successfully persisted.
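The check described above can be sketched in a few lines of Python. This is a simplified model, not Pulsar's broker code: the class and method names (DedupBroker, publish) are hypothetical, and snapshots and fail-over are ignored.

```python
class DedupBroker:
    """Toy model of per-producer sequence-number deduplication (hypothetical names)."""

    def __init__(self):
        self.highest_persisted = {}  # producer name -> highest persisted sequence number
        self.log = []                # messages actually persisted, in arrival order

    def publish(self, producer, seq, payload):
        """Persist the message unless its sequence number was already persisted."""
        if seq <= self.highest_persisted.get(producer, -1):
            return False  # duplicate: lower than or equal to the highest persisted
        self.log.append((producer, seq, payload))
        self.highest_persisted[producer] = seq
        return True


broker = DedupBroker()
# The producer sends 1-10, loses the connection, then retransmits 6-10.
for seq in list(range(1, 11)) + list(range(6, 11)):
    broker.publish("producer-a", seq, f"msg-{seq}")

persisted = [seq for _, seq, _ in broker.log]
print(persisted)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The retransmitted 6-10 are dropped because each is lower than or equal to the highest sequence number already persisted for that producer.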

The broker periodically snapshots the latest sequence number to a cursor which allows the map to be reconstructed by another broker after a fail-over. The new broker reads the cursor and also replays the messages from the last ledger, adding them to the in-memory map. It reads from the ledger as it is likely that the latest snapshot does not represent all the latest state. This mechanism guards against the situation where a producer resends a bunch of unacknowledged messages to a new broker after the original broker died.

A more detailed explanation can be found in the Apache Pulsar GitHub wiki. See how to configure deduplication in the Apache Pulsar docs.

How Apache Kafka Performs Deduplication

Apache Kafka works in a similar way, using sequence numbers per producer and topic partition. Kafka is able to maintain deduplication across broker fail-overs using a different mechanism to that of Pulsar.

Kafka adds the producer id, sequence number and generation to each message. Because this metadata is contained in the messages themselves, followers can maintain the same in-memory map of the latest sequence numbers per producer, so a newly elected partition leader already has all the state required for deduplication. The generation is a fencing token or epoch that prevents zombie producers from writing to a topic.

With Pulsar, the broker and storage layer are separate, so a new topic owner broker needs to get the state from somewhere (cursor and ledger) but Kafka has broker and storage combined and promoted followers already have all the data they need for deduplication.
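Because the deduplication metadata travels inside each message, a replica can rebuild the map just by reading its own copy of the log. Here is a minimal sketch of that idea. The names (Record, rebuild_state, check) are hypothetical, and the logic is deliberately simplified; the real broker tracks this state per batch and handles more cases.

```python
from typing import NamedTuple


class Record(NamedTuple):
    producer_id: int
    generation: int  # fencing token / epoch
    sequence: int
    value: str


def rebuild_state(log):
    """Rebuild {producer_id: (generation, last_sequence)} from a replica's log."""
    state = {}
    for rec in log:
        state[rec.producer_id] = (rec.generation, rec.sequence)
    return state


def check(state, rec):
    """Classify an incoming record against the rebuilt state."""
    generation, last_seq = state.get(rec.producer_id, (rec.generation, -1))
    if rec.generation < generation:
        return "fenced"      # zombie producer using an old generation
    if rec.generation == generation and rec.sequence <= last_seq:
        return "duplicate"   # already in the log, so ignore it
    return "accept"


# What a follower holds on disk: sequences 0-4 from producer 7, generation 0.
follower_log = [Record(7, 0, s, f"m{s}") for s in range(5)]
state = rebuild_state(follower_log)  # state available to a newly elected leader

retry_result = check(state, Record(7, 0, 3, "m3"))  # retry of a persisted message
next_result = check(state, Record(7, 0, 5, "m5"))   # next new message
print(retry_result, next_result)  # duplicate accept
```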

A more detailed look at the design can be found in the Apache Kafka wiki. To enable deduplication, set the producer config “enable.idempotence” to true. The client then sets related config options for you.

The librdkafka (C++) library states:

When set to true, the producer will ensure that messages are successfully produced exactly once and in the original produce order. The following configuration properties are adjusted automatically (if not modified by the user) when idempotence is enabled: max.in.flight.requests.per.connection=5 (must be less than or equal to 5), retries=INT32_MAX (must be greater than 0), acks=all, queuing.strategy=fifo. Producer instantiation will fail if user-supplied configuration is incompatible.

Note that librdkafka support for the idempotent producer is not due out until late November.

The Java client docs state:

To enable idempotence, the enable.idempotence configuration must be set to true. If set, the retries config will default to Integer.MAX_VALUE and the acks config will default to all. There are no API changes for the idempotent producer, so existing applications will not need to be modified to take advantage of this feature.

To take advantage of the idempotent producer, it is imperative to avoid application level re-sends since these cannot be de-duplicated. As such, if an application enables idempotence, it is recommended to leave the retries config unset, as it will be defaulted to Integer.MAX_VALUE.
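The defaults and limits in the librdkafka description can be written down as a small check. The function below only illustrates those rules in Python: check_idempotent_config is a hypothetical helper, not part of any Kafka client, and it works on a plain dict of producer properties.

```python
INT32_MAX = 2**31 - 1


def check_idempotent_config(user_config):
    """Apply the defaults and limits described for enable.idempotence=true.

    Hypothetical helper for illustration only; real clients do this internally.
    """
    config = {
        "max.in.flight.requests.per.connection": 5,
        "retries": INT32_MAX,
        "acks": "all",
    }
    config.update(user_config)          # user-supplied values win over defaults
    config["enable.idempotence"] = True

    if config["max.in.flight.requests.per.connection"] > 5:
        raise ValueError("max.in.flight.requests.per.connection must be <= 5")
    if config["retries"] <= 0:
        raise ValueError("retries must be greater than 0")
    if config["acks"] != "all":
        raise ValueError("acks must be all")
    return config


defaults = check_idempotent_config({})
print(defaults["acks"], defaults["retries"])  # all 2147483647
```

Passing an incompatible value, for example {"retries": 0}, raises a ValueError, which mirrors the statement that producer instantiation fails when the user-supplied configuration is incompatible.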

The Tests

All test code can be found in my ChaosTestingCode repo on GitHub.

Each test consists of a producer sending messages as fast as it can, limiting itself to 10000 in-flight messages at a time. Midway through transmission we either kill the broker (the partition leader with Kafka or the topic owner with Pulsar), or kill the client connections to that broker. The client is configured to retry for a few minutes. Then, once transmission is complete, we run a consumer and check whether all the sent messages are read, detecting duplicates, lost messages and ordering issues.

Message Ordering

The content of each message that the producer sends is a monotonically incrementing integer value. If the consumer reads the messages in this same monotonically increasing order then we can say that the messaging system preserved the correct ordering of messages.

The consumer detects four deviations from the correct ordering and logs each as an event:

  • JUMP FORWARD (JF). There is a gap, for example jumping from 10 to 20 instead of 10 to 11, and the value 20 was not previously read.

  • JUMP BACKWARDS (JB). The consumer reads a lower value than the previous read, and this value has not been read previously.

  • DUPLICATE JUMP FORWARD (DJF). There is a gap between the last value and the current value, and the current value has been previously read.

  • DUPLICATE JUMP BACKWARDS (DJB). The consumer reads a lower value than the previous read, and this value has already been read previously.

For example, a common result in my Kafka tests with deduplication disabled and the TCP connection repeatedly killed is one or more repetitions of the following pattern: JF, JB, JB, DJB, DJB, JF.

1
2
3
4
5
6
7
8
9
10
30 JUMP FORWARDS 20 (10 -> 30)
31
32
33
34
35
36
37
38
39
40
15 JUMP BACKWARDS 25 (40 -> 15)
16
17
18
19
20
21
22
23
24
25
26
27
28
29
11 JUMP BACKWARDS 18 (29 -> 11)
12
13
14
5 DUPLICATE JUMP BACKWARDS 9 (14 -> 5)
6
7
8
9
10
2 DUPLICATE JUMP BACKWARDS 8 (10 -> 2)
3
4
41 JUMP FORWARDS 37 (4 -> 41)
42
43
44
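The four event types can be detected in a single pass over the consumed values, remembering which values have already been read. The sketch below is not the consumer code from the repo; classify_order is a hypothetical name, and it assumes the payloads are integers that should increase by exactly one.

```python
def classify_order(values):
    """Return (kind, previous, current) for every deviation from +1 ordering."""
    seen = set()
    events = []
    previous = None
    for value in values:
        if previous is not None and value != previous + 1:
            duplicate = value in seen   # has this value been read before?
            forward = value > previous  # gap ahead, or a jump back
            kind = ("D" if duplicate else "") + ("JF" if forward else "JB")
            events.append((kind, previous, value))
        seen.add(value)
        previous = value
    return events


# The example sequence shown above.
sequence = (
    list(range(1, 11)) + list(range(30, 41)) + list(range(15, 30))
    + list(range(11, 15)) + list(range(5, 11)) + list(range(2, 5))
    + list(range(41, 45))
)
kinds = [kind for kind, _, _ in classify_order(sequence)]
print(kinds)  # ['JF', 'JB', 'JB', 'DJB', 'DJB', 'JF']
```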

While the tests are running, the script prints the result of each run to the terminal, for example:

Final ack count: 1000000
Final positive ack count: 1000000
Final negative ack count: 0
Messages received: 1000000
Acked messages missing: 0
Non-acked messages received: 0
Duplicates: 1376
Duplicate Jump Forward: 4
Duplicate Jump Back: 4
Non-Duplicate Jump Forward: 4
Non-Duplicate Jump Back: 0
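Counters like these can be derived from two inputs: the set of positively acknowledged values on the producer side and the list of values the consumer read. The following is a rough sketch with hypothetical names (tally and its dictionary keys); it assumes that "duplicates" means the number of distinct values read more than once, which may not be exactly how the test script counts them.

```python
from collections import Counter


def tally(acked, received):
    """Compare acknowledged values with what the consumer actually read."""
    counts = Counter(received)
    acked_set = set(acked)
    received_set = set(counts)
    return {
        "received": len(received),
        "acked_missing": len(acked_set - received_set),       # acked but never read
        "non_acked_received": len(received_set - acked_set),  # read but never acked
        "msgs_with_dups": sum(1 for n in counts.values() if n > 1),
    }


# Values 1-5 were acked; 2 and 3 were written twice because of a retry.
result = tally(acked=range(1, 6), received=[1, 2, 3, 2, 3, 4, 5])
print(result)
```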

The results are also written to a file in CSV format with the following fields:

  • DedupEnabled (true/false)

  • TestRun (the test run number)

  • SendCount (the number of messages sent)

  • AckCount (the number of messages acknowledged positively or negatively)

  • PosAckCount (the number of messages positively acknowledged)

  • NegAckCount (the number of messages negatively acknowledged)

  • Received (the number of messages consumed)

  • NotReceived (the number of messages lost)

  • ReceivedNoAck (the number of unacknowledged messages consumed)

  • MsgsWithDups (the number of messages with duplicates)

  • DJF

  • DJB

  • JF

  • JB

Additionally, all jump forward and jump backward events are logged to a file.
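With the results in CSV form, comparing the control runs with the deduplication runs takes only a few lines. A small sketch follows, using Python's csv module and three rows copied from the Pulsar TCP test output shown later in this post. The summarise function is a hypothetical helper, not part of the test scripts.

```python
import csv
import io


def summarise(csv_text):
    """Return {dedup_enabled: (runs, runs_with_dups, total_msgs_with_dups)}."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["DedupEnabled"] == "true"
        runs, with_dups, total = summary.get(key, (0, 0, 0))
        dups = int(row["MsgsWithDups"])
        summary[key] = (runs + 1, with_dups + int(dups > 0), total + dups)
    return summary


sample = "\n".join([
    "DedupEnabled,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,"
    "Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB",
    "false,1,1000000,1000000,1000000,0,1009000,0,0,9000,0,1,0,0",
    "false,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0",
    "true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0",
])
summary = summarise(sample)
print(summary)  # {False: (2, 1, 9000), True: (1, 0, 0)}
```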

Cluster Setup

The clusters consist of:

  • Kafka: 1 ZooKeeper, 3 Kafka brokers

  • Pulsar: 1 ZooKeeper, 1 Pulsar proxy, 3 Pulsar brokers, 3 BookKeeper nodes

The Pulsar tests are run with a Python script called dedup_test.py in the Pulsar/automated directory. The Kafka tests are run with a Python script called dedup_test.py in the Kafka/automated directory.

The network is slowed by 20ms for Kafka brokers and 100ms for Pulsar brokers to help increase the number of duplicates created. By adding latency to outgoing packets from the brokers we increase the number of pending acks at the time of failure, which increases the number of messages retransmitted. It was harder to make Pulsar duplicate messages with deduplication disabled. Even with 100ms of added network delay, duplication was not always achieved.

TCP Connection Failure Test

In this test we’ll be sending messages as fast as we can and then, mid-transmission, we’ll repeatedly kill all established client TCP connections to the broker for a 10 second period (using tcpkill). Both the Pulsar client and Kafka client are configured to perform retries. We have a control test where we do not have deduplication enabled and a second test with deduplication enabled. At the end we’ll compare the results.

Kill TCP Connection Pulsar Test

The arguments are:

  • Test run name: test-kill-tcp

  • Test runs: 50 (50 with dedup disabled and 50 with it enabled)

  • Messages to send: 1000000

  • Introduce failure at message: 50000

  • Failure type: kill TCP connections

$ python dedup_test.py test-kill-tcp 50 1000000 50000 kill-tcp
...
$ cat test-output/test-kill-tcp_dedup_output.txt
DedupEnabled,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB
false,1,1000000,1000000,1000000,0,1009000,0,0,9000,0,1,0,0
false,2,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,4,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,5,1000000,1000000,1000000,0,1009568,0,0,9568,0,1,0,0
false,6,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,7,1000000,1000000,1000000,0,1009752,0,0,9752,0,1,0,0
false,8,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,9,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,10,1000000,1000000,1000000,0,1002873,0,0,2873,0,1,0,0
false,11,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,12,1000000,1000000,1000000,0,1009280,0,0,9280,0,1,0,0
false,13,1000000,1000000,1000000,0,1009702,0,0,9702,0,1,0,0
false,14,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,15,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,16,1000000,1000000,1000000,0,1008953,0,0,8953,0,1,0,0
false,17,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,18,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,19,1000000,1000000,1000000,0,1008357,0,0,8357,0,1,0,0
false,20,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,21,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,22,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,23,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,24,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,25,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,26,1000000,1000000,1000000,0,1008627,0,0,8627,0,1,0,0
false,27,1000000,1000000,1000000,0,1008000,0,0,8000,0,1,0,0
false,28,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,29,1000000,1000000,1000000,0,1009269,0,0,9269,0,1,0,0
false,30,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,31,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,32,1000000,1000000,1000000,0,1009450,0,0,9450,0,1,0,0
false,33,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,35,1000000,1000000,1000000,0,1008638,0,0,8638,0,1,0,0
false,36,1000000,1000000,1000000,0,1008448,0,0,8448,0,1,0,0
false,37,1000000,1000000,1000000,0,1009000,0,0,9000,0,1,0,0
false,38,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,39,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,40,1000000,1000000,1000000,0,1008597,0,0,8597,0,1,0,0
false,41,1000000,1000000,1000000,0,1003000,0,0,3000,0,1,0,0
false,42,1000000,1000000,1000000,0,1007577,0,0,7577,0,1,0,0
false,43,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,44,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,45,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,46,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,47,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,49,1000000,1000000,1000000,0,1006117,0,0,6117,0,1,0,0
false,50,1000000,1000000,1000000,0,1008249,0,0,8249,0,1,0,0
true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,2,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,4,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,8,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,9,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,10,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,11,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,12,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,13,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,14,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,15,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,16,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,17,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,18,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,19,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,20,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,21,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,23,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,24,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,27,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,28,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,29,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,30,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,32,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,35,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,37,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,38,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,39,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,42,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,43,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,46,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,49,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0

We see that without deduplication enabled we get a large number of duplicates, corresponding closely to the limit of 10000 in-flight messages that the producer imposes on itself. Regarding message ordering, each run with duplicates shows a single jump back, with a single block of messages retransmitted.

The test-kill-tcp_order_output.txt file shows more detail on message ordering, listing the JF, JB, DJF and DJB events recorded:

Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 8999 (228805 -> 219806)
Test run: 2 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (151567 -> 141568)
Test run: 5 DUPLICATE BLOCK - JUMP BACKWARDS 9567 (207085 -> 197518)
Test run: 6 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (217489 -> 207490)
Test run: 7 DUPLICATE BLOCK - JUMP BACKWARDS 9751 (143315 -> 133564)
Test run: 8 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (78359 -> 68360)
Test run: 9 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (101000 -> 91001)
Test run: 10 DUPLICATE BLOCK - JUMP BACKWARDS 2872 (197106 -> 194234)
Test run: 11 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (81429 -> 71430)
Test run: 12 DUPLICATE BLOCK - JUMP BACKWARDS 9279 (87028 -> 77749)
Test run: 13 DUPLICATE BLOCK - JUMP BACKWARDS 9701 (104074 -> 94373)
Test run: 14 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (107760 -> 97761)
Test run: 15 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (81000 -> 71001)
Test run: 16 DUPLICATE BLOCK - JUMP BACKWARDS 8952 (102506 -> 93554)
Test run: 17 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (94765 -> 84766)
Test run: 18 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (95849 -> 85850)
Test run: 19 DUPLICATE BLOCK - JUMP BACKWARDS 8356 (108283 -> 99927)
Test run: 20 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (91714 -> 81715)
Test run: 21 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (148915 -> 138916)
Test run: 22 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (112040 -> 102041)
Test run: 23 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (85598 -> 75599)
Test run: 24 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (103000 -> 93001)
Test run: 25 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (91277 -> 81278)
Test run: 26 DUPLICATE BLOCK - JUMP BACKWARDS 8626 (204176 -> 195550)
Test run: 27 DUPLICATE BLOCK - JUMP BACKWARDS 7999 (79000 -> 71001)
Test run: 28 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (160167 -> 150168)
Test run: 29 DUPLICATE BLOCK - JUMP BACKWARDS 9268 (102269 -> 93001)
Test run: 30 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (104000 -> 94001)
Test run: 31 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (125000 -> 115001)
Test run: 32 DUPLICATE BLOCK - JUMP BACKWARDS 9449 (124450 -> 115001)
Test run: 33 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (80000 -> 70001)
Test run: 35 DUPLICATE BLOCK - JUMP BACKWARDS 8637 (213984 -> 205347)
Test run: 36 DUPLICATE BLOCK - JUMP BACKWARDS 8447 (92448 -> 84001)
Test run: 37 DUPLICATE BLOCK - JUMP BACKWARDS 8999 (113000 -> 104001)
Test run: 38 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (80000 -> 70001)
Test run: 39 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (93000 -> 83001)
Test run: 40 DUPLICATE BLOCK - JUMP BACKWARDS 8596 (83607 -> 75011)
Test run: 41 DUPLICATE BLOCK - JUMP BACKWARDS 2999 (74000 -> 71001)
Test run: 42 DUPLICATE BLOCK - JUMP BACKWARDS 7576 (95519 -> 87943)
Test run: 43 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (89860 -> 79861)
Test run: 44 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (125907 -> 115908)
Test run: 45 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (134941 -> 124942)
Test run: 46 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (105456 -> 95457)
Test run: 47 DUPLICATE BLOCK - JUMP BACKWARDS 9999 (86397 -> 76398)
Test run: 49 DUPLICATE BLOCK - JUMP BACKWARDS 6116 (120117 -> 114001)
Test run: 50 DUPLICATE BLOCK - JUMP BACKWARDS 8248 (92249 -> 84001)
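The jump sizes in this log line up with the duplicate counts in the CSV: each jump is one less than the MsgsWithDups value for the same run, because the retransmitted block includes both end values. A short sketch that parses a log line and checks this is shown below. The regular expression and the parse_event name are my own, based only on the line format shown above.

```python
import re

EVENT = re.compile(
    r"Test run: (\d+) (DUPLICATE BLOCK - )?JUMP (FORWARDS|BACKWARDS) (\d+) "
    r"\((\d+) -> (\d+)\)"
)


def parse_event(line):
    """Parse one line of the order output file, or return None if it does not match."""
    match = EVENT.fullmatch(line.strip())
    if match is None:
        return None
    run, dup, direction, size, start, end = match.groups()
    return {
        "run": int(run),
        "duplicate": dup is not None,
        "direction": direction,
        "size": int(size),
        "from": int(start),
        "to": int(end),
    }


event = parse_event(
    "Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 8999 (228805 -> 219806)"
)
# The block re-read by the consumer covers 219806..228805 inclusive.
resent = event["from"] - event["to"] + 1
print(resent)  # 9000, the MsgsWithDups value for run 1
```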

With deduplication enabled, we get 0 duplicates and perfect message ordering across all 50 tests.

Kill TCP Connection Kafka Test

I used the latest Java client for the producer because at the time of writing, the librdkafka C++ library which the Python client wraps still does not have idempotent producer support. The consumer client is the latest Python client (0.11.6).

The arguments are:

  • Test run name: test-kill-tcp

  • Test runs: 50 (50 with dedup disabled and 50 with it enabled)

  • Messages to send: 1000000

  • Introduce failure at message: 50000

  • Failure type: kill TCP connections

$ python dedup_test.py test-kill-tcp 50 1000000 50000 kill-tcp
...
$ cat test-output/test-kill-tcp_dedup_output.txt
DedupEnabled,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB
false,1,1000000,1000000,1000000,0,1013263,0,0,13263,0,10,9,10
false,2,1000000,1000000,1000000,0,1014168,0,0,14168,0,21,26,25
false,3,1000000,1000000,1000000,0,1013162,0,0,13162,0,16,16,16
false,4,1000000,1000000,1000000,0,1003226,0,0,3226,0,4,4,4
false,5,1000000,1000000,1000000,0,1003251,0,0,3251,0,4,4,4
false,6,1000000,1000000,1000000,0,1003240,0,0,3240,0,4,4,4
false,7,1000000,1000000,1000000,0,1003225,0,0,3225,0,4,4,4
false,8,1000000,1000000,1000000,0,1006619,0,0,6619,0,8,8,8
false,9,1000000,1000000,1000000,0,1003444,0,0,3444,0,4,4,4
false,10,1000000,1000000,1000000,0,1006503,0,0,6503,0,8,8,8
false,11,1000000,1000000,1000000,0,1001638,0,0,1638,0,2,2,2
false,12,1000000,1000000,1000000,0,1001613,0,0,1613,0,2,2,2
false,13,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,14,1000000,1000000,1000000,0,1010284,0,0,10284,0,12,12,12
false,15,1000000,1000000,1000000,0,1007279,0,0,7279,0,9,10,11
false,16,1000000,1000000,1000000,0,1004862,0,0,4862,0,6,6,6
false,17,1000000,1000000,1000000,0,1008272,0,0,8272,0,10,10,10
false,18,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,19,1000000,1000000,1000000,0,1003432,0,0,3432,0,4,4,4
false,20,1000000,1000000,1000000,0,1004863,0,0,4863,0,6,6,6
false,21,1000000,1000000,1000000,0,1003640,0,0,3640,0,4,4,4
false,22,1000000,1000000,1000000,0,1003432,0,0,3432,0,4,4,4
false,23,1000000,1000000,1000000,0,1004862,0,0,4862,0,6,6,6
false,24,1000000,1000000,1000000,0,1001611,0,0,1611,0,2,2,2
false,25,1000000,1000000,1000000,0,1004889,0,0,4889,0,6,6,6
false,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,27,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,28,1000000,1000000,1000000,0,1003250,0,0,3250,0,4,4,4
false,29,1000000,1000000,1000000,0,1014963,0,0,14963,0,18,18,18
false,30,1000000,1000000,1000000,0,1010063,0,0,10063,0,12,12,12
false,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,32,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,33,1000000,1000000,1000000,0,1003433,0,0,3433,0,4,4,4
false,34,1000000,1000000,1000000,0,1001613,0,0,1613,0,2,2,2
false,35,1000000,1000000,1000000,0,1001820,0,0,1820,0,2,2,2
false,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,37,1000000,1000000,1000000,0,1001780,0,0,1780,0,2,2,2
false,38,1000000,1000000,1000000,0,1006810,0,0,6810,0,8,8,8
false,39,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,42,1000000,1000000,1000000,0,1008090,0,0,8090,0,10,10,10
false,43,1000000,1000000,1000000,0,1006644,0,0,6644,0,8,8,8
false,44,1000000,1000000,1000000,0,1001780,0,0,1780,0,2,2,2
false,45,1000000,1000000,1000000,0,1001612,0,0,1612,0,2,2,2
false,46,1000000,1000000,1000000,0,1001613,0,0,1613,0,2,2,2
false,47,1000000,1000000,1000000,0,1013236,0,0,13236,0,16,16,16
false,48,1000000,1000000,1000000,0,1011456,0,0,11456,0,14,14,14
false,49,1000000,1000000,1000000,0,1013664,0,0,13664,0,16,16,16
false,50,1000000,1000000,1000000,0,1008231,0,0,8231,0,10,10,10
true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,2,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,4,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,8,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,9,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,10,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,11,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,12,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,13,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,14,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,15,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,16,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,17,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,18,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,19,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,20,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,21,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,23,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,24,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,27,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,28,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,29,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,30,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,32,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,35,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,37,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,38,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,39,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,42,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,43,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,46,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,49,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0

As you can see, with enable.idempotence set to false we got thousands of duplicates in most runs (up to about 15,000 in a single run) and multiple message ordering issues.

The test-kill-tcp_order_output.txt file shows more details on the ordering issues with dedup disabled. For example, the first run had a large amount of jumping back and forth in the sequence:

Test run: 1 JUMP BACKWARDS 1781 (87108 -> 85327)
Test run: 1 JUMP BACKWARDS 1781 (86236 -> 84455)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1781 (85326 -> 83545)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1781 (84454 -> 82673)
Test run: 1 JUMP FORWARDS 3565 (83544 -> 87109)
Test run: 1 JUMP FORWARDS 1619 (102087 -> 103706)
Test run: 1 JUMP BACKWARDS 1625 (104512 -> 102887)
Test run: 1 JUMP BACKWARDS 1617 (103705 -> 102088)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1617 (102886 -> 101269)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1609 (102087 -> 100478)
Test run: 1 JUMP FORWARDS 3245 (101268 -> 104513)
Test run: 1 JUMP FORWARDS 1614 (131979 -> 133593)
Test run: 1 JUMP BACKWARDS 1612 (134386 -> 132774)
Test run: 1 JUMP BACKWARDS 1612 (133592 -> 131980)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1612 (132773 -> 131161)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1612 (131979 -> 130367)
Test run: 1 JUMP FORWARDS 3227 (131160 -> 134387)
Test run: 1 JUMP FORWARDS 1639 (164249 -> 165888)
Test run: 1 JUMP BACKWARDS 1612 (166681 -> 165069)
Test run: 1 JUMP BACKWARDS 1637 (165887 -> 164250)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1613 (165068 -> 163455)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1613 (164249 -> 162636)
Test run: 1 JUMP FORWARDS 3228 (163454 -> 166682)
Test run: 1 JUMP FORWARDS 1614 (188496 -> 190110)
Test run: 1 JUMP BACKWARDS 1612 (190903 -> 189291)
Test run: 1 JUMP BACKWARDS 1612 (190109 -> 188497)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1612 (189290 -> 187678)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1612 (188496 -> 186884)
Test run: 1 JUMP FORWARDS 3227 (187677 -> 190904)

With enable.idempotence set to true, all the duplicates and message ordering issues disappeared: 0 duplicates and perfect message ordering across all 50 tests.

Broker Fail-Over Tests

Broker Fail-Over Apache Pulsar Test

The arguments are:

  • Test run name: dedup_test_1

  • Test runs: 50 (50 with dedup disabled and 50 with it enabled)

  • Messages to send: 1000000

  • Introduce failure at message: 50000

  • Failure type: kill topic owner

$ python dedup_test.py dedup_test_1 50 1000000 50000 kill-topic-owner
...
$ cat test-output/dedup_test_1_dedup_output.txt
DedupEnabled,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB
false,1,1000000,1000000,1000000,0,1003571,0,0,3571,0,1,0,0
false,2,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,3,1000000,1000000,1000000,0,1000917,0,0,917,0,1,0,0
false,4,1000000,1000000,1000000,0,1001802,0,0,1802,0,1,0,0
false,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,8,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,9,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,10,1000000,1000000,1000000,0,1001000,0,0,1000,0,1,0,0
false,11,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,12,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,13,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,14,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,15,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,16,1000000,1000000,1000000,0,1002000,0,0,2000,0,1,0,0
false,17,1000000,1000000,1000000,0,1000929,0,0,929,0,1,0,0
false,18,1000000,1000000,1000000,0,1010000,0,0,10000,0,1,0,0
false,19,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,20,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,21,1000000,1000000,1000000,0,1001000,0,0,1000,0,1,0,0
false,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,23,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,24,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,27,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,28,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,29,1000000,1000000,1000000,0,1001143,0,0,1143,0,1,0,0
false,30,1000000,1000000,1000000,0,1000828,0,0,828,0,1,0,0
false,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,32,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,35,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,37,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,38,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,39,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,42,1000000,1000000,1000000,0,1001000,0,0,1000,0,1,0,0
false,43,1000000,1000000,1000000,0,1000430,0,0,430,0,1,0,0
false,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,46,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
false,49,1000000,1000000,1000000,0,1000990,0,0,990,0,1,0,0
false,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,2,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,4,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,8,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,9,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,10,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,11,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,12,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,13,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,14,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,15,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,16,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,17,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,18,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,19,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,20,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,21,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,23,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,24,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,27,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,28,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,29,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,30,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,32,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,35,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,37,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,38,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,39,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,42,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,43,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,46,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,49,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0

We see that duplicates were produced in some, but not all, runs when deduplication was disabled. Also, just like the TCP kill test, we saw no JUMP FORWARD or JUMP BACKWARDS events, only a single DUPLICATE JUMP BACKWARD event in each run that produced duplicates. Message ordering was not badly disrupted: apart from that one jump back, message ordering was correct.

With deduplication enabled we saw zero duplicates and perfect message ordering.

Broker Fail-Over Apache Kafka Test

The arguments are:

  • Test run name: test-kill-leader

  • Test runs: 50 (50 with dedup disabled and 50 with it enabled)

  • Messages to send: 1000000

  • Introduce failure at message: 50000

  • Failure type: kill partition leader

$ python dedup_test.py test-kill-leader 50 1000000 50000 kill-leader
...
$ cat test-output/test-kill-leader_order_output.txt
Dedup,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB
false,1,1000000,1000000,1000000,0,1000819,0,0,819,0,1,1,3
false,2,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,3,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,4,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,8,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,9,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,10,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,11,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,12,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,13,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,14,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,15,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,16,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,17,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,18,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,19,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,20,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,21,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,23,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,24,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,27,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,28,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,29,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,30,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,32,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,34,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,35,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,36,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,37,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,38,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,39,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,41,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,42,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,43,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,46,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
false,48,1000000,1000000,1000000,0,1000791,0,0,791,0,1,2,3
false,49,1000000,1000000,1000000,0,1000819,0,0,819,0,1,2,3
false,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4
true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,2,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,3,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,4,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,6,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,7,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,8,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,9,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,10,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,11,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,12,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,13,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,14,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,15,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,16,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,17,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,18,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,19,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,20,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,21,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,22,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,23,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,24,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,25,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,26,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,27,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,28,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,29,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,30,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,31,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,32,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,33,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,34,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,35,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,36,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,37,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,38,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,39,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,40,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,41,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,42,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,43,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,44,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,45,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,46,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,47,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,48,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,49,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0
true,50,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0

With deduplication disabled, 33 of the 50 runs produced duplicates (up to 819 duplicated messages per run), and every run saw message ordering jump forwards and backwards.
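One way to check a summary like this against the raw output is to aggregate the CSV per dedup setting. The following is a minimal sketch and not part of the original test scripts: the function name `summarise` is hypothetical, and the three sample rows are copied from the output above purely for illustration. It assumes DJF, DJB, JF and JB are the four ordering-event counters.

```python
import csv
import io

HEADER = ("Dedup,TestRun,SendCount,AckCount,PosAckCount,NegAckCount,"
          "Received,NotReceived,ReceivedNoAck,MsgsWithDups,DJF,DJB,JF,JB")


def summarise(csv_text):
    """Group rows by the Dedup column and total duplicates and ordering events."""
    summary = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stats = summary.setdefault(
            row["Dedup"],
            {"runs": 0, "runs_with_dups": 0, "max_dups": 0, "jumps": 0},
        )
        dups = int(row["MsgsWithDups"])
        stats["runs"] += 1
        stats["runs_with_dups"] += 1 if dups > 0 else 0
        stats["max_dups"] = max(stats["max_dups"], dups)
        # Sum the four ordering-event counters for this run.
        stats["jumps"] += sum(int(row[c]) for c in ("DJF", "DJB", "JF", "JB"))
    return summary


# Three rows taken from the output above; a real check would read the whole file.
sample = "\n".join([
    HEADER,
    "false,1,1000000,1000000,1000000,0,1000819,0,0,819,0,1,1,3",
    "false,5,1000000,1000000,1000000,0,1000000,0,0,0,0,0,2,4",
    "true,1,1000000,1000000,1000000,0,1000000,0,0,0,0,0,0,0",
])

result = summarise(sample)
for dedup, stats in result.items():
    print(dedup, stats)
```

Running the same function over the full output file makes it easy to see how many runs produced duplicates and whether any ordering events slipped through with deduplication enabled.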

An excerpt from the test-kill-leader_order_output.txt file for the first two runs:

Test run: 1 JUMP BACKWARDS 1637 (108110 -> 106473)
Test run: 1 JUMP BACKWARDS 1637 (107291 -> 105654)
Test run: 1 JUMP BACKWARDS 1629 (106472 -> 104843)
Test run: 1 DUPLICATE BLOCK - JUMP BACKWARDS 1629 (105653 -> 104024)
Test run: 1 JUMP FORWARDS 3269 (104842 -> 108111)
Test run: 2 JUMP FORWARDS 2438 (101446 -> 103884)
Test run: 2 JUMP BACKWARDS 1637 (104702 -> 103065)
Test run: 2 JUMP BACKWARDS 1617 (103883 -> 102266)
Test run: 2 JUMP BACKWARDS 1617 (103064 -> 101447)
Test run: 2 DUPLICATE BLOCK - JUMP BACKWARDS 1609 (102265 -> 100656)
Test run: 2 JUMP FORWARDS 3257 (101446 -> 104703)
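The event lines in this excerpt can be approximated with a small classifier over the sequence of message numbers a consumer receives. This is a sketch under assumptions, not the logic of the original test script (which is not shown here): it assumes messages carry consecutive integers, that any step other than +1 is a jump, and that a jump counts as a duplicate block when it lands on a number already seen. The function name `classify_jumps` is hypothetical.

```python
def classify_jumps(received):
    """Return (label, size, previous, current) for every break in a +1 sequence."""
    events = []
    seen = set()
    prev = None
    for seq in received:
        if prev is not None and seq != prev + 1:
            # A repeat of the same number is treated as a backwards jump of size 0.
            direction = "JUMP FORWARDS" if seq > prev else "JUMP BACKWARDS"
            # Assumption: landing on an already-seen number marks a duplicate block.
            label = f"DUPLICATE BLOCK - {direction}" if seq in seen else direction
            events.append((label, abs(seq - prev), prev, seq))
        seen.add(seq)
        prev = seq
    return events


# Messages 3-5 are delivered twice, and 9 arrives ahead of 7 and 8.
received = [1, 2, 3, 4, 5, 3, 4, 5, 6, 9, 7, 8, 10]
events = classify_jumps(received)
for label, size, prev, cur in events:
    print(f"{label} {size} ({prev} -> {cur})")
```

The jump size is the absolute difference between the two message numbers, which matches the figures in the excerpt (for example 108110 -> 106473 is a backwards jump of 1637).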

With deduplication enabled we saw zero duplicated messages and perfect message ordering.

Client Failures

Something we didn’t cover is that Apache Pulsar allows you to preserve deduplication across some client failure scenarios where Apache Kafka does not. For example, suppose a client reads rows one by one from a database and sends each row via Kafka or Pulsar. The client updates each row in the table once it has received an ack for the corresponding message. If the client dies after having sent the message but before updating the row, then the next client will send that row again and we’ll have a duplicate.

Apache Pulsar lets the client set the sequence numbers itself, so they could correspond to the row ids. The client can also request the last known sequence number for a given producer id. This allows a new client, using the same producer id, to take over and carry on where the last one left off, starting from the row id last persisted by Pulsar. Check this slide deck from Streamlio.
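The takeover described here can be illustrated with a toy in-memory model. This is only a sketch of the idea and does not use the Pulsar client API: `ToyBroker`, its methods and the producer name `row-publisher` are all hypothetical, and the row ids double as sequence numbers as suggested above.

```python
class ToyBroker:
    """In-memory stand-in for a broker that deduplicates by sequence number."""

    def __init__(self):
        self.log = []        # persisted payloads, in arrival order
        self.last_seq = {}   # producer name -> highest persisted sequence number

    def last_sequence_id(self, producer):
        """Highest sequence number persisted for this producer, or -1 if none."""
        return self.last_seq.get(producer, -1)

    def send(self, producer, seq, payload):
        """Persist the message unless its sequence number was already persisted."""
        if seq <= self.last_sequence_id(producer):
            return False     # duplicate: not stored again
        self.log.append(payload)
        self.last_seq[producer] = seq
        return True


rows = {1: "a", 2: "b", 3: "c", 4: "d"}
broker = ToyBroker()
producer = "row-publisher"   # hypothetical name shared by both clients

# First client: sends rows 1 and 2, then dies before marking row 2 as sent.
broker.send(producer, 1, rows[1])
broker.send(producer, 2, rows[2])

# A blind resend of row 2 by the replacement client is dropped by the broker.
resend_stored = broker.send(producer, 2, rows[2])

# Alternatively, the replacement asks where to resume and skips ahead.
resume_from = broker.last_sequence_id(producer) + 1
for row_id in range(resume_from, len(rows) + 1):
    broker.send(producer, row_id, rows[row_id])

print(resend_stored)  # False
print(broker.log)     # ['a', 'b', 'c', 'd']
```

Either path gives the same result in this model: row 2 appears once in the log, whether the new client resends it and the broker drops it, or the new client asks for the last sequence number and starts from row 3.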

With Kafka, if a client dies and another continues, the new client will simply resend that row and we end up with a duplicate.

This is a somewhat limited scenario. It wouldn’t cover other, more common scenarios, such as a REST API that sends a message on each POST request. If the service failed, the API client might retry, hit another service instance, and we’d end up with a duplicate all the same.

Transactions

Apache Kafka also has a related feature called transactions, which allows atomic read-process-write actions where both the read and the write are against Kafka. This also prevents duplicates, because if any part of the read-process-write fails then no write occurs. Something similar may be possible with Pulsar by having clients set their own sequence numbers from the message ids of the messages they read. I may have a closer look at that in the future.

Conclusions

The idempotent producer feature of both Apache Kafka and Apache Pulsar looks solid. It works in the face of TCP connection failures and broker fail-overs, and in my hours of tests neither system allowed even a single duplicate message to be added to a topic, nor did either ever deliver a message out of order to the consumer.

With Apache Kafka there are some limitations regarding your producer config. You must use acks=all and you must limit the in-flight requests per connection to 5 or fewer, which could impact throughput if you are currently using acks=1 and a larger number of in-flight requests. But acks=all is necessary if you care about durability in any case. Also, I am not sure that limiting yourself to 5 in-flight requests would impact throughput too much: my tests were able to send 30,000 messages a second from a single producer with 5 in-flight requests at a time.
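The two constraints above can be expressed as a small check over a producer config dictionary. This is a sketch rather than anything the Kafka client provides: the function `idempotence_problems` is hypothetical, and it only checks the two settings discussed here. The keys `acks` and `max.in.flight.requests.per.connection` are the standard Kafka producer property names, and `-1` is accepted because Kafka treats it as an alias for `all`.

```python
MAX_IN_FLIGHT = 5  # limit quoted in the text for idempotent producers


def idempotence_problems(config):
    """Return the reasons a producer config conflicts with the limits above."""
    problems = []
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be 'all' (or its alias '-1')")
    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is None or int(in_flight) > MAX_IN_FLIGHT:
        problems.append(
            "max.in.flight.requests.per.connection must be set to 5 or fewer"
        )
    return problems


# A throughput-oriented config that would need changing before enabling dedup.
fast = {
    "enable.idempotence": True,
    "acks": "1",
    "max.in.flight.requests.per.connection": 10,
}
# A config that satisfies both constraints.
safe = {
    "enable.idempotence": True,
    "acks": "all",
    "max.in.flight.requests.per.connection": 5,
}

fast_problems = idempotence_problems(fast)
safe_problems = idempotence_problems(safe)
print(fast_problems)
print(safe_problems)
```

A check like this is mainly useful when moving an existing acks=1 producer over to idempotence, since it lists exactly which settings have to change.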

Apache Pulsar does not force you to change throughput-related producer configuration, so enabling an idempotent producer should in theory have a smaller impact. At the broker level, all that happens is set and lookup operations on hashmaps, which should incur little overhead.

The question now is, why would you not use deduplication with Apache Kafka and Apache Pulsar? The only scenario I can think of is if you have many millions of producers that could cause the in-memory hashmaps to grow too large. Apart from that, I would recommend always enabling this feature. You get both greater reliability and consistency.