November 2, 2018

Testing Producer Deduplication in Apache Kafka and Apache Pulsar

Failures can induce message duplication on both the producer and consumer side. In this post we’ll focus solely on producer-side duplication, looking at how the deduplication feature works in Apache Pulsar and Apache Kafka. I have run many hours of deduplication tests against both messaging systems, and we’ll see the results of those tests.

On the producer side, when a producer sends a message and an error occurs, such as a TCP connection failure, the producer has no way to know whether the message was persisted. We have two choices: send the message again to ensure it gets delivered and risk duplication, or not send it again and risk the message never being delivered.
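To make the retry dilemma concrete, here is a minimal sketch of sequence-number-based deduplication. It is a toy model, not the actual Kafka or Pulsar implementation: the `DedupLog` class, its `append` method and the producer id are hypothetical names invented for illustration. The idea is that each producer tags its messages with an increasing sequence number, so a retried message can be recognised and dropped.

```python
class DedupLog:
    """Toy append-only log that drops retried messages (hypothetical, not a real broker API)."""

    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer id -> highest sequence number persisted so far

    def append(self, producer_id, seq, payload):
        """Persist payload unless this producer already stored this sequence number.

        Returns True if the message was written, False if it was treated as a duplicate.
        """
        if seq <= self.last_seq.get(producer_id, -1):
            return False
        self.messages.append(payload)
        self.last_seq[producer_id] = seq
        return True


log = DedupLog()
log.append("producer-1", 0, "order-created")
# Assume the acknowledgement is lost (for example the TCP connection drops),
# so the producer retries with the same sequence number.
retry_written = log.append("producer-1", 0, "order-created")
log.append("producer-1", 1, "order-paid")
```

With this scheme the producer can always choose to resend, because the log side decides whether the retry is a duplicate.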

Jack Vanlightly

September 20, 2018

Messaging Systems

How to Lose Messages on a Kafka Cluster - Part 2

More failure scenarios! See part 1 for the first seven scenarios.

Jack Vanlightly

September 18, 2018

Messaging Systems

How to Lose Messages on a Kafka Cluster - Part 1

In my previous post I used Blockade, Python and some Bash scripts to test a RabbitMQ cluster under various failure conditions such as failed nodes, network partitions, packet loss and a slow network. The aim was to find out how and when a RabbitMQ cluster loses messages. In this post we’ll do exactly the same but with a Kafka cluster. We’ll use our knowledge of the inner workings of Kafka and Zookeeper to induce various failure modes that cause message loss. Please read my post on Kafka fault tolerance first, as this post assumes you understand the basics of the acknowledgements and replication protocol.

Jack Vanlightly

September 5, 2018

Messaging Systems

RabbitMQ vs Kafka Part 6 - Fault Tolerance and High Availability with Kafka

In the last post we took a look at the RabbitMQ clustering feature for fault tolerance and high availability. In this post we'll dig deep into Apache Kafka and its offering.

With Kafka the unit of replication is the partition. Each topic has one or more partitions, and each partition has a leader and zero or more followers. When you create a topic you specify the number of partitions and the replication factor. A replication factor of three is common, which equates to one leader and two followers. Both leaders and followers can be referred to as replicas.
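As a rough illustration of how partitions and the replication factor relate, the sketch below spreads the replicas of each partition across a set of brokers. The `assign_replicas` function, the broker names and the simple round-robin placement are assumptions made for this example, not Kafka's actual assignment algorithm.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Return a hypothetical replica layout: partition -> leader and followers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        # Pick consecutive brokers, starting one broker further along for each partition.
        replicas = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
        # The first replica acts as leader, the rest are followers.
        assignment[partition] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


# Three partitions with a replication factor of three: one leader and two followers each.
layout = assign_replicas(3, 3, ["broker-1", "broker-2", "broker-3"])
```

In this toy layout every broker leads one partition and follows the other two, which is the kind of balance a replication factor of three on three brokers makes possible.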

Jack Vanlightly

May 21, 2018

Messaging Systems

Event-Driven Architectures - Queue vs Log - A Case Study

In the previous post we looked at relative event ordering and the decoupling of publishers and consumers, among other things. In this post we'll take those concepts and look at an example architecture. We'll look at the various modelling possibilities we have with RabbitMQ representing a queue-based system, and Kafka representing a log-based system.

Jack Vanlightly

April 28, 2018

Data

SQL Server CDC to Redshift Pipeline

In this post we'll take a look at what Change Data Capture (CDC) is and how we can use it to get data from SQL Server into Redshift in either a near real-time streaming fashion or more of a batched approach.

CDC is a SQL Server Enterprise feature and so is not available to everyone. There are also vendors, such as Attunity, that sell automated change data capture extraction and load into Redshift, and that may be your best option. But if you can't or don't want to pay for another tool on top of your SQL Server Enterprise license, then this post may help you.

Jack Vanlightly

December 26, 2017

Messaging Systems

RabbitMQ vs Kafka Part 4 - Message Delivery Semantics and Guarantees

Both RabbitMQ and Kafka offer durable messaging guarantees. Both offer at-most-once and at-least-once guarantees, but Kafka also offers exactly-once guarantees in a very limited scenario.

Let's first understand what these guarantees mean:

At-most-once delivery. This means that a message will never be delivered more than once but messages might be lost.
At-least-once delivery. This means that we'll never lose a message but a message might end up being delivered to a consumer more than once.
Exactly-once delivery. The holy grail of messaging. All messages will be delivered exactly one time.

Delivery is probably the wrong word for the above terms; processing might be a better way of putting it. After all, what we care about is whether a consumer can process a message and whether that is at-most-once, at-least-once or exactly-once. But using the word processing complicates things: exactly-once delivery makes less sense now, as perhaps we need the message to be delivered twice in order to successfully process it once. If the consumer dies during processing, then we need the message to be delivered a second time to a new consumer.
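The difference between the guarantees can be sketched with a toy simulation in which the consumer dies while processing the first delivery. The `run` function and its parameters are hypothetical and do not correspond to any RabbitMQ or Kafka API; the only assumption is that acknowledging a message makes the broker forget it.

```python
def run(ack_before_processing, crash_on_first_attempt=True):
    """Simulate one message and return (number of deliveries, processed messages)."""
    queue = ["m1"]
    processed = []
    deliveries = 0
    while queue:
        msg = queue[0]
        deliveries += 1
        if ack_before_processing:
            queue.pop(0)  # acknowledged up front: the broker forgets the message
        if crash_on_first_attempt and deliveries == 1:
            continue  # the consumer dies mid-processing; a new consumer takes over
        processed.append(msg)
        if not ack_before_processing:
            queue.pop(0)  # acknowledged only after successful processing
    return deliveries, processed


at_most_once = run(ack_before_processing=True)    # message is lost with the crashed consumer
at_least_once = run(ack_before_processing=False)  # message is redelivered and then processed
```

In the at-least-once run the message is delivered twice but processed once, which is exactly the situation the paragraph above describes.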

Jack Vanlightly

December 10, 2017

Messaging Systems

RabbitMQ vs Kafka Part 3 - Kafka Messaging Patterns

In Part 2 we covered the patterns and topologies that RabbitMQ enables. In this part we'll look at Kafka and contrast it against RabbitMQ to get some perspective on their differences. Remember that this comparison is within the context of an event-driven application architecture rather than data processing pipelines, although the line between them can be a bit grey. Perhaps it is more like a continuum and this comparison focuses on the event-driven applications end of that continuum.

Jack Vanlightly

December 10, 2017

Messaging Systems

RabbitMQ vs Kafka Part 2 - RabbitMQ Messaging Patterns

In this part we're going to forget about the low-level details in the protocols and concentrate on the higher-level patterns and message topologies that can be achieved in RabbitMQ. In Part 3 of the series we'll do the same for Apache Kafka.

First we'll cover the building blocks, or routing primitives, of RabbitMQ:

Exchange types and bindings
Queues
Dead letter exchanges
Ephemeral exchanges and queues
Alternate exchanges
Priority queues

Then we'll combine them all into a set of example patterns.

Jack Vanlightly

December 10, 2017

Messaging Systems

RabbitMQ vs Kafka Part 1 - Two Different Takes on Messaging

In this part we'll explore what RabbitMQ and Apache Kafka are and their approach to messaging. Each technology has made very different decisions regarding every aspect of its design, each with strengths and weaknesses. We'll not come to any strong conclusions in this part; instead, think of this as a primer on the technologies so we can dive deeper in subsequent parts of the series.