July 25, 2018

RabbitMQ Work Queues: Avoiding Data Inconsistency with Rebalanser

With RabbitMQ we can scale out our consumers by simply adding more, but we can also scale out our queues. There are a few reasons why scaling out our queues might be preferable to simply adding more consumers to a single queue (competing consumers). One of those reasons arises when using the work queue pattern.

Jack Vanlightly

July 23, 2018

Messaging Systems

Creating Consumer Groups in RabbitMQ with Rebalanser - Part 1

This is the first post in a series that will look at bringing Kafka features to RabbitMQ. In this post we'll look at how we can partition a RabbitMQ queue into multiple queues, perform automatic queue assignment to consumers and perform automatic rebalancing as the number of queues and consumers change.

Jack Vanlightly

May 26, 2018

Programming

Docker, .NET Core and Redshift Drivers

Just a quick post to share how to include the Redshift ODBC driver in your .NET Core docker container. This is from the CdcTools.CdcToRedshift application in my CDC Tools repo.

Jack Vanlightly

May 21, 2018

Messaging Systems

Event-Driven Architectures - Queue vs Log - A Case Study

In the previous post we looked at relative event ordering and the decoupling of publishers and consumers among other things. In this post we'll take those concepts and look at an example architecture. We'll look at the various modelling possibilities we have with RabbitMQ representing a queue based system, and Kafka representing a log based system.

Jack Vanlightly

May 20, 2018

Messaging Systems

Event-Driven Architectures - The Queue vs The Log

A messaging system is at the heart of most event-driven architectures. There is a plethora of different technologies in the space, and they can be classified as either queue based or log based.

Queue based: RabbitMQ, ActiveMQ, MSMQ, AWS SQS, JMQ and many more.

Log based: Apache Kafka, Apache Pulsar, AWS Kinesis, Azure Event Hubs and many more.

Each messaging system has different features but at the heart are their data structures: queue or log. In this post we'll take a look at how the underlying data structure affects your event-driven architecture.

Jack Vanlightly

April 28, 2018

Data

SQL Server CDC to Redshift Pipeline

In this post we'll take a look at what Change Data Capture (CDC) is and how we can use it to get data from SQL Server into Redshift in either a near real-time streaming fashion or more of a batched approach.

CDC is a SQL Server Enterprise feature and so is not available to everyone. There are also vendors, such as Attunity, that sell automated change data capture extraction and load into Redshift, and that may be your best option. But if you can't or don't want to pay for another tool on top of your SQL Server Enterprise license then this post may help you.

Jack Vanlightly

April 21, 2018

Programming

Processing Pipelines Series - Reactive Extensions (Rx.NET)

Whereas TPL Dataflow is all about passing messages between blocks, Reactive Extensions is about sequences. With sequences we can create projections, transformations and filters. We can combine multiple sequences into a single one. It is a very flexible and powerful paradigm but with such power comes extra complexity. I find TPL Dataflow easier to reason about due to its simple model. Reactive Extensions can get pretty complex and is not always intuitive, but you can create some elegant solutions with it. It will require some investment in time and tinkering to get a reasonable understanding of it.

Jack Vanlightly

April 19, 2018

Programming

Processing Pipelines Series - TPL Dataflow - Alternate Scenario

In the last post we built a TPL Dataflow pipeline based on the scenario from our first post in the series. Today we'll build another pipeline very similar to the first but with different requirements around latency and data loss.

In the first scenario we could not slow down the producer as slowing it down would cause data loss (it read from a bus that would not wait if you weren't there to consume the data). We also cared a lot about ensuring the first stage kept up with the producer and successfully wrote every message to disk. The rest was best effort, and we performed load-shedding so as not to slow down the producer.

Jack Vanlightly

April 18, 2018

Programming

Processing Pipelines Series - TPL Dataflow

TPL Dataflow is a data processing library from Microsoft that came out years ago. It consists of different "blocks" that you compose together to make a pipeline. Blocks correspond to stages in your pipeline. If you didn't read the first post in the series, it might be worth doing so before you read on.

Jack Vanlightly

April 17, 2018

Programming

Processing Pipelines Series - Concepts

In this series we'll look at a few different .NET technologies we can use to process streams of data in processing pipelines and directed acyclic graphs (DAGs). This is not about distributed data platforms for big data but real-time processing and computation running on a single machine. We'll take a single scenario and build it out multiple times, each with a different technology. Each application will be built as a console application with .NET Core.