Jack Vanlightly

C#

SQL Server CDC to Redshift Pipeline

In this post we'll take a look at what Change Data Capture (CDC) is and how we can use it to get data from SQL Server into Redshift in either a near real-time streaming fashion or more of a batched approach.

CDC is a SQL Server Enterprise feature and so not available to everyone. Also there are vendors that sell automated change data capture extraction and load into Redshift, such as Attunity and that may be your best option. But if you can't or don't want to pay for another tool on top of your SQL Server Enterprise license then this post may help you.

Processing Pipelines Series - Reactive Extensions (Rx.NET)

Whereas TPL Dataflow is all about passing messages between blocks, Reactive Extensions is about sequences. With sequences we can create projections, transformations and filters. We can combine multiple sequences into a single one. It is a very flexible and powerful paradigm but with such power comes extra complexity. I find TPL Dataflow easier to reason about due to its simple model. Reactive Extensions can get pretty complex and is not always intuitive, but you can create some elegant solutions with it. It will require some investment in time and tinkering to get a reasonable understanding of it.

Processing Pipelines Series - TPL Dataflow - Alternate Scenario

In the last post we built a TPL Dataflow pipeline based on the scenario from our first post in the series. Today we'll build another pipeline very similar to the first but with different requirements around latency and data loss.

In the first scenario we could not slow down the producer as slowing it down would cause data loss (it read from a bus that would not wait if you weren't there to consume the data). We also cared a lot about ensuring the first stage kept up with the producer and successfully wrote every message to disk. The rest was best effort, and we performed load-shedding so as not to slow down the producer.

Processing Pipelines Series - TPL Dataflow

TPL Dataflow is a data processing library from Microsoft that came out years ago. It consists of different "blocks" that you compose together to make a pipeline. Blocks correspond to stages in your pipeline. If you didn't read the first post in the series then that might not be a bad idea before you read on.

Processing Pipelines Series - Concepts

In this series we'll look at few different .NET technologies we can use to process streams of data in processing pipelines and directed acyclic graphs (DAGs). This is not about distributed data platforms for big data but real-time processing and computation running on a single machine. We'll take a single scenario and build it out multiple times, each with a different technology. Each application will be built as a console application with .NET Core.

DSL Parser - Sample Code

I while back I wrote a blog series about DSLs, grammars, tokenizers, parsers and a SQL generator. The idea was that you could write a DSL query to mine your error log data and the code would generate SQL. The series can be found here: http://jack-vanlightly.com/blog/2016/2/3/how-to-create-a-query-language-dsl

The tokenizer while simple was very inefficient, so I wrote a better one, you can find that here: http://jack-vanlightly.com/blog/2016/2/24/a-more-efficient-regex-tokenizer

I have just published working code based on this series and the better Regex tokenizer on Github here: https://github.com/Vanlightly/DslParser.

Reliability - Custom Retries - NServiceBus with RabbitMq Part 6

NServiceBus provides really nice customisation features for your recoverability functionality. We already saw how you can customise the immediate and delayed retries. In this post we'll look at implementing our own custom policy which allows us to make decisions based on the exceptions that are thrown in the message handlers.

The NServiceBus documentation on customising your recoverability policy is well explained and so I recommend you read their documentation first. 

Custom Topology - NServiceBus with RabbitMq Part 4

In this post we'll implement a class that inherits from IRoutingTopology and that plugs into NServiceBus to create a custom routing topology. There is also the IDeclareQueues interface that we could also implement but I doubt I would ever need to. NServiceBus creates a bunch of queues in addition to the main endpoint queue, for things like retries and timeouts. I don't want to mess with these as they are important for how NServiceBus gives us the functionality that it does. So in this post we'll just be customising the RabbitMq artefacts beyond the immediate endpoint queues.

Custom Direct Topology - NServiceBus with RabbitMq Part 3

In this post we'll look at how we can customise the Direct Routing Topology. The source code is on Github.

Create a new solution folder and called it RabbitMqTransportCustomDirectTopology and add the four projects named as follows: Rabbit.Custom.ClientUI, Rabbit.Custom.Sales, Rabbit.Custom.Billing and Rabbit.Custom.Shipping.