SQL Server

SQL Server CDC to Redshift Pipeline

In this post we'll take a look at what Change Data Capture (CDC) is and how we can use it to get data from SQL Server into Redshift in either a near real-time streaming fashion or more of a batched approach.

CDC is a SQL Server Enterprise feature and so not available to everyone. Also there are vendors that sell automated change data capture extraction and load into Redshift, such as Attunity and that may be your best option. But if you can't or don't want to pay for another tool on top of your SQL Server Enterprise license then this post may help you.

Building Synkronizr - A SQL Server Data Synchronizer Tool - Part 1

Origins of Synkronizr

In a recent post I described a method I recently used at work for synchronizing a SQL Server slave with a master. Because the master is hosted by a partner and that partner does not offer any replication, mirroring or log shipping I opted for a replication technique loosely based on how some distributed NoSQL databases do it - the generation and comparison of hash trees (Merkle Trees).

How Row Locking Makes Taskling Concurrency Controls Possible

Your Taskling jobs can be configured with concurrency limits and those jobs will never have more than the configured number of executions of that job running at any time.

Some batch and micro-batch jobs need to be singletons, there to be only one execution running at any point in time. This may be to avoid data consistency issues when persisting results or because only a single session can be opened to a third party service etc. Other batch processes need more than one execution running at the same time in order to cope with the data volume but have a concurrency limit in order to not overwhelm downstream systems or third party services.