Jack Vanlightly

Data

Exploring the use of Hash Trees for Data Synchronization - Part 1

n this post we'll explore a relational database replication strategy that you can use when standard database replication is not an option – so no replication feature, no log shipping, no mirroring etc. The approaches outlined below will only work with a master-slave model where all writes go to the master. Conflict resolution is not addressed in this article.

We’ll cover phase one of a two-phase approach of
1.    Generate and compare hash trees to identify blocks of rows that have discrepancies
2.    For each block with a different hash value, identify and import the individual changes (insert, update, delete)
This post is really about exploring the approach rather than looking at the implementation details and detailed performance metrics. Perhaps I might share some code and metrics in a later post if people are interested.