At Vueling we have almost 700 applications using hundreds of databases, queues, FTP sites, web services, remote file shares and so on. Understanding how everything fits together is a losing battle without visualizations, so we model our entire infrastructure in Neo4j.
Exploring the use of Hash Trees for Data Synchronization - Part 1
In this post we'll explore a relational database replication strategy that you can use when standard database replication is not an option: no replication feature, no log shipping, no mirroring and so on. The approach outlined below only works with a master-slave model where all writes go to the master. Conflict resolution is not addressed in this article.
We'll cover phase one of a two-phase approach:
1. Generate and compare hash trees to identify blocks of rows that have discrepancies
2. For each block with a different hash value, identify and import the individual changes (insert, update, delete)
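To make phase one concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation discussed in this series. The function names (`block_hashes`, `build_tree`, `differing_blocks`) are hypothetical. Rows are assumed to be in-memory tuples with an integer primary key in the first position, blocks are assumed to be fixed key ranges, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def block_hashes(rows, block_size, num_blocks):
    """Hash rows per fixed key range: block i covers ids [i*block_size, (i+1)*block_size)."""
    hashers = [hashlib.sha256() for _ in range(num_blocks)]
    for row in sorted(rows):  # sort so both sides feed rows in the same order
        hashers[row[0] // block_size].update(repr(row).encode("utf-8"))
    return [h.digest() for h in hashers]


def build_tree(leaves):
    """Return the tree as a list of levels, leaf hashes first and root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent is the hash of its (up to two) concatenated children.
        levels.append([
            hashlib.sha256(b"".join(prev[i:i + 2])).digest()
            for i in range(0, len(prev), 2)
        ])
    return levels


def differing_blocks(tree_a, tree_b):
    """Walk from the root down, descending only into subtrees whose hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for level in range(top - 1, -1, -1):
        children = []
        for node in suspects:
            for child in (2 * node, 2 * node + 1):
                if (child < len(tree_a[level])
                        and tree_a[level][child] != tree_b[level][child]):
                    children.append(child)
        suspects = children
    return suspects


# Made-up data: the replica has a stale row 5 and is missing row 6.
master = [(i, f"name{i}") for i in range(8)]
replica = [(5, "stale") if r[0] == 5 else r for r in master if r[0] != 6]

master_tree = build_tree(block_hashes(master, block_size=2, num_blocks=4))
replica_tree = build_tree(block_hashes(replica, block_size=2, num_blocks=4))
print(differing_blocks(master_tree, replica_tree))  # [2, 3]
```

Blocks are keyed by primary-key range rather than row position so that an insert or delete on one side only changes the hash of the block it falls in, instead of shifting every later row into a different block. Both trees must be built with the same `block_size` and `num_blocks` for the comparison to be meaningful.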
This post is about exploring the approach rather than looking at implementation details and detailed performance metrics. I may share some code and metrics in a later post if people are interested.