Incremental Database Synchronization
One of the most important database management tasks in distributed heterogeneous systems is synchronizing data between multiple sources and targets. Companies and organizations face a significant challenge in propagating modified data in near real time to avoid interrupting business flows.
The straightforward approach to database synchronization is based on a full scan of both the source and target databases to find all changes. After that phase, the database administrator must apply the necessary actions, either manually or in an automated way, to keep the data up to date:
- Insert all rows from the source tables that are missing in the destination tables
- Delete all rows in the destination tables that do not exist in the source tables
- Update all modified rows in the destination tables with the source data
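The three actions above can be sketched as a diff of both tables keyed by primary key. This is a minimal illustration, assuming each table has already been loaded into a dict mapping primary key to row (the table contents are made up for the example):

```python
# Full-scan synchronization sketch: load both tables entirely, then diff
# them by primary key to find rows to insert, delete, and update.

def full_scan_diff(source, target):
    """source/target: dicts mapping primary key -> row tuple."""
    to_insert = {pk: row for pk, row in source.items() if pk not in target}
    to_delete = [pk for pk in target if pk not in source]
    to_update = {pk: row for pk, row in source.items()
                 if pk in target and target[pk] != row}
    return to_insert, to_delete, to_update

source = {1: ("alice", 30), 2: ("bob", 25), 4: ("dave", 40)}
target = {1: ("alice", 30), 2: ("bob", 26), 3: ("carol", 35)}
ins, dele, upd = full_scan_diff(source, target)
# Row 4 is missing in the target, row 3 is gone from the source,
# and row 2 was modified.
```

Note that both tables must be read in full on every run, which is exactly the cost the incremental method below avoids.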
The bottleneck of this approach is that full synchronization of large databases can take an extremely long time. For example, suppose we need to synchronize SQL Server and MySQL databases, each holding over 1 million rows, and assume the average throughput of traditional synchronization is 100 rows/sec: a full pass then takes almost 3 hours per million rows. This cannot be substantially improved even by deploying high-performance data access components such as bulk reading and writing techniques, because every row must still be scanned and compared.
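At the rate quoted above, the duration for a million-row table works out as follows:

```python
rows = 1_000_000   # rows in each database
rate = 100         # rows synchronized per second (illustrative figure)
seconds = rows / rate
hours = seconds / 3600
print(f"{seconds:.0f} seconds, about {hours:.1f} hours")
```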
Fortunately, there is another approach to the database synchronization task. You can dramatically decrease the duration of the procedure by processing modified rows only, in other words, the data that has changed since the last synchronization. This method can be implemented via the following steps:
- In the first run, the program performs traditional database synchronization as mentioned above
- Additionally, the program generates insert, update and delete triggers on every table being synchronized. Each trigger writes information about the modified row into a special service table
- On subsequent runs, the synchronization tool extracts the updated rows using the information stored in the service table and updates the corresponding data only
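The trigger step above can be sketched as follows. This example uses SQLite via Python's sqlite3 module for the sake of a self-contained demonstration; the article targets SQL Server and MySQL, where trigger syntax differs, but the idea is identical. Table, column and trigger names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
-- Service table: one row per modification, to be read on the next run.
CREATE TABLE users_changes (pk INTEGER, op TEXT);
CREATE TRIGGER users_ins AFTER INSERT ON users
  BEGIN INSERT INTO users_changes VALUES (NEW.id, 'I'); END;
CREATE TRIGGER users_upd AFTER UPDATE ON users
  BEGIN INSERT INTO users_changes VALUES (NEW.id, 'U'); END;
CREATE TRIGGER users_del AFTER DELETE ON users
  BEGIN INSERT INTO users_changes VALUES (OLD.id, 'D'); END;
""")

# Ordinary application activity; the triggers record each change.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
conn.execute("INSERT INTO users VALUES (2, 'bob')")
conn.execute("DELETE FROM users WHERE id = 2")

changes = conn.execute(
    "SELECT pk, op FROM users_changes ORDER BY rowid").fetchall()
# The next synchronization run reads users_changes, applies only these
# rows to the target, then clears the service table.
```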
This method is known as incremental, or trigger-based, database synchronization. Now, let's see how the incremental process reduces synchronization time. Unlike full-scan synchronization, the incremental method can run in near real time if it is scheduled properly. Performance can be increased further by executing bulk synchronization, since all modified data is accumulated in one place and grouped by the type of operation: insert, delete and update. Bulk row processing yields over 10 times higher throughput than row-by-row synchronization.
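Because the change log groups rows by operation type, the target can be updated with one batched statement per operation instead of one statement per row. A minimal sketch, again using SQLite with made-up table contents (real tools would use the bulk-load facilities of SQL Server and MySQL):

```python
import sqlite3

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
target.execute("INSERT INTO users VALUES (2, 'bob'), (3, 'carol')")

# Accumulated changes, already grouped by operation type.
inserts = [(1, "alice")]
updates = [("bobby", 2)]
deletes = [(3,)]

# One prepared statement per operation type, executed over the whole batch.
target.executemany("INSERT INTO users (id, name) VALUES (?, ?)", inserts)
target.executemany("UPDATE users SET name = ? WHERE id = ?", updates)
target.executemany("DELETE FROM users WHERE id = ?", deletes)

rows = target.execute("SELECT id, name FROM users ORDER BY id").fetchall()
```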
The incremental/trigger-based database synchronization method imposes the following requirements:
- Access to the synchronized databases with privileges sufficient to create triggers and tables
- Every synchronized table must have a properly defined primary key
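The primary-key requirement is easy to verify before synchronization starts. A sketch of such a precondition check, using SQLite's PRAGMA table_info for self-containment (SQL Server and MySQL expose the same information through their system catalog views):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE good (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE bad (name TEXT)")  # no primary key: cannot sync

def has_primary_key(conn, table):
    # Column 5 of PRAGMA table_info is the 1-based primary-key position
    # (0 means the column is not part of the primary key).
    return any(col[5] > 0 for col in conn.execute(f"PRAGMA table_info({table})"))

ok = has_primary_key(conn, "good")
missing = has_primary_key(conn, "bad")
```

Without a primary key there is no reliable way to match a logged change to the corresponding row in the target, so such tables must be excluded or fixed first.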
To see the advantages of the incremental sync method in practice, you may want to evaluate the SQL Server – MySQL Sync tool, which implements trigger-based synchronization between MS SQL Server and MySQL, MariaDB or Percona.