Python Script for Comparing MySQL Databases and Finding Discrepancies

Introduction

In the world of database management, comparing MySQL databases is essential for maintaining data accuracy. This process helps identify missing tables, columns, and data discrepancies. In this article, we'll explore how to compare MySQL databases effectively to ensure data integrity.

Let's get started:

Import the necessary libraries

pip install pymysql
import pymysql

Get details of the databases and create a dictionary to store the database information.

detail1 = {"host":"hostlink1",
             "user":"username1",
             "password":"password1",
             "database":"databasename1",
             "connect_timeout":5}
detail2 = {"host":"hostlink2",
             "user":"username2",
             "password":"password2",
             "database":"databasename2",
             "connect_timeout":5}

Connect to the databases

conn1 = pymysql.connect(**detail1)
cursor1 = conn1.cursor()

conn2 = pymysql.connect(**detail2)
cursor2 = conn2.cursor()

Get the tables of both databases

cursor1.execute("SHOW FULL TABLES")
dbtables1 = set(cursor1.fetchall())

cursor2.execute("SHOW FULL TABLES")
dbtables2 = set(cursor2.fetchall())

Let's create strings for a html file and add DataTables javascript library

script = """
        <script src="https://code.jquery.com/jquery-3.7.1.js"></script>
        <link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/2.0.0/css/dataTables.dataTables.css">

        <script src="https://cdn.datatables.net/2.0.0/js/dataTables.js"></script>
        <script>
        $(document).ready(function () {
            $('#example').DataTable();
        });
        </script>
            """

html = ""
html += "<table id='example' class='display' style='width:100%'>"
html += f"<thead><tr><th>Name</th><th>{detail1['database']}</th><th>{detail2['database']}</th></tr></thead><tbody>"

Let's compare, add missing tables, and include any absent columns in the common tables to our HTML string.

Combining the tables from two databases into a single set:

alltables = dbtables1.union(dbtables2)

This line creates a set alltables that contains the union of tables from dbtables1 and dbtables2.

Iterating through each table:

for table in alltables:

This loop iterates over each table in the combined set of tables.

Checking presence of table in each database:

presence_count = [
    1 if table in dbtables1 else 0,
    1 if table in dbtables2 else 0
]

Here, presence_count is a list containing 1 if the table is present in the respective database (dbtables1 or dbtables2), otherwise 0.

Checking if the table is missing in one of the databases:

if sum(presence_count) < 2:

This condition checks if the sum of presence_count is less than 2, meaning the table is missing in one of the databases.

Creating HTML table row indicating presence/absence of the table:

html += f'<tr><td><b>Table</b>  {table[0]}</td>'
for count in presence_count:
    html += f'<td>{"present" if count == 1 else "absent"}</td>'
html += "</tr>"

If the table is missing in one of the databases, this code generates an HTML table row indicating whether the table is present or absent in each database.

Fetching column details from each database and comparing them:

elif sum(presence_count) == 2:

If the table is present in both databases, this section fetches column details from each database and compares them to identify any differences.

The subsequent code within this block performs the following tasks:

Fetches column details from both databases using DESCRIBE queries.

cursor1.execute(f"DESCRIBE {detail1['database']}.{table[0]}")
column_set1 = set(cursor1.fetchall())

cursor2.execute(f"DESCRIBE {detail2['database']}.{table[0]}")
column_set2 = set(cursor2.fetchall())

Compares the column sets to identify columns present in one database but not the other.

# Get the set of all column indexes present in either database
column_set1_indexes = {t[0] for t in column_set1}
column_set2_indexes = {t[0] for t in column_set2}
all_column_indexes = column_set1_indexes.union(column_set2_indexes)

# Tuple of column attributes
maintuple = ("Field", "Type", "Null", "Key", "Default", "Extra")

# Check for columns present in detail1['database'] but not in detail2['database']
for index in all_column_indexes - column_set2_indexes:
    tuple1 = next((t for t in column_set1 if t[0] == index), None)
    if tuple1:
        html += f"<tr><td><b>{table[0]}</b>.{tuple1[0]}</td><td>Present</td><td>Absent</td></tr>"

# Check for columns present in detail2['database'] but not in detail1['database']
for index in all_column_indexes - column_set1_indexes:
    tuple2 = next((t for t in column_set2 if t[0] == index), None)
    if tuple2:
        html += f"<tr><td><b>{table[0]}</b>.{tuple2[0]}</td><td>Absent</td><td>Present</td></tr>"

Compares matching columns for any differences in attributes.

# Check for matching columns and mismatches
for index in all_column_indexes.intersection(column_set1_indexes, column_set2_indexes):
    tuple1 = next((t for t in column_set1 if t[0] == index), None)
    tuple2 = next((t for t in column_set2 if t[0] == index), None)

    if tuple1 and tuple2:
        if tuple1[1:] != tuple2[1:]:
            for i in range(1, len(tuple1)):
                if tuple1[i] != tuple2[i]:
                    html += f"<tr><td><b>{table[0]}</b>.{tuple1[0]}.{maintuple[i]}</td><td>{tuple1[i]}</td><td>{tuple2[i]}</td></tr>"

Completing the HTML table:

html += "</tbody></table>"

This line completes the HTML table by closing the <tbody> and <table> tags.

Overall, this code is designed to compare tables and their columns between two databases and generate an HTML table with the results indicating the presence or absence of tables and any mismatches in column attributes.

Write the html contents to a html file

# Lets open a html file
htmlfile = open("dbdifference.html","w")

# Lets write the html contents to the file 
htmlfile.write(f"""<!DOCTYPE html>
        <html lang="en">
        <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>DB Difference</title>
        </head>
                    {script}
        <body>
    <div style="margin-top: 60px">
        {html}
    </div>               
        </body>
        </html>""")

Conclusion:

In summary, comparing MySQL databases helps ensure data consistency between different environments like development and production. By following the steps outlined above, we can easily identify any discrepancies, such as missing tables, and keep our databases synchronized. This practice is essential for maintaining reliable data management practices and preventing potential issues. Regular comparisons should be part of routine database maintenance to uphold data quality effectively.

Note:

You can also find the code in my github "Database Difference"

Sky's Blog

Sky's Blog

Python Script for Comparing MySQL Databases and Finding Discrepancies

Introduction

Let's get started:

Conclusion: