
MySQL replication primer with pt-table-checksum and pt-table-sync


MySQL replication is a process that allows you to easily maintain multiple copies of MySQL data by having them copied automatically from a master to a slave database.

It’s essential to make sure the slave servers have the same set of data as the master to ensure data is consistent within the replication stream. A slave's data can drift from the master's for many reasons, e.g. replication errors or accidental direct updates on the slave.

Here at Percona Support we highly recommend that our customers periodically run the pt-table-checksum tool to verify data consistency within replication streams, and especially after fixing replication errors on slave servers, to ensure that each slave has data identical to its master. You don't want to find yourself failing over to a slave server only to discover that its data differs from the master's.

In this post, I will examine how to use the pt-table-checksum and pt-table-sync tools from Percona Toolkit on different replication topologies. We often receive questions from customers about how to run these tools, and I hope this post will help.

Percona Toolkit is a free collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex to perform manually.

One of those tools is pt-table-checksum, which works by dividing each table into chunks of rows and checksumming each chunk. The size of a chunk changes dynamically during the operation to avoid overloading the server. pt-table-checksum has many safeguards, including varying the chunk size so that queries run in the desired amount of time.
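pt-table-checksum compares chunks by running a checksum query for each chunk on the master; the query replicates, so each slave computes the same checksum over its own rows. The Python sketch below mimics that comparison in memory. It is not the tool's code: the checksum here (row count plus the XOR of CRC32 values over each row's '#'-joined columns) is only an approximation of the tool's default CRC32/BIT_XOR approach, chunk boundaries are taken from the master's primary keys with a fixed chunk size, and all function names are hypothetical.

```python
import zlib
from functools import reduce
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]  # (id, name), like test.dummy


def row_crc(row: Row) -> int:
    """CRC32 of the row's columns joined with '#' (loosely like CONCAT_WS('#', ...))."""
    return zlib.crc32("#".join(str(col) for col in row).encode("utf-8"))


def chunk_uppers(master_keys: Sequence[int], chunk_size: int) -> List[Optional[int]]:
    """Upper primary-key boundary of each chunk, taken from the master.

    The trailing None makes the last chunk open-ended, so rows that exist
    only on a replica (beyond the master's last key) still land in a chunk.
    """
    keys = sorted(master_keys)
    uppers: List[Optional[int]] = list(keys[chunk_size - 1::chunk_size])
    uppers.append(None)
    return uppers


def chunk_checksums(rows: Sequence[Row],
                    uppers: Sequence[Optional[int]]) -> List[Tuple[int, int]]:
    """(row count, XOR of row CRCs) for each chunk (previous upper, upper]."""
    result = []
    lower: Optional[int] = None  # the first chunk is unbounded below
    for upper in uppers:
        chunk = [r for r in rows
                 if (lower is None or r[0] > lower) and (upper is None or r[0] <= upper)]
        crc = reduce(lambda acc, r: acc ^ row_crc(r), chunk, 0)
        result.append((len(chunk), crc))
        lower = upper
    return result


def diff_chunks(master_rows: Sequence[Row], replica_rows: Sequence[Row],
                chunk_size: int) -> List[Tuple[int, int, int]]:
    """(chunk number, replica count - master count, 1 if CRCs differ else 0)
    for every chunk whose count or checksum differs."""
    uppers = chunk_uppers([r[0] for r in master_rows], chunk_size)
    master = chunk_checksums(master_rows, uppers)
    replica = chunk_checksums(replica_rows, uppers)
    diffs = []
    for n, ((m_cnt, m_crc), (r_cnt, r_crc)) in enumerate(zip(master, replica), start=1):
        if m_cnt != r_cnt or m_crc != r_crc:
            diffs.append((n, r_cnt - m_cnt, int(m_crc != r_crc)))
    return diffs
```

Run on the example's data (ten rows on the master, rows 1 to 5 on the slave) with a chunk size larger than the table, it reports a single differing chunk with a count difference of -5, which lines up with the CNT_DIFF column pt-table-checksum prints later in this post.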

pt-table-checksum verifies chunk size by running an EXPLAIN query on each chunk. It also continuously monitors the slave servers to make sure replicas do not fall too far behind; if they do, the tool pauses itself to allow the slaves to catch up. There are many other built-in safeguards, and you can find all the details in the pt-table-checksum documentation.
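The chunk-size adjustment and replica-lag pause described above can be pictured with the following Python sketch. It is an illustration under assumptions, not pt-table-checksum's implementation: `ChunkSizer` sizes the next chunk from an exponentially weighted rows-per-second rate so each chunk takes roughly a target time (the tool's --chunk-time option, documented default 0.5 seconds), and `wait_for_replicas` pauses while any replica lags more than a threshold (--max-lag, documented default 1 second). The weighting factor, the class and function names, and how replica lag is fetched are all hypothetical.

```python
import time
from typing import Callable, Optional, Sequence


class ChunkSizer:
    """Pick the next chunk size so one checksum query takes about target_time seconds.

    Keeps an exponentially weighted rows-per-second rate; the 0.75 weight is
    an illustrative assumption, not the tool's actual constant.
    """

    def __init__(self, target_time: float = 0.5, initial_size: int = 1000,
                 weight: float = 0.75) -> None:
        self.target_time = target_time
        self.size = initial_size
        self.weight = weight
        self.rate: Optional[float] = None

    def update(self, rows: int, elapsed: float) -> int:
        """Record one finished chunk and return the size to use for the next one."""
        observed = rows / max(elapsed, 1e-6)  # guard against a zero elapsed time
        if self.rate is None:
            self.rate = observed
        else:
            self.rate = self.weight * self.rate + (1 - self.weight) * observed
        self.size = max(1, int(self.rate * self.target_time))
        return self.size


def wait_for_replicas(get_lags: Callable[[], Sequence[Optional[float]]],
                      max_lag: float = 1.0, check_interval: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Pause while any replica lags more than max_lag seconds.

    get_lags returns one value per replica (e.g. Seconds_Behind_Master);
    None (lag unknown, e.g. replication stopped) is treated as too far behind.
    Returns how many times it paused.
    """
    pauses = 0
    while any(lag is None or lag > max_lag for lag in get_lags()):
        pauses += 1
        sleep(check_interval)
    return pauses
```

For example, if a 1000-row chunk took 1 second, the sizer proposes 500 rows for the next chunk so it lands near the 0.5 second target.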

In my first example, I am going to run pt-table-checksum against a pair of replication servers, i.e. a master with only one slave. We will run pt-table-checksum on the master server to verify data integrity on the slave, and if the tool finds differences, we will sync them to the slave server with the pt-table-sync tool.

I created a dummy table in the test database and inserted 10 records on the master server:

mysql-master> create table dummy (id int(11) not null auto_increment primary key, name char(5)) engine=innodb;
Query OK, 0 rows affected (0.08 sec)
mysql-master> insert into dummy VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d'), (5,'e'), (6,'f'), (7,'g'), (8,'h'), (9,'i'), (10,'j');
Query OK, 10 rows affected (0.00 sec)
Records: 10  Duplicates: 0  Warnings: 0
mysql-master> select * from dummy;
+------+------+
| id   | name |
+------+------+
|    1 | a    |
|    2 | b    |
|    3 | c    |
|    4 | d    |
|    5 | e    |
|    6 | f    |
|    7 | g    |
|    8 | h    |
|    9 | i    |
|   10 | j    |
+------+------+
10 rows in set (0.00 sec)

Then I intentionally deleted a few records from the slave server to make it inconsistent with the master for the purpose of this post.

mysql-slave> delete from dummy where id>5;
Query OK, 5 rows affected (0.03 sec)
mysql-slave> select * from dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)

At this point, the master server has 10 records in the dummy table while the slave server has only 5: the records with id>5 are missing. Let's run pt-table-checksum on the master server to see whether it catches those differences.

[root@master]# pt-table-checksum --replicate=percona.checksums --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-11T18:30:13      0      1       10       1       0   1.044 test.dummy

pt-table-checksum needs to be executed on the master. The user and password you specify are used to connect not only to the master but to the slaves as well. The pt-table-checksum MySQL user needs the following privileges:

mysql-master> GRANT REPLICATION SLAVE,PROCESS,SUPER, SELECT ON *.* TO `checksum_user`@'%' IDENTIFIED BY 'checksum_password';
mysql-master> GRANT ALL PRIVILEGES ON percona.* TO `checksum_user`@'%';

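In the pt-table-checksum command above, I used the --replicate option, which writes the checksum results to the specified table, percona.checksums. The checksum queries replicate to the slaves, where they are computed against each slave's own data. Next, I passed the --ignore-databases option, which accepts a comma-separated list of databases to ignore. The --create-replicate-table and --empty-replicate-table options are enabled by default, so the tool creates the --replicate table if it does not exist and clears a table's previous checksums before checksumming it again. To store the results in a table other than percona.checksums, point --replicate at that table.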

pt-table-checksum reported 1 DIFF, which is the number of chunks that differ from the master on one or more slaves. You can find details about the other columns (TS, ERRORS and so on) in the pt-table-checksum documentation. Next, I ran the following command to identify which table differs on the slave.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases mysql h=localhost,u=checksum_user,p=checksum_password
Differences on slave
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -5 1

In this command I used the --replicate-check-only option, which prints only the checksum differences found on the detected replicas. It doesn't checksum any tables: it checks the replicas for differences found by the previous checksum run and then exits.
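When scripting this check, it helps to turn the --replicate-check-only report into data. The Python sketch below assumes the layout shown above: a `Differences on <host>` line, a header row, then one row per differing chunk. It keeps only TABLE, CHUNK, CNT_DIFF and CRC_DIFF. In this example CNT_DIFF is -5 for the slave that lost five rows, which suggests a negative value means the replica has fewer rows than the master in that chunk. The names `parse_check_only` and `ChunkDiff` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class ChunkDiff(NamedTuple):
    table: str
    chunk: int
    cnt_diff: int
    crc_diff: int


def parse_check_only(output: str) -> Dict[str, List[ChunkDiff]]:
    """Group the rows printed by --replicate-check-only by replica host.

    Only the first four columns are parsed; CHUNK_INDEX and the boundary
    columns are ignored because they may be empty or contain spaces.
    """
    diffs: Dict[str, List[ChunkDiff]] = {}
    host: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Differences on "):
            host = line[len("Differences on "):]
            diffs.setdefault(host, [])
        elif host is None or not line or line.startswith("TABLE "):
            continue  # preamble, blank separator, or column header
        else:
            fields = line.split()
            if len(fields) >= 4:
                diffs[host].append(ChunkDiff(fields[0], int(fields[1]),
                                             int(fields[2]), int(fields[3])))
    return diffs
```

An empty result means no replica reported differences.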

You can also log in to the slave and run the query below to find out which tables have inconsistencies.

mysql-slave> SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks
FROM percona.checksums
WHERE (
master_cnt <> this_cnt
OR master_crc <> this_crc
OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db, tbl;
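The `ISNULL(master_crc) <> ISNULL(this_crc)` condition matters because `<>` evaluates to NULL, not true, when one side is NULL, so a checksum that is NULL on only one server would otherwise slip through. This Python sketch mirrors the query's WHERE and GROUP BY logic over rows already fetched from percona.checksums (as dicts keyed by the column names used in the query); fetching them is left out and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Tuple


def sql_ne(a: Any, b: Any) -> bool:
    """SQL '<>' as a WHERE condition: NULL on either side yields NULL, which is not true."""
    return a is not None and b is not None and a != b


def inconsistent_tables(rows: Iterable[Mapping[str, Any]]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """(db, tbl) -> (SUM(this_cnt), COUNT(*)) over chunks that differ."""
    totals: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for r in rows:
        if (sql_ne(r["master_cnt"], r["this_cnt"])
                or sql_ne(r["master_crc"], r["this_crc"])
                # catches a checksum that is NULL on exactly one side
                or (r["master_crc"] is None) != (r["this_crc"] is None)):
            key = (r["db"], r["tbl"])
            totals[key][0] += r["this_cnt"] or 0  # SUM() skips NULLs
            totals[key][1] += 1
    return {key: (cnt, chunks) for key, (cnt, chunks) in totals.items()}
```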

pt-table-checksum identified that the test.dummy table differs on the slave. Now we are going to use the pt-table-sync tool to synchronize the table data between the MySQL servers.

[root@slave]# pt-table-sync --print --replicate=percona.checksums --sync-to-master h=localhost,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('6', 'f') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('7', 'g') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('8', 'h') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('9', 'i') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('10', 'j') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum_user dst_db:test dst_tbl:dummy dst_dsn:h=localhost,p=...,u=checksum_user lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:24683 user:root host:slave*/;

This time I ran pt-table-sync from the opposite host, i.e. from the slave, because I used the --sync-to-master option, which treats the DSN as a slave and syncs it to its master. Again, pt-table-sync uses the MySQL username and password you specify to connect to the slave as well as to its master. The --replicate option makes the tool examine the specified table to find the data differences, and --print only prints the SQL (REPLACE queries) instead of executing it.

You can audit the queries before executing them to sync the data between master and slave. As you can see, it printed only the records missing on the slave. Once you are happy with the results, substitute --print with --execute to do the actual synchronization.
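To see what the printed statements amount to, here is a naive Python sketch that compares master and slave rows by primary key. It is not pt-table-sync's algorithm (the tool has its own sync algorithms and, with --replicate, works from the differences recorded in the checksum table), and it omits the trailing /*percona-toolkit ...*/ comment. It only shows the shape of the fix: REPLACE for rows that are missing or different on the slave, DELETE for rows that exist only there. The function names and the DELETE form are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def quote(value: object) -> str:
    """Render a value as a single-quoted SQL string literal, escaping backslashes and quotes."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def sync_statements(table: str, columns: Sequence[str], master: Dict[int, Row],
                    replica: Dict[int, Row], pk: str = "id") -> List[str]:
    """Naive primary-key diff between two tables given as {pk: full row}."""
    col_list = ", ".join(f"`{c}`" for c in columns)
    statements = []
    for key in sorted(master):
        if replica.get(key) != master[key]:  # missing or different on the replica
            values = ", ".join(quote(v) for v in master[key])
            statements.append(f"REPLACE INTO {table}({col_list}) VALUES ({values});")
    for key in sorted(set(replica) - set(master)):  # rows only the replica has
        statements.append(f"DELETE FROM {table} WHERE `{pk}` = {quote(key)};")
    return statements
```

With the example's data (ids 1 to 10 on the master, 1 to 5 on the slave) it produces five REPLACE statements for ids 6 to 10, the same rows pt-table-sync printed above.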

As a reminder, these queries are always executed on the master, as this is the only safe way to make the changes on the slave. On the master they are no-op changes, since the records already exist there, but they then flow to the slave through the replication stream and bring it in sync with the master.
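The reasoning above can be checked with a tiny Python model in which a table is a dict from primary key to row. It assumes the REPLACE reaches the slave as the same statement (statement-based logging) and models REPLACE simply as "overwrite the row with this key"; all names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[int, str]
Table = Dict[int, Row]  # primary key -> full row


def replace_into(table: Table, row: Row) -> None:
    """REPLACE keyed on the first column: any row with the same key is replaced."""
    table[row[0]] = row


def execute_on_master(master: Table, replica: Table, rows: Iterable[Row]) -> None:
    """Run each REPLACE on the master, then replay the same statement on the
    replica, as statement-based replication would."""
    for row in rows:
        replace_into(master, row)   # leaves the master's data unchanged if it already has this row
        replace_into(replica, row)  # fills in the row the replica was missing
```

Applying the five printed rows this way leaves the master's data as it was, while the slave ends up identical to the master.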

If there are many differences on your slave server, it may lag while those changes are synchronized. As I mentioned earlier, you can use the --print option to review the queries that will be executed to sync the slave with the master server. I found this post useful if you see a huge difference in a table between master and slave(s).

Note that you may want to use the --dry-run option first. It analyzes the tables, decides which sync algorithm to use, prints that information verbosely, and then exits without actually syncing or changing anything.
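The audit-first workflow described above (dry run, review the printed SQL, then execute) can be wrapped in a small Python helper. This is a sketch: `run` is an assumed callable that executes a command and returns its standard output (for example a thin wrapper around subprocess.run), `confirm` is whatever review step you use, and the function names are hypothetical. Only the pt-table-sync options used in this post appear in the commands.

```python
from typing import Callable, List

MODES = ("dry-run", "print", "execute")


def sync_command(slave_dsn: str, mode: str,
                 replicate: str = "percona.checksums") -> List[str]:
    """Build a pt-table-sync argv with the options used in this post."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return ["pt-table-sync", f"--{mode}", f"--replicate={replicate}",
            "--sync-to-master", slave_dsn]


def staged_sync(slave_dsn: str, run: Callable[[List[str]], str],
                confirm: Callable[[str], bool]) -> bool:
    """Dry run, then print, then execute only if the printed SQL is approved.

    Returns True only if the --execute step was run.
    """
    run(sync_command(slave_dsn, "dry-run"))
    printed = run(sync_command(slave_dsn, "print"))
    if not printed.strip() or not confirm(printed):
        return False  # nothing to fix, or the changes were rejected
    run(sync_command(slave_dsn, "execute"))
    return True
```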

Let me present another replication topology, in which the master has two slaves: the master and slave1 run on port 3306, while slave2 runs on the non-default port 3307. slave2 is out of sync with the master, and I will show you how to sync it.

mysql-master> SELECT * FROM dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave1> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  5 | e    |
+----+------+
5 rows in set (0.00 sec)
mysql-slave2> SELECT * FROM test.dummy;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
+----+------+

Let’s run pt-table-checksum on the master database server.

[root@master]# pt-table-checksum --replicate percona.checksums --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
07-23T13:57:39      0      0        2       1       0   0.310 percona.dsns
07-23T13:57:39      0      1        5       1       0   0.036 test.dummy

This time I used the --recursion-method parameter, which sets the method used to find slaves in the replication stream. It is especially useful when your servers run on a non-standard port, i.e. other than 3306. I created a dsns table in the percona database with the following entries; you can find the dsns table structure in the pt-table-checksum documentation.

mysql> SELECT * FROM dsns;
+----+-----------+------------------------------------------------------------+
| id | parent_id | dsn                                                        |
+----+-----------+------------------------------------------------------------+
|  1 |         1 | h=192.168.0.134,u=checksum_user,p=checksum_password,P=3306 |
|  2 |         2 | h=192.168.0.132,u=checksum_user,p=checksum_password,P=3307 |
+----+-----------+------------------------------------------------------------+
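Each dsn value uses the same key=value DSN syntax as the command-line DSNs in this post, with case-sensitive keys (P is the port, p the password). Before running the checksum, a quick Python check like the one below can confirm that every entry carries the port you expect. It assumes no value contains a comma and only recognizes the standard Percona Toolkit DSN keys; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Percona Toolkit DSN keys (case-sensitive: P is the port, p the password).
DSN_KEYS = {"A": "charset", "D": "database", "F": "defaults_file", "P": "port",
            "S": "socket", "h": "host", "p": "password", "t": "table", "u": "user"}


def parse_dsn(dsn: str) -> Dict[str, str]:
    """Split "h=host,u=user,P=3307" into named parts (values may not contain commas)."""
    parts: Dict[str, str] = {}
    for item in dsn.split(","):
        key, sep, value = item.partition("=")
        if not sep or key not in DSN_KEYS:
            raise ValueError(f"bad DSN part {item!r} in {dsn!r}")
        parts[DSN_KEYS[key]] = value
    return parts


def replica_endpoints(dsn_rows: List[str], default_port: int = 3306) -> List[Tuple[str, int]]:
    """(host, port) for each dsn value in the dsns table; a missing P means the default port."""
    endpoints = []
    for dsn in dsn_rows:
        parts = parse_dsn(dsn)
        endpoints.append((parts["host"], int(parts.get("port", default_port))))
    return endpoints
```

The same parser also reads the D=percona,t=dsns part of the --recursion-method value.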

Next, I ran the following pt-table-checksum command to identify which slave server has differences in the test.dummy table.

[root@master]# pt-table-checksum --replicate=percona.checksums --replicate-check-only --ignore-databases=mysql h=192.168.0.130,u=checksum_user,p=checksum_password --recursion-method=dsn=D=percona,t=dsns
Differences on slave2
TABLE CHUNK CNT_DIFF CRC_DIFF CHUNK_INDEX LOWER_BOUNDARY UPPER_BOUNDARY
test.dummy 1 -2 1

This shows that slave2 has different data in the test.dummy table compared to the master. Now let's run pt-table-sync to sync those differences and make slave2 identical to the master.

[root@slave2]# ./pt-table-sync --print --replicate=percona.checksums --sync-to-master h=192.168.0.132,u=checksum_user,p=checksum_password
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('4', 'd') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;
REPLACE INTO `test`.`dummy`(`id`, `name`) VALUES ('5', 'e') /*percona-toolkit src_db:test src_tbl:dummy src_dsn:P=3306,h=192.168.0.130,p=...,u=checksum dst_db:test dst_tbl:dummy dst_dsn:h=192.168.0.132,p=...,u=checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:1514 user:root host:slave2*/;

The output shows 2 rows that differ on slave2. Substituting --print with --execute synchronized the differences on slave2, and re-running pt-table-checksum showed no more differences.

Conclusion:
pt-table-checksum and pt-table-sync are the finest tools in Percona Toolkit for validating data between master and slave(s). With their help you can easily identify data drift and fix it. Above, I covered a couple of replication topologies and showed how to check replication consistency and how to fix data drift when it occurs. You can script the pt-table-checksum/pt-table-sync steps and schedule the checksum script with cron to periodically check data consistency within the replication stream.
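As a starting point for such a cron job, here is a minimal Python sketch that runs the two pt-table-checksum commands from this post and returns a nonzero status when any replica differs. It assumes pt-table-checksum is on the PATH and that the --replicate-check-only report is written to standard output; the function names are hypothetical and the DSN is a placeholder.

```python
import subprocess
import sys
from typing import List, Optional


def checksum_command(dsn: str, replicate: str = "percona.checksums",
                     ignore_databases: str = "mysql",
                     recursion_method: Optional[str] = None,
                     check_only: bool = False) -> List[str]:
    """Build the pt-table-checksum argv used in this post."""
    cmd = ["pt-table-checksum", f"--replicate={replicate}",
           f"--ignore-databases={ignore_databases}"]
    if recursion_method:
        cmd.append(f"--recursion-method={recursion_method}")
    if check_only:
        cmd.append("--replicate-check-only")
    cmd.append(dsn)
    return cmd


def has_differences(check_only_output: str) -> bool:
    """--replicate-check-only prints a "Differences on <host>" block per inconsistent replica."""
    return "Differences on " in check_only_output


def main(dsn: str, recursion_method: Optional[str] = None) -> int:
    """Checksum, then report; returns 1 if any replica differs (for cron/alerting)."""
    # The tool exits nonzero when it finds differences, so don't raise on that.
    subprocess.run(checksum_command(dsn, recursion_method=recursion_method), check=False)
    result = subprocess.run(
        checksum_command(dsn, recursion_method=recursion_method, check_only=True),
        capture_output=True, text=True, check=False)
    if has_differences(result.stdout):
        print(result.stdout, file=sys.stderr)
        return 1
    return 0
```

Call it from your scheduler's entry point, e.g. `sys.exit(main("h=localhost,u=checksum_user,p=checksum_password"))`, adding `recursion_method="dsn=D=percona,t=dsns"` for the second topology. Keep in mind that a password passed in the DSN is visible in the process list.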

This procedure is only safe for a single-level master-slave(s) hierarchy. I will discuss other topologies in future posts, i.e. more complex scenarios such as using these tools in chain replication (master -> slave1 -> slave2) and in a Percona XtraDB Cluster setup.

The post MySQL replication primer with pt-table-checksum and pt-table-sync appeared first on MySQL Performance Blog.

