一. 参考
二. 前言
pt-table-checksum是一个在线验证主从数据一致性的工具,主要用于以下场景:
1. 数据迁移前后,进行数据一致性检查
2. 当主从复制出现问题,待修复完成后,对主从数据进行一致性检查
3. 把从库当成主库,进行数据更新,产生了"脏数据"
4. 定期校验
三. 环境
master
|----slave1
|----slave2
四. 临时校验(端口3306)
- 创建用户checksums,192.168.250.249为当前写库
- GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
select | 查看所有库的表,原理可加 explain选项查看 |
process | show processlist |
super | set binlog_format='statement' |
replication slave | show slave hosts |
连接范围:
允许master连接
- 参数解释
DSN |
h='192.168.250.249',u='checksums',p='xxxx',P=3306 |
-d |
指定校验的库,多个库用逗号(,)分隔 |
--nocheck-replication-filters |
忽略replication-do-db规则 |
--replicate=test.checksums |
在test库下创建checksums表,并将数据写入 |
--recursion-method=none |
指定复制检查的方式,默认情况下使用SHOW PROCESSLIST,如果非标准的3306端口,就使用SHOW SLAVE HOSTS的方式,推荐使用dsn方式,手动指定 |
METHOD USES
=========== ==================
processlist SHOW PROCESSLIST
hosts SHOW SLAVE HOSTS
dsn=DSN DSNs from a table
none Do not find slaves
DSN格式:dsn=h=host,D=pt,t=dsns
- 登录主库,增加授权
- GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
- 登录主库,执行pt-table-checksums脚本
- 只检查mysql库
- pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --nocheck-replication-filters --replicate=test.checksums
- 只检查mysql库,但是过滤mysql.user表
- pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --ignore-tables=mysql.user --nocheck-replication-filters --replicate=test.checksums
- 手动在从库上查询的sql
--chunk-size 指定块的大小,默认是1000条.
- SELECT db, tbl, SUM(this_cnt) AS total_rows, COUNT(*) AS chunks FROM test.checksums WHERE ( master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) GROUP BY db, tbl;
五. 利用脚本做定期任务.
- 建库
- CREATE DATABASE pt CHARACTER SET utf8;
- 授权
- GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx'; GRANT ALL ON pt.* TO 'checksums'@'192.168.250.249' IDENTIFIED BY 'xxxx';
- 在pt库中建表,并插入从库的信息.
- CREATE TABLE `dsns` ( `id` int(11) NOT NULL AUTO_INCREMENT, `parent_id` int(11) DEFAULT NULL, `dsn` varchar(255) NOT NULL, PRIMARY KEY (`id`) );
- -- 写入从库信息
- INSERT INTO dsns (parent_id,dsn) values(1, "h=replica_host,u=checksums,p=password,P=3306");
- -- 如果有多个从库,就插入多条记录.
- -- 也可以按如下简写
- INSERT INTO dsns (parent_id,dsn) values(1, "h=replica_host");
- 全库校验的命令
- ./pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 --nocheck-replication-filters --replicate=pt.checksums --recursion-method=dsn=h=host,D=pt,t=dsns
- 脚本
- #!/bin/bash
- # ########################################################################
- # This program is percona-toolkit shell
- # Version: 1.0 (20120926)
- # Authors: lianjie.ning@qunar.com
- # History:
- # ########################################################################
- set -e
- set -u
- if [ $# -eq 0 ]; then
- echo "Usage:
- $0 -h, help
- $0 -P port -c [pt-table-checksum]"
- exit
- fi
- trap "send_mail" EXIT
- # send mail
- function send_mail()
- {
- ESUBJECT="$HOSTNAME $0"
- EMAIL='xxx@xxx.com'
- if [ -r $LOG ]; then
- cat $LOG |mail -s "$ESUBJECT" $EMAIL
- else
- echo "The log($LOG) is not exist !" |mail -s "$ESUBJECT" $EMAIL
- fi
- rm "$LOG"
- }
- # default variables
- TIME=`date +%Y%m%d%H%M%S`
- LOG="/tmp/$HOSTNAME.checksum.log"
- #CHECKSUM="/usr/local/bin/pt-table-checksum"
- CHECKSUM="/home/lianjie.ning/pt-table-checksum"
- exec 1>>$LOG 2>$LOG
- function checksum()
- {
- master_host="192.168.250.249"
- user="checksums"
- password="xxxx"
- : ${port:="3306"}
- db_table="pt.checksums"
- echo "MySQL Port: $port"
- echo "Verify MySQL replication integrity"
- $CHECKSUM h="$master_host",u="$user",p="$password",P="$port" --nocheck-replication-filters --replicate=$db_table --recursion-method=dsn=D=pt,t=dsns
- }
- # main
- while getopts :P:c arg
- do
- case $arg in
- P)
- port="$OPTARG"
- ;;
- c)
- checksum
- ;;
- :)
- echo "$arg: 错误参数"
- ;;
- \?)
- echo "$arg: 非法选项"
- ;;
- esac
- done
六. 问题处理:
- 在主库上执行
pt-table-checksum h='192.168.250.249',u='checksums',p='xxxx',P=3306 -d mysql --nocheck-replication-filters --replicate=test.checksums
需要注意test.checksums是否存在,如果存在,则手工删除,否则会报如下错误
06-07T16:54:23 User does not have all privileges on --replicate table `test`.`checksums`.
- binlog_format问题(脚本bug)
06-07T19:06:07 Error checksumming table mysql.columns_priv: Error executing checksum query: Checksum query for table mysql.columns_priv caused MySQL error 1592:
Level: Note
Code: 1592
Message: Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by the SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave.
Query: REPLACE INTO `test`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT() AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `host`, `db`, `user`, `table_name`, `column_name`, `timestamp` + 0, `column_priv`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `mysql`.`columns_priv` /*checksum table/
该问题与复制过滤器replicate-do-db 和 pt-table-checksums,innodb_strict_mode = 1均没有关系
脚本的bug,在mysql 5.5.18版本之后,都会出现这个问题
下个新版本,已经修复
~percona-toolkit-dev/percona-toolkit/2.1/view/head:/bin/pt-table-checksum
- 索引问题
06-07T21:19:31 Cannot checksum table db.table: There is no good index and the table is oversized. at ./pt-table-checksum line 3899.
因为该工具是根据主键索引或者唯一索引来分块进行校验的,默认是1000行为一块.
阅读(1048) | 评论(0) | 转发(0) |