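Mydumper: A Multi-threaded MySQL Import/Export Tool
Today I used mysqldump on a production server to copy tables from one database to another, and the export was extremely slow. A quick search turned up this tool. It tested well, so I'm sharing it here.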
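A brief introduction
Mydumper is a multi-threaded export/import tool written in C, and it can keep multiple tables consistent with one another. More threads are not automatically better, though: the best thread count depends on the server's configuration and many other factors, so any number is a rule of thumb rather than an absolute. A more powerful machine can generally make good use of more threads.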
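How it works
How is the data consistency mentioned above achieved?
Here is the official explanation.
It mainly relies on FLUSH TABLES WITH READ LOCK and START TRANSACTION WITH CONSISTENT SNAPSHOT. While the global read lock is held, all worker threads are started and open their snapshots, and the current position is read via SHOW MASTER STATUS and SHOW SLAVE STATUS (which makes it easy to rebuild a slave from a Mydumper backup and ensures consistency across tables).
The original text reads: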
This is all done following best MySQL practices and traditions:
1. Global write lock is acquired ("FLUSH TABLES WITH READ LOCK")
2. Various metadata is read ("SHOW SLAVE STATUS","SHOW MASTER STATUS")
3. Other threads connect and establish snapshots ("START TRANSACTION WITH CONSISTENT SNAPSHOT")
3.1. On pre-4.1.8 it creates dummy InnoDB table, and reads from it.
4. Once all worker threads announce the snapshot establishment, master executes "UNLOCK TABLES" and starts queueing jobs.
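The handshake in the steps above can be illustrated with a small simulation. This is only a sketch: it does not connect to MySQL, the function name `run_consistent_dump` and the log messages are made up for illustration, and the SQL statements appear only in comments. It shows why the lock can be released early: the coordinating thread waits until every worker has announced its snapshot, and only then unlocks and releases the jobs.

```python
import threading


def run_consistent_dump(num_workers=4):
    """Simulate the lock/snapshot handshake and return the ordered event log."""
    events = []                                    # ordered log of what happened
    log_lock = threading.Lock()                    # protects the events list
    # The barrier is shared by all workers plus the coordinating thread.
    snapshots_ready = threading.Barrier(num_workers + 1)
    jobs_released = threading.Event()

    def record(message):
        with log_lock:
            events.append(message)

    def worker(index):
        # A real worker would run START TRANSACTION WITH CONSISTENT SNAPSHOT here.
        record(f"worker {index}: snapshot established")
        snapshots_ready.wait()                     # announce the snapshot
        jobs_released.wait()                       # wait for jobs to be queued
        record(f"worker {index}: dumping")

    # Step 1: a real tool would run FLUSH TABLES WITH READ LOCK here.
    record("master: global read lock acquired")
    # Step 2: SHOW MASTER STATUS / SHOW SLAVE STATUS would be read here.
    record("master: binlog position recorded")

    # Step 3: start the workers and wait until all snapshots exist.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for thread in threads:
        thread.start()
    snapshots_ready.wait()

    # Step 4: every snapshot is in place, so the lock is no longer needed.
    record("master: UNLOCK TABLES")
    jobs_released.set()
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    for line in run_consistent_dump(3):
        print(line)
```

The order in which individual workers log their messages can vary between runs, but every "snapshot established" line always precedes "UNLOCK TABLES", and every "dumping" line always follows it.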
Installation:
sudo yum install -y gcc gcc-c++ glib2-devel mysql-devel zlib-devel pcre-devel
cmake .
make && sudo make install
After installation, two executables are generated:
[mysql@localhost ~]$ ls /usr/local/bin/
mydumper myloader
Example:
Export
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.41 sec)
[mysql@localhost bin]$ ./mydumper -u root -p 'xxxxxxxx' -t 4 -B test -T test -c --less-locking -o /home/mysql/
[mysql@localhost bin]$ ls /home/mysql/
test.test-schema.sql.gz test.test.sql.gz
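Options explained
-u "user name"
-p "password"
-t "number of threads to use, default 4"
-B "database to dump"
-T "table(s) to dump"
-c "compress the output files"
--less-locking "minimize locking time on InnoDB tables"
-o "output directory"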
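When the export is driven from a script, it can help to build the argument list from named settings. The sketch below does that using only the flags listed above. The helper name `build_mydumper_args` and its parameters are invented for illustration and are not part of Mydumper.

```python
import shlex


def build_mydumper_args(user, password, database, table, output_dir,
                        threads=4, compress=True, less_locking=True):
    """Return a mydumper argument list using only the flags described above."""
    args = ["mydumper",
            "-u", user,
            "-p", password,
            "-t", str(threads),    # number of threads, default 4
            "-B", database,
            "-T", table]
    if compress:
        args.append("-c")          # compress the output files
    if less_locking:
        args.append("--less-locking")
    args += ["-o", output_dir]
    return args


if __name__ == "__main__":
    args = build_mydumper_args("root", "xxxxxxxx", "test", "test", "/home/mysql/")
    print(shlex.join(args))        # show the command as it would be typed
```

To actually run the export, the list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the password.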
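More examples:
Set an upper limit (in seconds) for long-running queries. If a query has been running longer than this, mydumper exits; alternatively, it can be told to kill such queries instead.
mydumper -u root -p 'xxxx' --long-query-guard 400 --kill-long-queries
Select tables with a regular expression via --regex. The pattern is matched against db.table, so it has to include the database name.
mydumper -u root -p 'xxxx' --regex=test.name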
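One thing to watch with a pattern such as `test.name` is that an unescaped `.` matches any character, and an unanchored pattern can also match longer names. The sketch below approximates this with Python's `re` module. It assumes the pattern is searched, unanchored, against the string `db.table`. Python's regex engine is not PCRE, the helper `matching_tables` is hypothetical, and the table names are made up.

```python
import re


def matching_tables(pattern, tables):
    """Return the 'db.table' names that the pattern matches (unanchored search)."""
    regex = re.compile(pattern)
    return [f"{db}.{table}" for db, table in tables
            if regex.search(f"{db}.{table}")]


# Made-up schema used only for this illustration.
TABLES = [("test", "name"), ("test", "names_old"),
          ("testXname", "t"), ("prod", "name")]

if __name__ == "__main__":
    # Loose pattern: '.' matches any character and nothing is anchored.
    print(matching_tables("test.name", TABLES))
    # Strict pattern: literal dot, anchored at both ends.
    print(matching_tables(r"^test\.name$", TABLES))
```

Under these assumptions the loose pattern also selects `test.names_old` and `testXname.t`, while the escaped and anchored form selects only `test.name`.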
Import
mysql> drop table test;
Query OK, 0 rows affected (0.26 sec)
mysql> exit
Bye
[mysql@localhost bin]$ ./myloader -u root -p 'xxxxx' -B test -d /home/mysql/
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test           |
+----------------+
1 row in set (0.00 sec)
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (0.39 sec)
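Conclusion:
Because Mydumper can export and import with multiple threads, it is faster than mysqldump. (As an aside: the speedup is very noticeable when exporting tables with hundreds of millions of rows. I really like this tool.)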
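Things to note:
The --no-locks option
The official description of this option is "Do not execute the temporary shared read lock. WARNING: This will cause inconsistent backups". In other words, using it can produce an inconsistent backup.
MyISAM tables are exported while the table lock is held. Mydumper therefore handles MyISAM tables first, keeps a count of them, and releases the lock as soon as all of them have been processed, to keep the lock time as short as possible.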
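The MyISAM-first ordering can be pictured with a single-threaded sketch. This is an illustration of the idea only, not Mydumper's actual code: the function `dump_with_early_unlock`, the step strings, and the table names are all made up, and the real tool distributes the work across threads.

```python
def dump_with_early_unlock(tables):
    """Order dump steps so MyISAM tables go first and the lock is released early.

    tables is a list of (table_name, engine) pairs.
    """
    myisam = [name for name, engine in tables if engine.lower() == "myisam"]
    others = [name for name, engine in tables if engine.lower() != "myisam"]

    steps = []
    remaining = len(myisam)            # MyISAM tables still to be dumped
    if remaining == 0:
        steps.append("UNLOCK TABLES")  # nothing needs the lock
    for name in myisam:
        steps.append(f"dump {name} (under lock)")
        remaining -= 1
        if remaining == 0:
            steps.append("UNLOCK TABLES")  # last MyISAM table done
    for name in others:
        steps.append(f"dump {name} (from snapshot)")
    return steps


if __name__ == "__main__":
    example = [("orders", "InnoDB"), ("logs", "MyISAM"),
               ("users", "InnoDB"), ("archive", "MyISAM")]
    for step in dump_with_early_unlock(example):
        print(step)
```

The counter is what allows the lock to be dropped at the earliest safe moment: once it reaches zero, the remaining tables can be read from the transactional snapshots without blocking writers.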