阿里巴巴DBA,原去哪儿网DBA。专注于MySQL源码研究、DBA运维、CGroup虚拟化及Linux Kernel源码研究等。 github:https://github.com/HengWang/ Email:king_wangheng@163.com 微博 :@王恒-Henry QQ :506437736
分类: Mysql/postgreSQL
2013-01-17 15:46:46
目的
MySQL表结构修改,大多数情况下,需要拷贝原始表到临时表,修改结束后,删除原来的表。该过程会添加表锁,其他对数据表的所有操作,需要等待表结构修改结束后方可执行。如果单表数据量较大时,会造成长时间的表锁。为了解决这个问题,目前有几种在线修改表结构的工具,这些工具不会造成锁表。pt-online-schema-change工具是其中之一,通过测试主要了解该工具的可靠性以及存在的问题。
原理
在线修改表结构的工具,基本处理方式类似,以下对pt-online-schema-change工具的工作原理进行分析:
1、如果存在外键,根据alter-foreign-keys-method参数的值,检测外键相关的表,做相应设置的处理。
2、创建一个新的表,表结构为修改后的数据表,用于从源数据表向新表中导入数据。
3、创建触发器,用于记录从拷贝数据开始之后,对源数据表继续进行数据修改的操作记录下来,用于数据拷贝结束后,执行这些操作,保证数据不会丢失。
4、拷贝数据,从源数据表中拷贝数据到新表中。
5、修改外键相关的子表,根据修改后的数据,修改外键关联的子表。
6、rename源数据表为old表,把新表rename为源表名,并将old表删除。
7、删除触发器。
测试方案
对pt-online-schema-change工具[1]的测试,主要从以下几个方面进行:
1、静态测试
对现有表进行直接修改字段、增加索引操作。当前状况下,没有任何写操作,只有读操作。
2、动态测试
对进行压力测试的数据表修改字段、增加索引操作。当前状况下,写压力较大,并且多线程操作。
3、外键测试
对有外键的数据表进行测试,分别测试外键操作类型分别为:auto、rebuild_constraints、drop_swap、none时,关联表的数据影响。
四种外键操作类型处理
操作类型 |
说明 |
auto |
选择rebuild_constraints或drop_swap两种之一,如果外键表太大,那么使用drop_swap方式,否则使用rebuild_constraints。选择的标准是:--chunk_time和--chunk_size_limit命令参数。 |
rebuild_constraints |
为新表重建外键约束。这种方式更可靠。 |
drop_swap |
首先是外键约束失效,在rename之前,drop原始表。这种方法最快、并且不会阻塞。但是存在两个风险:drop原始表会产生短暂的表不存在,导致查询失败;如果在drop之后,rename发生错误,此时原始表已经删除,导致数据表无法恢复。 |
none |
不进行外键的更新,会导致外键失效。 |
测试数据表为采用TPCC_MySQL工具[2]生成的测试表,动态测试采用TPCC_MySQL进行压力测试,从而验证online修改表对读写的影响。
测试
1、静态测试
对没有外键的静态表warehouse进行测试,测试语句为:
--增加字段 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter="add column w_test varchar(100) default null comment '测试'" D=test,t=warehouse –exec --增加索引 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter=" add index idx_w_test(w_test)" D=test,t=warehouse –exec |
测试过程:
--增加字段 Altering `test`.`warehouse`... Creating new table... Created new table test.__warehouse_new OK. Altering new table... Altered `test`.`__warehouse_new` OK. Creating triggers... Created triggers OK. Copying approximately 20 rows... Copied rows OK. Swapping tables... Swapped original and new tables OK. Dropping old table... Dropped old table `test`.`__warehouse_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`warehouse` --增加索引 Altering `test`.`warehouse`... Creating new table... Created new table test.__warehouse_new OK. Altering new table... Altered `test`.`__warehouse_new` OK. Creating triggers... Created triggers OK. Copying approximately 20 rows... Copied rows OK. Swapping tables... Swapped original and new tables OK. Dropping old table... Dropped old table `test`.`__warehouse_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`warehouse`. |
对有外键的表进行操作,执行以上命令,会报错。详细的外键测试在第三部分测试,具体错误为:
You did not specify --alter-foreign-keys-method, but there are foreign keys that reference the table. Please read the tool's documentation carefully. |
2、动态测试
动态测试过程,首先使用tpcc_start进行压力测试,在测试期间,执行一下SQL语句,查看执行过程。
--增加字段 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter="add column w_test1 varchar(100) default null comment '测试1'" D=test,t=warehouse –exec --增加索引 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter=" add index idx_w_test1(w_test1)" D=test,t=warehouse –exec |
测试过程:
1)首先执行tpcc_start进行压力测试,具体测试条件如下所示,特别注意的是-c参数,表示连接数,该值将影响pt-online-schema-change工具。
./tpcc_start -h 127.0.0.1 -P3333 -d test -uroot -p "" -w 20 -c 100 -r 10 -l 600 -i 1 -f result.txt > output.txt |
2)首先执行命令,发生错误:
Altering `test`.`warehouse`... Creating new table... Created new table test.__warehouse_new OK. Altering new table... Altered `test`.`__warehouse_new` OK. Creating triggers... Created triggers OK. Copying approximately 20 rows... Error copying rows from `test`.`warehouse` to `test`.`__warehouse_new`: Threads_running=97 exceeds its critical threshold 50 Dropping triggers... Dropped triggers OK. Dropping new table... Dropped new table OK. `test`.`warehouse` was not altered. |
出错的原因为线程查出了pt-online-schema-change工具的默认值,需要使用--critical-load参数有关,具体的参数详见文档中的说明。
增加命令参数(--critical-load Threads_runnings=200),即修改命令后为:
--增加字段 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --critical-load Threads_runnings=200 --alter="add column w_test1 varchar(100) default null comment '测试1'" D=test,t=warehouse –exec --增加索引 ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --critical-load Threads_runnings=200 --alter="add index idx_w_test1(w_test1)" D=test,t=warehouse –exec |
测试过程:
--增加字段 Altering `test`.`warehouse`... Creating new table... Created new table test.__warehouse_new OK. Altering new table... Altered `test`.`__warehouse_new` OK. Creating triggers... Created triggers OK. Copying approximately 20 rows... Pausing because Threads_running=101. Pausing because Threads_running=73. Pausing because Threads_running=83. Pausing because Threads_running=99. Pausing because Threads_running=101. Pausing because Threads_running=101. Pausing because Threads_running=101. Pausing because Threads_running=67. Pausing because Threads_running=101. Pausing because Threads_running=90. Pausing because Threads_running=95. Pausing because Threads_running=90. Pausing because Threads_running=89. Pausing because Threads_running=85. Pausing because Threads_running=93. Pausing because Threads_running=101. Copied rows OK. Swapping tables... Swapped original and new tables OK. Dropping old table... Dropped old table `test`.`__warehouse_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`warehouse`. --增加索引 Altering `test`.`warehouse`... Creating new table... Created new table test.__warehouse_new OK. Altering new table... Altered `test`.`__warehouse_new` OK. Creating triggers... Created triggers OK. Copying approximately 20 rows... Pausing because Threads_running=74. Pausing because Threads_running=88. Pausing because Threads_running=86. Pausing because Threads_running=86. Pausing because Threads_running=101. Pausing because Threads_running=101. Pausing because Threads_running=101. Pausing because Threads_running=97. Pausing because Threads_running=101. Pausing because Threads_running=88. Pausing because Threads_running=101. Pausing because Threads_running=93. Pausing because Threads_running=101. Pausing because Threads_running=101. Pausing because Threads_running=100. Pausing because Threads_running=101. Copied rows OK. Swapping tables... Swapped original and new tables OK. Dropping old table... Dropped old table `test`.`__warehouse_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`warehouse`. |
从测试过程中可以发现,有很多“Pausing because Threads_running”,这是由于拷贝数据时,该线程正在操作数据。并且,当tpcc测试结束之后,表结构修改才最终结束。因此,从这点来看,当业务量较大时,尽管此时表结构修改不影响数据表的正常服务,但是表结构修改会经历很长时间的等待。
3、外键测试
外键测试主要根据操作类型,分别为:auto、rebuild_constraints、drop_swap、none四种测试。测试命令如下所示:
--操作类型auto ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter-foreign-keys-method=auto --alter="add column c_test1 varchar(20) default null comment '测试1'" D=test,t=customer –exec --操作类型rebuild_constraints ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter-foreign-keys-method=rebuild_constraints --alter="add column c_test2 varchar(20) default null comment '测试2'" D=test,t=customer –exec --操作类型drop_swap ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter-foreign-keys-method=drop_swap --alter="add column c_test2 varchar(20) default null comment '测试2'" D=test,t=customer –exec --操作类型none ./pt-online-schema-change --user=root --host=127.0.0.1 --port=3333 --charset=utf8 --alter-foreign-keys-method=none --alter="add column c_test2 varchar(20) default null comment '测试2'" D=test,t=customer –exec |
测试过程:
--操作类型auto Altering `test`.`customer`... Creating new table... Created new table test._customer_new OK. Altering new table... Altered `test`.`_customer_new` OK. Creating triggers... Created triggers OK. Copying approximately 599160 rows... Copied rows OK. Max rows for the rebuild_constraints method: 60172 Determining the method to update foreign keys... `test`.`history`: too many rows: 600425; must use drop_swap Drop-swapping tables... Dropped and swapped tables OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`customer`. --操作类型rebuild_constraints Child tables: `test`.`history` (approx. 600748 rows) `test`.`history` (approx. 600748 rows) `test`.`history` (approx. 600748 rows) `test`.`orders` (approx. 600013 rows) `test`.`orders` (approx. 600013 rows) `test`.`orders` (approx. 600013 rows) Will use the rebuild_constraints method to update foreign keys. Altering `test`.`customer`... Creating new table... Created new table test._customer_new OK. Altering new table... Altered `test`.`_customer_new` OK. Creating triggers... Created triggers OK. Copying approximately 594759 rows... Copied rows OK. Swapping tables... Swapped original and new tables OK. Rebuilding foreign key constraints... `test`.`history` has no foreign key constraints referencing `test`.`_customer_old`. `test`.`history` has no foreign key constraints referencing `test`.`_customer_old`. `test`.`orders` has no foreign key constraints referencing `test`.`_customer_old`. `test`.`orders` has no foreign key constraints referencing `test`.`_customer_old`. Rebuilt foreign key constraints OK. Dropping old table... Dropped old table `test`.`_customer_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`customer`. --操作类型drop_swap Child tables: `test`.`history` (approx. 599546 rows) `test`.`history` (approx. 599546 rows) `test`.`history` (approx. 599546 rows) `test`.`orders` (approx. 601999 rows) `test`.`orders` (approx. 601999 rows) `test`.`orders` (approx. 601999 rows) Will use the drop_swap method to update foreign keys. Altering `test`.`customer`... Creating new table... Created new table test._customer_new OK. Altering new table... Altered `test`.`_customer_new` OK. Creating triggers... Created triggers OK. Copying approximately 600477 rows... Copied rows OK. Drop-swapping tables... Dropped and swapped tables OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`customer`. --操作类型none Child tables: `test`.`history` (approx. 1427067 rows) `test`.`history` (approx. 1427067 rows) `test`.`history` (approx. 1427067 rows) `test`.`orders` (approx. 1710671 rows) `test`.`orders` (approx. 1710671 rows) `test`.`orders` (approx. 1710671 rows) Will not update foreign keys. Altering `test`.`customer`... Creating new table... Created new table test._customer_new OK. Altering new table... Altered `test`.`_customer_new` OK. Creating triggers... Created triggers OK. Copying approximately 514635 rows... Copying `test`.`customer`: 56% 00:24 remain Copying `test`.`customer`: 98% 00:00 remain Copied rows OK. Swapping tables... Swapped original and new tables OK. Not updating foreign keys because --alter-foreign-keys-method=none. Foreign keys that reference the table will no longer work. Dropping old table... Dropped old table `test`.`_customer_old` OK. Dropping triggers... Dropped triggers OK. Successfully altered `test`.`customer`. |
通过测试可知,当--alter-foreign-keys-method=none时,外键索引将失效,需要重建外键索引,方可再次使用。
结论
通过测试,pt-online-schema-change工具对于线上数据表结构修改,可以实时操作,而不影响线上业务的操作。然而,该工具仍然存在一些问题,需要根据具体的操作,执行操作:
1、当业务量较大时,修改操作会等待没有数据修改后,执行最后的rename操作。因此,在修改表结构时,应该尽量选择在业务相对空闲时,至少修改表上的数据操作较低时,执行较为妥当。
2、如果对外键表操作时,四种外键操作类型需要根据表的数据量和可靠程度,进行选择。处于可靠性的原因,尽量使用rebuild_constraints类型,如果没有可靠性要求,可以使用auto类型。
3、由于可能存在一定的风险,在操作之前,建议对数据表进行备份,可以使得操作更安全、可靠。
参考
1、《Percona Toolkit Documentation》--
2、《TPCC-MySQL使用说明》--http://blog.chinaunix.net/uid-26896862-id-3188313.html