Mysql误删除数据及其bug分析-xyaxlz-ChinaUnix博客

天高任鸟飞xyax.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

xyaxlz

博客访问： 2377025
博文数量： 276
博客积分： 5998
博客等级：大校
技术积分： 5175
用户组：普通用户
注册时间： 2010-12-24 14:43

文章分类

全部博文（276）

hbase（1）

安装部署（1）
zookeeper（1）

安装部署（1）
R（9）

数据可视化（1）

R排查（3）

R安装（5）
kafka（1）

kafka排错（1）
storm（1）

storm维护（1）
hadoop（3）

hadoop开发（3）
mongodb（7）

mongodb故障排查（3）

mongodb原理（1）

mongodb维护（3）
python（2）
sql server（0）
oracle（0）
greenplum（2）
缓存加速（0）
web（7）
调优（14）
集群（0）
监控（8）
shell（20）
安装配置（7）
系统（37）
安全（7）
mysql（142）

mysql 高可（0）

mysql开发（2）

mysql压力测试（5）

mysql知识（41）

mysql维护（53）

mysql故障排查（25）

mysql函数（6）

sql语句优化（2）
未分配的博文（7）

文章存档

2014年（25）

2013年（11）

2012年（69）

2011年（167）

2010年（4）

我的朋友

相关博文

Mysql误删除数据及其bug分析

分类： Mysql/postgreSQL

2012-11-14 15:21:32

一、现状描述
11月25日晚上8点40分接到现场电话，说我们公司所有员工的im不能正常登陆，im数据库服务器（mysql数据库）的公司表中对应的我们公司名称被删除，需要处理。接到这个异常后，第一想到的是下午下班前，收到一封ogg的警告邮件说ogg进程终止，然后我登陆数据库查看的时候，发现ogg已经工作正常，所以也就没有太多关注。既然已经出现了这个问题，那么先解决问题，再分析原因。因为这些都是ogg从oracle端同步过来的一些数据，所以直接从oracle那边初始化一份过来，然后重设同步程序就可以了。

二、错误分析
1、出现这个问题，第一想到的就是binlog，因为公司表的数据是从我们oracle那边同步过去的，而且oracle那边没有任何关于这个表的删除操作，所以我定位这个表的delete操作

[mysql@ezgclient mysqllog]$ mysqlbinlog mysqlbin.000150>/tmp/11_25.txt [mysql@ezgclient mysqllog]$ vi /tmp/11_25.txt …… # at 1396789 # at 1396875 #111125 16:15:31 server id 2 end_log_pos 1396875 Table_map: `a`.`abc` mapped to number 5304 #111125 16:15:31 server id 2 end_log_pos 1397000 Delete_rows: table id 5304 flags: STMT_END_F BINLOG ' o07PThMCAAAAVgAAAItQFQAAALgUAAAAAAAABnNydGFpcgAPdGFiX3V1bV9jb21wYW55AA4DD/b2 CAwMCA8PDw8PCBAAARQAFgCAAIAAQABAAEAA+D8= o07PThkCAAAAfQAAAAhRFQAQALgUAAAAAAEADv//AP7iAAAAFAC6vNbdysC1vL/GvLzT0M/euavL voAAAAAAOLkeQYAAAAAAAAAAAAEpAAAAAAAAANkqumA8EgAAO8kaakoSAAABAAAAAAAAAA9CVTAw MDAwMDAzNTgwNzg= ……

2、因为binlog_format采用的是row模式，所以需要进一步解析binglog

[mysql@ezgclient mysqllog]$ mysqlbinlog -v -v mysqlbin.000150>/tmp/11_25.txt [mysql@ezgclient mysqllog]$ vi /tmp/11_25.txt #111125 16:15:31 server id 2 end_log_pos 1396875 Table_map: `a`.`abc` mapped to number 5304 #111125 16:15:31 server id 2 end_log_pos 1397000 Delete_rows: table id 5304 flags: STMT_END_F BINLOG ' o07PThMCAAAAVgAAAItQFQAAALgUAAAAAAAABnNydGFpcgAPdGFiX3V1bV9jb21wYW55AA4DD/b2 CAwMCA8PDw8PCBAAARQAFgCAAIAAQABAAEAA+D8= o07PThkCAAAAfQAAAAhRFQAQALgUAAAAAAEADv//AP7iAAAAFAC6vNbdysC1vL/GvLzT0M/euavL voAAAAAAOLkeQYAAAAAAAAAAAAEpAAAAAAAAANkqumA8EgAAO8kaakoSAAABAAAAAAAAAA9CVTAw MDAwMDAzNTgwNzg= '/*!*/; ### DELETE FROM a.abc ### WHERE ### @1=226 /* INT meta=0 nullable=0 is_null=0 */ ### @2='*****有限公司' /* VARSTRING(256) meta=256 nullable=0 is_null=0 */ ### @3=951656001 /* DECIMAL(20,0) meta=5120 nullable=0 is_null=0 */ ### @4=000000001 /* DECIMAL(22,0) meta=5632 nullable=1 is_null=0 */ ### @5=41 /* LONGINT meta=0 nullable=1 is_null=0 */ ### @6=2005-05-30 15:11:29 /* DATETIME meta=0 nullable=1 is_null=0 */ ### @7=2011-08-17 02:02:19 /* DATETIME meta=0 nullable=1 is_null=0 */ ### @8=1 /* LONGINT meta=0 nullable=1 is_null=0 */ ### @9='BU0000000358078' /* VARSTRING(128) meta=128 nullable=1 is_null=0 */ ### @10=NULL /* VARSTRING(128) meta=128 nullable=1 is_null=1 */ ### @11=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */ ### @12=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */ ### @13=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */ ### @14=NULL /* VARSTRING(128) meta=0 nullable=1 is_null=1 */ # at 1397000 #111125 16:15:31 server id 2 end_log_pos 1397027 Xid = 79238866 COMMIT/*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */;

通过上面的分析和这里的日志情况显示，很明显有人误删除了这条记录，导致我们公司所有员工不能登录im(登录在线的，不会使用到这条记录，这个也就是导致了我们到晚上八点多才发现这个异常)

3、检查error日志
检查这个日志，发现一个很明显的bug,这个是导致数据库重启，以及那个时间因为数据库重启导致ogg进程异常收到邮件

111125 16:15:35 InnoDB: Assertion failure in thread 1095162176 in file row/row0mysql.c line 1534 InnoDB: Failing assertion: index->type & DICT_CLUSTERED InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html InnoDB: about forcing recovery. 111125 16:15:35 - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=8388608 read_buffer_size=1048576 max_used_connections=30 max_threads=1000 threads_connected=14 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 9234379 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. thd: 0xb8bf170 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x4146d100 thread_stack 0x30000 /opt/mysql/product/5.1/bin/mysqld(my_print_stacktrace+0x2e)[0x8a74ce] /opt/mysql/product/5.1/bin/mysqld(handle_segfault+0x322)[0x5dc992] /lib64/libpthread.so.0[0x357980eb10] /lib64/libc.so.6(gsignal+0x35)[0x3578c30265] /lib64/libc.so.6(abort+0x110)[0x3578c31d10] /opt/mysql/product/5.1/bin/mysqld(row_unlock_for_mysql+0x2f2)[0x7f4a52] /opt/mysql/product/5.1/bin/mysqld(row_search_for_mysql+0x22e1)[0x802591] /opt/mysql/product/5.1/bin/mysqld(_ZN11ha_innobase10index_readEPhPKhj16ha_rkey_function+0x192)[0x7724d2] /opt/mysql/product/5.1/bin/mysqld(_ZN7handler16read_range_firstEPK12st_key_rangeS2_bb+0xbe)[0x6caa9e] /opt/mysql/product/5.1/bin/mysqld(_ZN7handler22read_multi_range_firstEPP18st_key_multi_rangeS1_jbP17st_handler_buffer+0xce)[0x6c85be] /opt/mysql/product/5.1/bin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x127)[0x6aa557] /opt/mysql/product/5.1/bin/mysqld[0x6c415d] /opt/mysql/product/5.1/bin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP11st_sql_listyyb+0x86c)[0x66fc5c] /opt/mysql/product/5.1/bin/mysqld(_Z21mysql_execute_commandP3THD+0x38bf)[0x5f03af] /opt/mysql/product/5.1/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)[0x5f25e7] /opt/mysql/product/5.1/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe63)[0x5f3453] /opt/mysql/product/5.1/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f3d16] /opt/mysql/product/5.1/bin/mysqld(handle_one_connection+0x236)[0x5e66d6] /lib64/libpthread.so.0[0x357980673d] /lib64/libc.so.6(clone+0x6d)[0x3578cd3d1d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort... thd->query at 0x2aab4890fcd0 is an invalid pointer thd->thread_id=62259 thd->killed=NOT_KILLED The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 111125 16:15:35 mysqld_safe Number of processes running now: 0 111125 16:15:35 mysqld_safe mysqld restarted InnoDB: Log scan progressed past the checkpoint lsn 0 694228728 111125 16:15:36 InnoDB: Database was not shut down normally! InnoDB: Starting crash recovery. InnoDB: Reading tablespace information from the .ibd files... InnoDB: Restoring possible half-written data pages from the doublewrite InnoDB: buffer... InnoDB: Doing recovery: scanned up to log sequence number 0 694229872 111125 16:15:36 InnoDB: Starting an apply batch of log records to the database... InnoDB: Progress in percents: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 5 6 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 InnoDB: Apply batch completed InnoDB: Last MySQL binlog file position 0 1397027, file name /opt/mysql/mysqldata/mysqllog/mysqlbin.000149 111125 16:15:37 InnoDB: Started; log sequence number 0 694229872 111125 16:15:37 [Note] Recovering after a crash using /opt/mysql/mysqldata/mysqllog/mysqlbin 111125 16:15:37 [Note] Starting crash recovery... 111125 16:15:37 [Note] Crash recovery finished.

而且还有个巧合就是查看binlog发现，DELETE FROM a.abc where ……之后，数据库就因为这个bug自动重启了。

4、网友解释

# /opt/mysql/product/5.1/bin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP11st_sql_listyyb+0x86c)[0x66fc5c] # /opt/mysql/product/5.1/bin/mysqld(_Z21mysql_execute_commandP3THD+0x38bf)[0x5f03af] ---从这信息看是整理簇索引，导致表空间出现损坏 ---分析的情况，你的系统应该正在做一个DELETE操作，而且应该无索引可走，删除的数据量也比较大 ---可能是大量数据被缓存在innodb_buffer_pool_size中，并且其内部有创建自适应的hash索引，因删除数据而不得不重新创建， ---以及你的服务器当时IO出现瓶颈，导致一时无法响应Innodb master thread，而出现问题，并且InnoDB引擎在此方面出现过BUG ---解决版本是5.1.37之后，所以建议使用：5.1.40版本，较稳定

阅读(2843) | 评论(0) | 转发(0) |

上一篇：linux下安装mysql5.0.22

下一篇：trigger向一个表插入数据，并按照一定条件更新另外一个表。

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6