一次数据库服务器断电恢复-gun

gun_hap的ChinaUnix博客

首页　| 　博文目录　| 　关于我

gun_hap

博客访问： 644442
博文数量： 110
博客积分： 2352
博客等级：大尉
技术积分： 1253
用户组：普通用户
注册时间： 2012-03-29 14:00

文章分类

全部博文（110）

文章存档

2018年（1）

2013年（20）

2012年（89）

我的朋友

相关博文

一次数据库服务器断电恢复

分类： Mysql/postgreSQL

2018-01-02 17:08:41

一次数据库服务器断电恢复过程
linux版本：CentOS release 6.5 (Final)
mysql版本： mysql> select @@version;
+--------------------------+
| @@version |
+--------------------------+
| 5.5.36-MariaDB-wsrep-log |
+--------------------------+

问题场景：一次机房意外停电，导致多台服务器重新启动，其中有一台数据库服务器已经存在很长时间了，无法正常启动，初期错误日志信息如下：
'171222 2:30:28 InnoDB: The InnoDB memory heap is disabled
171222 2:30:28 InnoDB: Mutexes and rw_locks use GCC atomic builtins
171222 2:30:28 InnoDB: Compressed tables use zlib 1.2.3
171222 2:30:28 InnoDB: Using Linux native AIO
171222 2:30:28 InnoDB: Initializing buffer pool, size = 6.3G
171222 2:30:29 InnoDB: Completed initialization of buffer pool
171222 2:30:29 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 164567636854
171222 2:30:29 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 164568423895
InnoDB: Error: trying to access page number 4294966143 in space 0,
InnoDB: space name ./ibdata1,
InnoDB: which is outside the tablespace bounds.
InnoDB: Byte offset 0, len 16384, i/o type 10.
InnoDB: If you get this error at mysqld startup, please check that
InnoDB: your my.cnf matches the ibdata files that you have in the
InnoDB: MySQL server.
171222 2:30:30 InnoDB: Assertion failure in thread 139799947073504 in file fil0fil.c line 5462
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
171222 2:30:30 [ERROR] mysqld got signal 6 ;

恢复过程如下，把datadir目录重新复制一份，进行恢复，如下是恢复的详细过程。
1.使用命令正常启动
180102 14:34:38 InnoDB: Completed initialization of buffer pool
InnoDB: Error: log file ./ib_logfile0 is of different size 0 1679818752 bytes
InnoDB: than specified in the .cnf file 0 268435456 bytes!
180102 14:34:38 [ERROR] Plugin 'InnoDB' init function returned error.
180102 14:34:38 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
180102 14:34:38 [Note] Plugin 'FEEDBACK' is disabled.
180102 14:34:38 [ERROR] Unknown/unsupported storage engine: InnoDB
180102 14:34:38 [ERROR] Aborting
180102 14:34:38 [Note] /usr/libexec/mysqld: Shutdown complete'
从日志信息看，是ib_logfile0有问题了，可以理解为redo日志出问题了。

2.把ib_logfile日志重命名保留，重新启动数据库，让期重新使用新的redo日志，重新启动时报如下错误，此次错误主要是报mysql sinal 11错误

mv ib_logfile0 bak_ib_logfile0

mv ib_logfile1 bak_ib_logfile1

细心观察时，发现此次数据库重启动时，在创建完ib_logfile日志后，就开始进行了数据恢复，也就是creash recovery.
180102 14:39:30 InnoDB: Page checksum 1808068616 (32bit_calc: 4064195506), prior-to-4.0.14-form checksum 3308157180
InnoDB: stored checksum 1808068616, prior-to-4.0.14-form stored checksum 3308157180
InnoDB: Page lsn 38 1354583575, low 4 bytes of lsn at page end 1354583575
InnoDB: Page number (if stored to page already) 535,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 69
InnoDB: Page may be a BLOB page
180102 14:39:30 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 5.5.36-MariaDB-wsrep-log
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=0
max_threads=1026
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2267738 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x30000
/usr/libexec/mysqld(my_print_stacktrace+0x2e)[0x7f769b4ae8ee]
/usr/libexec/mysqld(handle_fatal_signal+0x4ac)[0x7f769b0ee33c]
/lib64/libpthread.so.0(+0xf710)[0x7f769a7f9710]
/usr/libexec/mysqld(+0x7374c9)[0x7f769b3604c9]
/usr/libexec/mysqld(+0x6999aa)[0x7f769b2c29aa]
/usr/libexec/mysqld(+0x6478e2)[0x7f769b2708e2]
/usr/libexec/mysqld(+0x753623)[0x7f769b37c623]
/usr/libexec/mysqld(+0x754aac)[0x7f769b37daac]
/usr/libexec/mysqld(+0x74b497)[0x7f769b374497]
/usr/libexec/mysqld(+0x6691e7)[0x7f769b2921e7]
/usr/libexec/mysqld(+0x65ff1c)[0x7f769b288f1c]
/lib64/libpthread.so.0(+0x79d1)[0x7f769a7f19d1]
/lib64/libc.so.6(clone+0x6d)[0x7f769910c86d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.'
从上面的日志信息中可谓看到，数据库恢复失败，而同时数据库也启动失败。

3.于是尝试使用innodb_force_recovery进行数据库的恢复启动，首先使用最低级别的恢复，跳过坏页的方法进行数据库启动
innodb_force_recovery=1
但是整个启动过程依然是失败，所报错误信息和上面的一致，同样遇到了singal 11错误。

4.尝试添加innodb_purge_threads参数，再次进行启动，如下所示：
innodb_force_recovery=1
innodb_purge_threads = 0
本次启动，遭遇了一个比较奇怪的问题，数据库一直在处于循环重启动的过程中，日志如下所示：
180102 15:00:11 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql4000/wsrep_recovery.aFZ19U' --pid-file='/var/lib/mysql4000/controller-recover.pid'
180102 15:00:12 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
180102 15:00:14 mysqld_safe Number of processes running now: 0
180102 15:00:14 mysqld_safe mysqld restarted
180102 15:00:14 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql4000/wsrep_recovery.3jbUnM' --pid-file='/var/lib/mysql4000/controller-recover.pid'
180102 15:00:15 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
180102 15:00:17 mysqld_safe Number of processes running now: 0
180102 15:00:17 mysqld_safe mysqld restarted
180102 15:00:17 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql4000/wsrep_recovery.3H1iqG' --pid-file='/var/lib/mysql4000/controller-recover.pid'
180102 15:00:18 mysqld_safe WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
180102 15:00:20 mysqld_safe Number of processes running now: 0
180102 15:00:20 mysqld_safe mysqld restarted
而且有一个问题一直没想明白，就是遇到WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1

5.尝试提升innodb_force_recovery的级别，再次进行重启动
innodb_force_recovery=2
innodb_purge_threads = 0
虽然还会发现日志序号的问题，如下所示：
180102 15:05:59 InnoDB: Error: page 567 log sequence number 4076805
InnoDB: is in the future! Current system log sequence number 9228.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
但是这一次数据库能启动起来了，而且数据库可以使用用户和密码登录，而且已经能使用mysqldump进行数据的导出了。

6.再次提升 innodb_force_recovery的级别，再次进行重启动.
innodb_force_recovery=3-6，均能启动和登录，但是都需要进行数据导出。
注意，此处的innodb_force_recovery一定要配合innodb_purge_threads使用，否则数据库启动后，无法正常登录,这个不排除和使用的数据库版本有关系。

阅读(6704) | 评论(2) | 转发(0) |

上一篇：Shell中如何做到只显示文本内容的偶数行

下一篇：没有了

给主人留下些什么吧！~~

24259138642018-03-23 14:30:41

嗯，我们也是做服务器租售的客户遇到这种问题我们也是感到很糟心。我们也为应对这种情况做了很多措施http://www.hkt4.com/news/220.html

回复 | 举报

onlyword2018-01-03 15:49:31

学习了，遇到机房断电真是闹心，双路电都能一起断太严重了。

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6