Chinaunix首页 | 论坛 | 博客
  • 博客访问: 663465
  • 博文数量: 198
  • 博客积分: 4256
  • 博客等级: 上校
  • 技术积分: 1725
  • 用 户 组: 普通用户
  • 注册时间: 2009-12-15 13:12
文章分类

全部博文(198)

文章存档

2012年(12)

2011年(39)

2010年(135)

2009年(12)

我的朋友

分类: LINUX

2010-10-28 15:58:57

今天启动复制的时候,发现SQL IO的进程没有启动起来
后面是整个的处理过程。

一. 发现

1.在主库执行insert语句, 发现从库上没有相应的执行。
2.使用show processlist,没有发现SQL IO的进程
mysql> show processlist;
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
| Id | User        | Host      | db   | Command | Time | State                                                                       | Info             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
|  3 | root        | localhost | an   | Query   |    0 | NULL                                                                        | show processlist |
12 | system user |           | NULL | Connect |  117 | Waiting for master to send event                                            | NULL             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
3.查看err日志
101008 10:04:03 [Note] Slave SQL thread initialized, starting replication in log ‘bin-log.000002′ at position 198, relay log ‘/var/local/mysql/logdir/relay_log.000004′ position: 342
101008 10:04:28 [ERROR] Slave SQL: Error ‘Table ‘an_tb’ already exists’ on query. Default database: ‘an’. Query: ‘create table an_tb (id int) type=innodb’, Error_code: 1050
101008 10:04:28 [Warning] Slave: The syntax ‘TYPE=storage_engine’ is deprecated and will be removed in MySQL 6.0. Please use ‘ENGINE=storage_engine’ instead Error_code: 1287
101008 10:04:28 [Warning] Slave: Table ‘an_tb’ already exists Error_code: 1050
101008 10:04:28 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘bin-log.000002′ position 98

4.根据陶方的提醒,使用show slave status\G
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000004
Relay_Log_Pos: 242
Relay_Master_Log_File: bin-log.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1050
Last_Error: Error ‘Table ‘an_tb’ already exists’ on query. Default database: ‘an’. Query: ‘create table an_tb (id int) type=innodb’
Skip_Counter: 0
Exec_Master_Log_Pos: 98
Relay_Log_Space: 14332
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1050
Last_SQL_Error: Error ‘Table ‘an_tb’ already exists’ on query. Default database: ‘an’. Query: ‘create table an_tb (id int) type=innodb’
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)

总结原因: slave服务器在执行relay log的时候,有个sql执行不下去了,卡住了。
该sql为:create table an_tb (id int) type=innodb’

二. 解决方法:

第一种:基本思路是drop掉an_tb表, 结果:不成功!
过程是:
在从库上:
drop table an_tb;
stop slave;
start slave;
结果是:报同样的错误。 因为每次start slave之后,都会从relay log中读出create table an_tb的SQL;

第二种:基本思路是跳过relay log, 结果:不成功!
简单的过程:
MYSQL> stop slave;
MYSQL>change master to
master_log_file=’bin-log.000006′,
master_log_pos=612,
RELAY_LOG_FILE=’relay_log.000023′,
RELAY_LOG_POS=612;
MYSQL> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000017
Relay_Log_Pos: 612
Relay_Master_Log_File: bin-log.000006
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 612
Relay_Log_Space: 15448
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)

此时在show slave status中看不到错误了, 但是在err文件中,还是有同样的错误的
101008 10:40:37 [Note] Slave SQL thread initialized, starting replication in log ‘bin-log.000002′ at position 607, relay log ‘/var/local/mysql/logdir/relay_log.000012′ position: 751
101008 10:40:37 [ERROR] Slave SQL: Error ‘Table ‘an_idb1′ already exists’ on query. Default database: ‘an’. Query: ‘create table an_idb1 (id int) type=innodb’, Error_code: 1050
101008 10:40:37 [Warning] Slave: The syntax ‘TYPE=storage_engine’ is deprecated and will be removed in MySQL 6.0. Please use ‘ENGINE=storage_engine’ instead Error_code: 1287
101008 10:40:37 [Warning] Slave: Table ‘an_idb1′ already exists Error_code: 1050
101008 10:40:37 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘bin-log.000002′ position 607
101008 10:40:37 [Note] Slave I/O thread: connected to master ‘anmh@dbadb1.corp.alimama.com:3307′,replication started in log ‘bin-log.000006′ at position 612
但是发现现在的表变成了an_idb1了, 不再是an_tb。  看来这个方法是不对的。

第三种方法:思路还是要跳过relay log。采用reset slave的方法。结果是:不成功!
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.12 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: relay_log.000001
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 126
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)

可以看到reset slave是把相关值做默认设置
然后使用
mysql> start slave;
Query OK, 0 rows affected (0.09 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000004
Relay_Log_Pos: 751
Relay_Master_Log_File: bin-log.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1050
Last_Error: Error ‘Table ‘an_idb1′ already exists’ on query. Default database: ‘an’. Query: ‘create table an_idb1 (id int) type=innodb’
Skip_Counter: 0
Exec_Master_Log_Pos: 607
Relay_Log_Space: 14332
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1050
Last_SQL_Error: Error ‘Table ‘an_idb1′ already exists’ on query. Default database: ‘an’. Query: ‘create table an_idb1 (id int) type=innodb’
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
发现还是没有启动slave io进程, 查看err日志
101008 10:53:52 [Note] Slave I/O thread killed while reading event
101008 10:53:52 [Note] Slave I/O thread exiting, read up to log ‘bin-log.000006′, position 612
101008 10:58:15 [Note] Slave SQL thread initialized, starting replication in log ‘bin-log.000006′ at position 612, relay log ‘/var/local/mysql/logdir/relay_log.000017′ position: 612
101008 10:58:15 [ERROR] Error in Log_event::read_log_event(): ‘read error’, data_len: 258, event_type: 0
101008 10:58:15 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
101008 10:58:15 [ERROR] Slave SQL: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master’s binary log is corrupted (you can check this by running ‘mysqlbinlog’ on the binary log), the slave’s relay log is corrupted (you can check this by running ‘mysqlbinlog’ on the relay log), a network problem, or a bug in the master’s or slave’s MySQL code. If you want to check the master’s binary log or slave’s relay log, you will be able to know their names by issuing ‘SHOW SLAVE STATUS’ on this slave. Error_code: 1594
101008 10:58:15 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with “SLAVE START”. We stopped at log ‘bin-log.000006′ position 612
101008 10:58:15 [Note] Slave I/O thread: connected to master ‘anmh@dbadb1.corp.alimama.com:3307′,replication started in log ‘bin-log.000006′ at position 612
报新的错误了,binlog都挂了。呵呵,这说明reset slave之后, slave的一切设置都被清空了。
这也让我想起了,可以继续使用change master to来做设置。

第四种,终于成功了的方法。 基本思路还是跳过relay log
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> reset slave;
Query OK, 0 rows affected (0.13 sec)

mysql> change master to
-> master_log_file=’bin-log.000006′,
-> master_log_pos=612;
Query OK, 0 rows affected (0.18 sec)
在这里的master log和master log pos一定要是最新的,或者是比较新的。(总之要跳过报错的sql)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: dbadb1.corp.alimama.com
Master_User: anmh
Master_Port: 3307
Connect_Retry: 15
Master_Log_File: bin-log.000006
Read_Master_Log_Pos: 612
Relay_Log_File: relay_log.000002
Relay_Log_Pos: 242
Relay_Master_Log_File: bin-log.000006
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 612
Relay_Log_Space: 392
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)

mysql> show processlist;
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
| Id | User        | Host      | db   | Command | Time | State                                                                       | Info             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
|  3 | root        | localhost | an   | Query   |    0 | NULL                                                                        | show processlist |
| 12 | system user |           | NULL | Connect |  117 | Waiting for master to send event                                            | NULL             |
| 13 | system user |           | NULL | Connect |   51 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL             |
+—-+————-+———–+——+———+——+—————————————————————————–+——————+
3 rows in set (0.00 sec)

终于不报错啦, 而且SQL IO进程也启动起来了。 一切ok。

阅读(2186) | 评论(0) | 转发(1) |
给主人留下些什么吧!~~