星期一上班,就听到 开发说一台mysql数据库down掉(此台数据库只做备份用)。连上系统,用ps -ef |grep mysql查看下进程,果真以个进程都没有。那就重启mysql服务,启动不起来。就查看mysql的错误日志。
InnoDB: Doing recovery: scanned up to log sequence number 0 48155
InnoDB: Last MySQL binlog file position 0 79, file name /var/log/mysql/updatelog.000006
100621 5:36:33 InnoDB: Flushing modified pages from the buffer pool...
100621 5:36:33 InnoDB: Started; log sequence number 0 48155
100621 5:36:34 [ERROR] /mysql/libexec/mysqld: Error writing file '/data/db2.ihome.com.pid' (Errcode: 28)
100621 5:36:34 [ERROR] Can't start server: can't create PID file: No space left on device
Number of processes running now: 0
100621 05:36:34 mysqld restarted
/mysql/libexec/mysqld: Error writing file './db2-slow.log' (Errcode: 28)
100621 5:36:34 [ERROR] Could not use db2-slow.log for logging (error 28). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
100621 5:36:34 InnoDB: Database was not shut down normally!
哦原来磁盘空间满了,使用df -h 查看下,存放data的目录已经使用100%.进入data目录查看。原来是一个备份数据库文件在里面占用的了大量的磁盘空间。查看备份脚本原来是备份脚本里面的路径写错了。修改路径,把备份文件移动到备份目录。启动mysql服务正常启动。
使用命令登陆上mysql,发现主从已经不同步了。查看mysql的错误日志得到
100621 9:32:21 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000023' at position 211845, relay log './db2-relay-bin.000013' position: 217010
100621 9:32:21 [ERROR] Error in Log_event::read_log_event(): 'read error', data_len: 112, event_type: 2
100621 9:32:21 [ERROR] Error reading relay log event: slave SQL thread aborted because of I/O error
100621 9:32:21 [ERROR] Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
100621 9:32:21 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000023' position 211845
首先想到尝试着手动来解决主从同步。连接上数据库
[root@db2 data]# mysql -u root -p
Enter password: 输入密码
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 55 to server version: 4.1.12-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>stop slave ; #停掉slave 进程,手动 解决mysql主从时,需要先结束掉mysql slave。
mysql> change master to
mysql->MASTER_LOG_POS=211845
mysql ->MASTER_LOG_FILE = 'mysql-bin.000023';
mysql>start slave;
mysql>show slave status\G;
查看同步依然是不成功。继续查看mysql 错误日志
100621 9:41:35 [Note] Slave SQL thread initialized, starting replication in log 'binglog mysql-bin.000023' at position 211845, relay log './db2-relay-bin.000001' position: 4
100621 9:41:35 [Note] Slave I/O thread: connected to master , replication started in log 'binglog mysql-bin.000023' at position 211845
100621 9:41:35 [ERROR] Error reading packet from server: Could not find first log file name in binary log index file (server_errno=1236)
100621 9:41:35 [ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log
100621 9:41:35 [ERROR] Slave I/O thread exiting, read up to log 'binglog mysql-bin.000023', position 211845
删除掉data目录下文件
db2-relay-bin.000001
db2-relay-bin.index
master.info
relay-log.info # 这些文件里面存放着同步时的信息。
然后重启下mysql 服务,查看同步信息。问题依旧。然后 查看上次错误日志 与此次的错误日志。找不到 log文件,连接上主的mysql数据库。查看mysql信息一切正常。
#mysqlbinlog --start-position=211845 mysql-bin.000023 > a.sql
把从 211845到现在不同步的二进制文件导入到一个文本文件里面。然后使用scp 传到从数据库上
scp a.sql 输入密码自动copy到mysql 从服务器上的根目录下。
打开a.sql 查看最后一行的
然后到从数据库上 停止mysql slave 服务。stop slave
使用#mysql -uuser -p passwd database < /a.sql #手动导入没有同步的数据
导入完成之后 删除掉 db2-relay-bin.000001 db2-relay-bin.index master.info relay-log.info 这些文件。 使用
mysql>stop slave ;
mysql>change master to MASTER_LOG_POS = 155383248 MASTER_LOG_FILE = 'mysql-bin.000023';
Query OK, 0 rows affected (0.01 sec)
然后重启mysql服务,service mysqld restart
启动slave
mysql>start slave;
查看mysql主从状态mysql>show slave status\G;
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
看到这两行状态为Yes我心甚慰。看到这两行为Yes表示已经开始同步数据了。
下面是CHANGE MASTER TO语法
CHANGE MASTER TO master_def[,master_def] ...
master_def:
MASTER_HOST = 'host_name'
| MASTER_USER = 'user_name'
| MASTER_PASSWORD = 'password'
| MASTER_PORT =port_num
| MASTER_CONNECT_RETRY =count
| MASTER_LOG_FILE = 'master_log_name'
| MASTER_LOG_POS =master_log_pos
| RELAY_LOG_FILE = 'relay_log_name'
| RELAY_LOG_POS =relay_log_pos
| MASTER_SSL = {0|1}
| MASTER_SSL_CA = 'ca_file_name'
| MASTER_SSL_CAPATH = 'ca_directory_name'
| MASTER_SSL_CERT = 'cert_file_name'
| MASTER_SSL_KEY = 'key_file_name'
| MASTER_SSL_CIPHER = 'cipher_list'
可以更改从属服务器用于与主服务器进行连接和通讯的参数。
MASTER_USER,MASTER_PASSWORD,MASTER_SSL,MASTER_SSL_CA,MASTER_SSL_CAPATH,MASTER_SSL_CERT,MASTER_SSL_KEY和MASTER_SSL_CIPHER用于向从属服务器提供有关如何与主服务器连接的信息
检查从服务器一般使用show slave status命令来检查
mysql > SHOW SLAVE STATUS\G
*************************** 1 . row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168 . 0.100
Master_User: root
Master_Port: 3306
Connect_Retry: 3
Master_Log_File: mysql - bin. 003
Read_Master_Log_Pos: 79
Relay_Log_File: mysql - relay - bin. 003
Relay_Log_Pos: 548
Relay_Master_Log_File: mysql - bin. 003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Last_Errno: 0
…..
在上面这些信息中我们主要关注的是Slave_IO_Running和Slave_SQL_Running
Slave_IO_Running:从服务器正从主服务器上读取BINLOG日志,并写入从服务器的中继日志
Slave_SQL_Running:进程正在读取从服务器的BINLOG中继日志,并转化为SQL执行
以前有一个进程是no状态,表示复制的进程停止,在Last_Errno会看到是什么情况
有时候因为主服务器的更新过于频繁,造成了从服务器更新速度较慢,当然问题是多种多样,有可能是网络搭建的结构不好或者硬件的性能较差,从而使得主从服务器之间的差距越来越大,最终对某些应用产生了影响,在这种情况下,我们需要定期进行主从服务器的数据同步,具体步骤如下
在主服务器上
mysql > FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected ( 0.03 sec)
mysql > show master status\G;
*************************** 1 . row ***************************
File : mysql - bin. 000004
Position: 102
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set ( 0.00 sec)
记录出日志的名字和偏移量,这些是从服务器复制的目的目标
在从服务器上,使用MASTER_POS_WAIT()函数得到复制坐标值
mysql > select master_pos_wait( ' mysql-bin.000004 ' , ' 102 ' );
+ -- -----------------------------------------+
| master_pos_wait( ' mysql-bin.000004 ' , ' 102 ' ) |
+ -- -----------------------------------------+
| 0 |
+ -- -----------------------------------------+
1 row in set ( 0.00 sec)
这个select 语句会阻塞直到从服务器达到指定日志文件和偏移量后,返回0,如果是-1,则表示超时推出,查询是0时,表示从服务器与主服务器已经同步
在某些情况下,会出现从服务器更新失败,首先需要确定是否从服务器的表与主服务器的不同造成的,如果是表结构造成的,则需要修改从服务器的表和主服务器一致,然后重新运行start slave
如果不是表结构不同造成的更新失败,则需要确认手动更新是否安全,然后忽视来自主服务器的更新失败语句,跳过来来自主服务器的语句,命令为SET GLOBAL SQL_SLAVE_SKIP_COUNTER=n,其中,n=1表示来自主服务器的更新语句不使用AUTO_INCREMENT或LAST_INSERT_ID(),n=2时则反之,原因是使用AUTO_INCREMENT或LAST_INSERT_ID的语句需要从二进制日志中取得两个事件.