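Scenario: my boss phoned to say our main site was down. After I got home, an ssh attempt from another machine just hung, so at first I suspected the machine was under heavy load.
Then I checked the monitoring host and found that the NIC, CPU, nginx connection count... everything had stopped reporting data. Clearly this was not a load problem; the machine had most likely hung, so I immediately rebooted it through IPMI.
After the reboot, the machine was back to normal!
In fact, this machine had been running fine for more than half a year with no problems!
Checking /var/log/messages, I found a large number of messages like the following: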
Mar 12 11:15:04 hy1 kernel: php-fpm: page allocation failure. order:1, mode:0x20
Mar 12 11:15:04 hy1 kernel: php-fpm: page allocation failure. order:1, mode:0x20
Mar 12 11:15:04 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:06 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:09 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:09 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
Mar 12 11:15:09 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:10 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
Mar 12 11:17:33 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
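With this many near-identical lines, a quick per-process tally makes it easier to see which processes hit the failure. The following is only a rough sketch, not something from the original investigation: the function name count_alloc_failures is made up for illustration, and it assumes the syslog layout shown above, where the sixth whitespace-separated field is the process name followed by a colon.

```shell
# Hypothetical helper: count "page allocation failure" lines per process.
# Reads syslog-format lines on stdin and prints "<count> <process>",
# highest count first. Assumes field 6 is "<process>:" as in the log above.
count_alloc_failures() {
    grep 'page allocation failure' \
        | awk '{ sub(/:$/, "", $6); count[$6]++ }
               END { for (p in count) print count[p], p }' \
        | sort -rn
}
```

It could be run as: count_alloc_failures < /var/log/messages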
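At first I suspected the system had run out of memory, but the monitoring showed there was still plenty of memory available when the problem occurred! See the attachment for the memory usage at that time. (Note that order:1 in the log means the kernel was asking for 2 physically contiguous pages, and such a request can fail even when the total amount of free memory is large.)
Later I found out that this was a kernel bug.
The fix is as follows: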
vi /etc/sysctl.conf
Add the line:
vm.zone_reclaim_mode = 1
Then run sysctl -p to make it take effect immediately
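If the same change has to be made on several machines, editing the file by hand in vi gets tedious. Below is a minimal sketch of a scripted version, under some assumptions: the function name set_sysctl_conf is hypothetical, it relies on GNU sed (-i -E), and it treats the key as a regular expression, so the dots in the key match any character.

```shell
# Hypothetical helper: set "key = value" in a sysctl-style config file.
# Replaces an existing line for the key, or appends one if none exists,
# so running it repeatedly does not create duplicate entries.
set_sysctl_conf() {
    conf_file=$1
    key=$2
    value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file" 2>/dev/null; then
        # Key already present: rewrite that line in place (GNU sed).
        sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$conf_file"
    else
        # Key absent: append a new line.
        printf '%s = %s\n' "$key" "$value" >> "$conf_file"
    fi
}
```

As root, it could be used like this: set_sysctl_conf /etc/sysctl.conf vm.zone_reclaim_mode 1 && sysctl -p
Afterwards, sysctl vm.zone_reclaim_mode prints the value currently in effect.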