Category: System operations

2013-09-04 14:30:56


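Over the past few days a server running CentOS 6.3 64-bit has been misbehaving: its load average intermittently climbs to 80 or 90. Watching it in real time, user CPU utilization, I/O and memory all look normal, but kernel (system) CPU time is on the high side, with sy sometimes reaching 88%.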



Checking /var/log/messages turned up the following:

  Sep 4 10:18:40 shelly kernel: swapper: page allocation failure. order:1, mode:0x20
  Sep 4 10:18:40 shelly kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.el6.x86_64 #1
  Sep 4 10:18:40 shelly kernel: Call Trace:
  Sep 4 10:18:40 shelly kernel: <IRQ> [<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940
  Sep 4 10:18:40 shelly kernel: [<ffffffff81161d62>] ? kmem_getpages+0x62/0x170
  Sep 4 10:18:40 shelly kernel: [<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270
  Sep 4 10:18:40 shelly kernel: [<ffffffff811623cf>] ? cache_grow+0x2cf/0x320
  Sep 4 10:18:40 shelly kernel: [<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160
  Sep 4 10:18:40 shelly kernel: [<ffffffff811634db>] ? kmem_cache_alloc+0x11b/0x190
  Sep 4 10:18:40 shelly kernel: [<ffffffff8142dc68>] ? sk_prot_alloc+0x48/0x1c0
  Sep 4 10:18:40 shelly kernel: [<ffffffff8142df32>] ? sk_clone+0x22/0x2e0
  Sep 4 10:18:40 shelly kernel: [<ffffffff8147bb86>] ? inet_csk_clone+0x16/0xd0
  Sep 4 10:18:40 shelly kernel: [<ffffffff81494ae3>] ? tcp_create_openreq_child+0x23/0x450
  Sep 4 10:18:40 shelly kernel: [<ffffffff8149239d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
  Sep 4 10:18:40 shelly kernel: [<ffffffff81494886>] ? tcp_check_req+0x226/0x460
  Sep 4 10:18:40 shelly kernel: [<ffffffff8148a296>] ? tcp_rcv_state_process+0x126/0xa10
  Sep 4 10:18:40 shelly kernel: [<ffffffff81491dbb>] ? tcp_v4_do_rcv+0x35b/0x430
  Sep 4 10:18:40 shelly kernel: [<ffffffff814935be>] ? tcp_v4_rcv+0x4fe/0x8d0
  Sep 4 10:18:40 shelly kernel: [<ffffffff8108cdf2>] ? queue_work_on+0x42/0x60
  Sep 4 10:18:40 shelly kernel: [<ffffffffa0000b7b>] ? queue_io+0x6b/0x90 [dm_mod]
  Sep 4 10:18:40 shelly kernel: [<ffffffff814712dd>] ? ip_local_deliver_finish+0xdd/0x2d0
  Sep 4 10:18:40 shelly kernel: [<ffffffff81471568>] ? ip_local_deliver+0x98/0xa0
  Sep 4 10:18:40 shelly kernel: [<ffffffff81470a2d>] ? ip_rcv_finish+0x12d/0x440
  Sep 4 10:18:40 shelly kernel: [<ffffffff81470fb5>] ? ip_rcv+0x275/0x350
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143014d>] ? __alloc_skb+0x6d/0x190
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143a7bb>] ? __netif_receive_skb+0x49b/0x6f0
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143ca38>] ? netif_receive_skb+0x58/0x60
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143cb40>] ? napi_skb_finish+0x50/0x70
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143f079>] ? napi_gro_receive+0x39/0x50
  Sep 4 10:18:40 shelly kernel: [<ffffffffa012b1b4>] ? tg3_poll_work+0x654/0xe30 [tg3]
  Sep 4 10:18:40 shelly kernel: [<ffffffffa012b9dc>] ? tg3_poll_msix+0x4c/0x150 [tg3]
  Sep 4 10:18:40 shelly kernel: [<ffffffff81057fac>] ? scheduler_tick+0xcc/0x260
  Sep 4 10:18:40 shelly kernel: [<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0
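
The first line of the trace carries the key detail: order:1 means the kernel asked for 2^1 physically contiguous pages and could not get them. As a rough sketch, the failures in a log can be summarized like this. The function name `alloc_failure_sizes` is made up for illustration, and the byte figure assumes a 4 KiB page size, the usual value on x86_64.

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard tool): summarize
# "page allocation failure" lines read from stdin.
# Assumption: 4 KiB pages, so an order-N request is 4096 * 2^N bytes.
alloc_failure_sizes() {
    # Pull out the command name and the order from each failure header line.
    sed -n 's/.*kernel: \([^:]*\): page allocation failure\. order:\([0-9]*\),.*/\1 \2/p' |
    while read -r comm order; do
        echo "$comm order=$order bytes=$(( 4096 << order ))"
    done
}

# Usage (reading the log usually requires root):
#   alloc_failure_sizes < /var/log/messages | sort | uniq -c
```

Piping the result through `sort | uniq -c` gives a quick count of which processes and which allocation sizes are affected.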


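A Google search showed that this is a kernel bug, and the recommendation is to upgrade to a kernel that contains the fix: kernel-2.6.32-358.el6 or its CentOS equivalent. This machine is a production MySQL master, though. Upgrading the kernel on a live system is fairly risky, and a failover cannot be arranged on short notice, so for now I worked around the problem with the following: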

sysctl -w vm.zone_reclaim_mode=1
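
Note that `sysctl -w` only changes the running kernel, so the setting is lost on reboot. A minimal sketch of making it persistent follows. The helper name `persist_sysctl` is hypothetical; it takes the config file as an argument so it can be tried on a scratch copy before being pointed at /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper: write "key = value" into a sysctl config file,
# dropping any earlier assignment of the same key first.
persist_sysctl() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp) || return 1
    # Keep every existing line whose key (text before "=") differs from ours.
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' "$file" > "$tmp"
    fi
    echo "$key = $value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (as root):
#   cat /proc/sys/vm/zone_reclaim_mode          # current value
#   persist_sysctl vm.zone_reclaim_mode 1 /etc/sysctl.conf
#   sysctl -p                                   # reload /etc/sysctl.conf
```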

The kernel documentation describes zone_reclaim_mode as follows:

Zone_reclaim_mode allows someone to set more or less aggressive approaches to 
reclaim memory when a zone runs out of memory. If it is set to zero then no 
zone reclaim occurs. Allocations will be satisfied from other zones / nodes 
in the system. 
 
This is value ORed together of 
 
1 = Zone reclaim on 
2 = Zone reclaim writes dirty pages out 
4 = Zone reclaim swaps pages 
 
zone_reclaim_mode is set during bootup to 1 if it is determined that pages 
from remote zones will cause a measurable performance reduction. The 
page allocator will then reclaim easily reusable pages (those page 
cache pages that are currently not used) before allocating off node pages. 
 
It may be beneficial to switch off zone reclaim if the system is 
used for a file server and all of memory should be used for caching files 
from disk. In that case the caching effect is more important than 
data locality. 
 
Allowing zone reclaim to write out pages stops processes that are 
writing large amounts of data from dirtying pages on other nodes. Zone 
reclaim will write out dirty pages if a zone fills up and so effectively 
throttle the process. This may decrease the performance of a single process
since it cannot use all of system memory to buffer the outgoing writes 
anymore but it preserve the memory on other nodes so that the performance 
of other processes running on other nodes will not be affected. 
 
Allowing regular swap effectively restricts allocations to the local 
node unless explicitly overridden by memory policies or cpuset 
configurations.
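
Because the value is a bitmask of 1, 2 and 4, a setting such as 3 means "zone reclaim on" plus "write dirty pages out". The sketch below decodes a value into those flags; the function name and the short labels are invented here for illustration and are not kernel terminology.

```shell
#!/bin/sh
# Hypothetical helper: decode a zone_reclaim_mode value into the flags
# listed in the kernel documentation (1 = on, 2 = write dirty, 4 = swap).
describe_zone_reclaim_mode() {
    mode=$1
    if [ "$mode" -eq 0 ]; then
        echo "zone reclaim off"
        return
    fi
    out=""
    [ $(( mode & 1 )) -ne 0 ] && out="$out reclaim-on"
    [ $(( mode & 2 )) -ne 0 ] && out="$out write-dirty-pages"
    [ $(( mode & 4 )) -ne 0 ] && out="$out swap-pages"
    # Strip the leading space left by the first appended label.
    echo "${out# }"
}

# Usage:
#   describe_zone_reclaim_mode "$(cat /proc/sys/vm/zone_reclaim_mode)"
```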

 

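In short, this parameter tells the kernel to reclaim buffer/cache pages directly when a zone runs short of memory, instead of first satisfying the allocation from other zones or nodes.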




