Teamtrack ID (Bug/RFE/Task):
BUG86127: klogd stuck consuming CPU
Root Cause (Bug) or Purpose (RFE/Task):
two points:
(1). Currently, the klogd can’t process the full buffer that doesn’t include the end character ‘\n’. In this case, it will lead the death cycle of klogd. It means that the klogd sticks up CPU.
(2). What logs are so large and not include the end character in the firewall sessions of kernel module.
For #1, you can obtain the more answers from the CL480815.
For #2, there is NO rate limit about plenty of repeat kernel logs. Sometimes some parts of the printk will produce a large number of output, resulting in the kernel ring buffer overflow and old information is quickly covered. Maybe it leads in that the kernel ring buffer does not include ‘\n’. At the same time, the klogd gets all the kernel logs by invoking syslog and will not parse the logs. It leads to reproduce #1.
The answer is not that a complete log, but many logs are covered in each other. Eventually forming a huge log which doesn't include the end character.
Solution:
For plenty of repeat kernel logs, there will be rate limit in the firewall sessions of kernel module. Be similar to the function "net_ratelimit" in the network module, we use the function "printk_ratelimit" instead and control the rate by operating the value of the files "/proc/sys/kernel/printk_ratelimit" and "/proc/sys/kernel/printk_ratelimit_burst".
Note: This solution will cause some sessions delete/idle time logs lost when a lot of sessions are online at the same time.
Unit test:
(1). Replace the xt_session.ko in XTM2520 device.
(2). Continually create plenty of 4100 authentication sessions and keep 30W sessions online at the same time.
(3). Check sessions and the rate limit of printk. The following is a snapshot.
wgsupport@Member2:/proc/sys/kernel# cat printk_ratelimit
5
wgsupport@Member2:/proc/sys/kernel# cat printk_ratelimit_burst
10
wgsupport@Member2:/proc/sys/kernel# wgcmd status /authentication/list type=5 | grep user_id | wc
358252 358252 11491124
wgsupport@Member2:/proc/sys/kernel# top
top - 15:04:23 up 18:50, 0 users, load average: 8.63, 9.05, 9.17
Tasks: 185 total, 12 running, 173 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.2%us, 44.6%sy, 0.0%ni, 32.5%id, 0.1%wa, 0.0%hi, 1.6%si, 0.0%st
Mem: 32828280k total, 7607828k used, 25220452k free, 31640k buffers
Swap: 16383996k total, 0k used, 16383996k free, 285608k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3117 nobody 5 -15 5852m 5.6g 2760 R 78 18.0 1085:01 sessiond
3129 wgsuppor 5 -15 101m 4916 4016 R 75 0.0 646:36.27 l2tp-wrap
3124 wgsuppor 5 -15 83952 4128 3356 R 71 0.0 653:37.15 fwagent
3111 wgsuppor 5 -15 89348 4092 3304 R 70 0.0 634:09.75 sslvpn-client-w
3128 wgsuppor 5 -15 84316 4104 3352 R 68 0.0 650:07.93 snmp-wrap
2948 wgsuppor 5 -15 73524 3536 2856 R 68 0.0 640:03.23 pimd-wrap
2759 wgsuppor 5 -15 93952 8072 4116 R 63 0.0 719:11.23 loggerd
4 wgsuppor 20 0 0 0 0 R 57 0.0 182:37.68 kworker/0:0
10 wgsuppor 20 0 0 0 0 R 52 0.0 230:30.60 rcuc/0
15 wgsuppor 20 0 0 0 0 S 3 0.0 4:47.71 ksoftirqd/1
(4) Check the kernel logs by the command "tail -f /proc/kmsg" and all logs are normal. The following is a snapshot.
<7>[48323.430678] xt_session: session_entry_expiry: Queuing multicast for idle timeout of "session" list entry for 10.2.58.5
<7>[48323.430681] xt_session: session_entry_expiry: Queuing multicast for idle timeout of "session" list entry for 10.2.58.6
<7>[48323.430684] xt_session: session_entry_expiry: Queuing multicast for idle timeout of "session" list entry for 10.2.58.7
<7>[48323.430686] xt_session: session_entry_expiry: Queuing multicast for idle timeout of "session" list entry for 10.2.58.18
(5) Modify the rate limit to increase it and check the kernel logs that is abnormal. The following is a snapshot.
wgsupport@Member2:/proc/sys/kernel# echo 1 > printk_ratelimit
wgsupport@Member2:/proc/sys/kernel# echo 999 > printk_ratelimit_burst
wgsupport@Member2:/proc/sys/kernel#
wgsupport@Member2:/proc/sys/kernel# cat printk_ratelimit
1
wgsupport@Member2:/proc/sys/kernel# cat printk_ratelimit_burst
999
wgsupport@Member2:/proc/sys/kernel# top
top - 15:13:22 up 18:59, 0 users, load average: 9.26, 9.57, 9.45
Tasks: 185 total, 10 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 23.9%us, 46.6%sy, 0.0%ni, 28.3%id, 0.0%wa, 0.0%hi, 1.2%si, 0.0%st
Mem: 32828280k total, 7683988k used, 25144292k free, 31668k buffers
Swap: 16383996k total, 0k used, 16383996k free, 285776k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3117 nobody 5 -15 5874m 5.7g 2760 S 94 18.1 1093:46 sessiond
3124 wgsuppor 5 -15 83952 4128 3356 R 73 0.0 659:47.52 fwagent
3128 wgsuppor 5 -15 84316 4104 3352 R 73 0.0 656:17.56 snmp-wrap
3111 wgsuppor 5 -15 89348 4092 3304 R 72 0.0 640:08.83 sslvpn-client-w
3129 wgsuppor 5 -15 101m 4916 4016 R 71 0.0 652:41.89 l2tp-wrap
2948 wgsuppor 5 -15 73524 3536 2856 R 69 0.0 646:06.99 pimd-wrap
2759 wgsuppor 5 -15 93952 8072 4116 R 66 0.0 725:59.86 loggerd
4 wgsuppor 20 0 0 0 0 R 56 0.0 184:29.19 kworker/0:0
3 wgsuppor 20 0 0 0 0 R 56 0.0 189:21.73 ksoftirqd/0
26 wgsuppor RT 0 0 0 0 S 6 0.0 1:28.74 migration/3
14 wgsuppor RT 0 0 0 0 S 3 0.0 1:36.88 migration/1
wgsupport@Member2:/proc/sys/kernel# tail -f /proc/kmsg
...
<7>[ 1102.564786] xt_session: Session for 10.2.47.249 with ID 253911 aged out from session list
<7>[ 1102.564787] xt_session: session_entry_expiry: Queuing multicast for idle timeout of "session" list entry for 10.2.47.249
<7>[ 1102.564789] xt_session: Sestry for 10.2.565298]try for 10.2.4.565302] ry for 10.2.5305for 10.2.48.152
<7>[8] xt_sr 10.2.48.153] xt_s10.2.48.154
<t_sessi2.48.155
<7>ession8.156
<7>[ 11ion: se157
<7>[ 1102.565323] on: ses8
<7>[ 1102.565 session7>[ 1102.5653ssion_[ 1102.565333]sion_en1102.565336] xtn_entr2.565340] xt_session: Sessiontry_565343] xt_session:try_ex5347] xt_session:_expi0] xt_session: piry:xt_session: Sey: Que_session: Session: Queuinsession: Sess Queuission: Sessieuiion: Sessionuing n: Session fog multSession for 1ulticsion for 10.icast fn for 10.2.48.st for ior 10.2.48.176 for id10.2.48.177 wit idle 2.48.178 with Ile time8.179 with IDtimeout 80 with ID 254095 aged out freout of "sesswith ID 254096 aged ot of "sessh ID 254097 aged outf "session" lisID 254098 aged out from "sessio 254099 aged ession"54100 aged ossion"101 aged out froon" li aged out fro list enged out from selist end out from sesst entryout from session list
<7>[ 1102.565431] xt_ entry ft from session try ffrom session liry foom session listfor 1session list
< 10.2.48ssion list
<7>10.2.48.sion list
<7>0.2.48on list
<7>[ 11.48.1list
<7>[ 1102..198
<7t
<7>[ 1102.59
<7>[7>[ 1102.565467>[ 1[ 1102.565468>[ 11 1102.565471][ 1102102.565474] x102.52.565477] xt_se02.5652.565481] xt_102.52.565484] xt02.5654.565487] xt_session: session_entry_expiry: Queuing multicast for idle 2.56548565491] xt_sessi.565495494] xt_sessio5496]7] xt_session:9] xt_xt_session: xt_sessisession: sesssession: Session for sion: session_esion: n: session_enn: Sess: session_enton: Sess: session_entry_expiryn: Sess session_entry_expiry: Queu: Sessi session_entry_expiry: Qn: Sessisession_entry_exSessiosion_entry_exsion fon_entry_expion f_entry_expiry for 1ntry_expiry: for 1ry_expiry: Q 10.2.48expiry: Queu.2.48.2iry: Queuing mu48.22y: Queuing mu8.228 Queuing multi29 wieuing multicast fwith ing multicasith ID g multicast for ID 2ulticast for 2541icast for id4149 st for idle timeout of "session" list entry for 10.2.48.234