[DESCRIPTION]
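Exceptions raised in the kernel are collectively called KE (kernel exception). Besides the software-triggered oops and panic, there is another class of exception that is triggered by the watchdog. How can you tell whether the current KE was triggered by the watchdog?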
[KEYWORD]
WDT
HWT
watchdog
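The chip has only one hardware watchdog (WDT; the 89 is an exception, where each CPU also has a local WDT). The watchdog times out after 30 s and first raises an FIQ interrupt (at that point it restarts a 30 s/60 s countdown so that the system cannot stay hung). If a CPU responds, it enters the FIQ handler, collects the key information, deliberately calls BUG() to enter panic, and finally reboots. If no CPU can respond to the FIQ (for example because of a bus hang), then once the countdown ends the WDT itself issues the chip reset signal, and the chip is reset and restarts.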
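The two-stage behavior described above can be summarized as a small decision model. This is only an illustrative sketch: `Outcome` and `watchdog_outcome` are hypothetical names that do not exist in the kernel source, and the model ignores the actual timer values.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of one watchdog period (illustrative only)."""
    RUNNING = "watchdog kicked in time, system keeps running"
    PANIC_REBOOT = "FIQ handled: info collected, BUG() -> panic -> reboot"
    HW_RESET = "FIQ not serviced: WDT issues the chip reset signal"


def watchdog_outcome(kicked_before_timeout, cpu_services_fiq):
    """Model the flow: timeout -> FIQ -> handler, or hardware reset."""
    if kicked_before_timeout:
        # No timeout, so no FIQ is raised at all.
        return Outcome.RUNNING
    # First expiry: the WDT raises an FIQ and restarts its countdown.
    if cpu_services_fiq:
        # A CPU entered the FIQ handler, which ends in BUG() and a panic.
        return Outcome.PANIC_REBOOT
    # Nobody serviced the FIQ (e.g. bus hang); the second expiry
    # resets the chip directly, so no kernel log is produced.
    return Outcome.HW_RESET
```

The practical consequence is that only the `PANIC_REBOOT` path leaves the kernel log shown later in this article; a direct hardware reset does not pass through the FIQ handler.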
The FIQ handler is:
aee_wdt_fiq_info() -> aee_wdt_irq_info() in wdt-handler.c (mediatek\kernel\drivers\aee\common)
The overall FIQ handling flow is roughly as follows (most recent call first):
panic()
die()
arm_notify_die()
do_undefinstr()
__und_svc_finish()
aee_wdt_irq_info() <= switches to the current process's stack and finally calls BUG()
aee_wdt_fiq_info() <= separate stack
wdt_fiq() <= separate stack
FIQ interrupt
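Note that the FIQ handler runs on a separate stack, so printk cannot be used there (printk uses the current macro!).
Finally, identify from the log whether this is a watchdog reset: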
<4>[1122.222262] (0)[52:xxx]------------[ cut here ]------------
<2>[1122.222276] (0)[52:xxx]Kernel BUG at c03603a8 [verbose debug info unavailable]
<0>[1122.222291] (0)[52:xxx]Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
<4>[1122.690642] (0)[52:xxx]Modules linked in:
<4>[1122.690661] (0)[52:xxx]CPU: 0 Tainted: G W (3.4.67 #1)
<4>[1122.690682] (0)[52:xxx]PC is at aee_wdt_irq_info+0x104/0x12c <= If the PC here is in aee_wdt_irq_info(), this is a watchdog reset!
<4>[1122.690696] (0)[52:xxx]LR is at aee_wdt_irq_info+0x104/0x12c
......
<4>[1122.693558] (0)[52:xxx]Backtrace: <= To find out which CPU did not kick the watchdog, analyze the call stacks of the CPUs that are currently online
<4>[1122.693587] (0)[52:xxx][] (aee_wdt_irq_info+0x0/0x12c) from [] (aee_wdt_fiq_info+0xf0/0x100)
<4>[1122.693603] (0)[52:xxx] r6:df8f7fb0 r5:c0b8c0d8 r4:00000000
<4>[1122.693642] (0)[52:xxx][] (xxx_thread+0x0/0x18c) from [] (kthread+0x90/0x9c)
<4>[1122.693670] (0)[52:xxx][] (kthread+0x0/0x9c) from [] (do_exit+0x0/0x758)
<4>[1122.693685] (0)[52:xxx] r6:c004e5dc r5:c0065a10 r4:df829ef4
<0>[1122.693712] (0)[52:xxx]Code: 1afffffb f57ff05f e3a00011 eb018cb0 (e7f001f2)
<4>[1122.693738] (0)[52:xxx]---[ end trace 16e4575465f98330 ]---
<0>[1122.693752] (0)[52:xxx]Kernel panic - not syncing: Fatal exception
<3>[1122.693766] (0)[52:xxx]==========================================
// Below are the call stacks of the online CPUs; go through them one by one to see which CPU did not kick the watchdog
<3>[1122.693792] (0)[52:xxx]CPU 0 FIQ: Watchdog time out
<3>[1122.693799] (0)[52:xxx]preempt=0, softirq=0, hardirq=0
<3>[1122.693805] (0)[52:xxx]pc : c03614e8, lr : c06d3aa8, cpsr : 600d0093
<3>[1122.693813] (0)[52:xxx]sp : dfbebf80, ip : dfbebed0, fp : dfbebfb4
<3>[1122.693820] (0)[52:xxx]r10 : c09b5b20, r9 : 00000000, r8 : c0b8d570
<3>[1122.693827] (0)[52:xxx]r7 : c09b5b20, r6 : c098f9c0, r5 : db7ac1e0
<3>[1122.693835] (0)[52:xxx]r4 : 00000000, r3 : 00000000, r2 : dfbebed0
<3>[1122.693842] (0)[52:xxx]r1 : 800d0013, r0 : 00000000
......
<3>[1122.693905] (0)[52:xxx]Backtrace : c06d3aa8, c0065aa0, c004e5dc
<3>[1122.693925] (0)[52:xxx]==========================================
<3>[1122.693950] (0)[52:xxx]CPU1: stopping by FIQ
<3>[1122.693957] (0)[52:xxx]preempt=1, softirq=0, hardirq=0
<3>[1122.693963] (0)[52:xxx]pc : c003e160, lr : c003e158, cpsr : 600f0093
<3>[1122.693971] (0)[52:xxx]sp : df841f58, ip : df841f58, fp : df841f6c
<3>[1122.693978] (0)[52:xxx]r10 : c0990b1c, r9 : 410fc073, r8 : df840000
<3>[1122.693985] (0)[52:xxx]r7 : c06db414, r6 : c06dae88, r5 : 00000001
<3>[1122.693992] (0)[52:xxx]r4 : 00000001, r3 : 00000000, r2 : 018a14d6
<3>[1122.694000] (0)[52:xxx]r1 : 00000003, r0 : 00000001
......
<3>[1122.694063] (0)[52:xxx]Backtrace : c003e158, c003d1cc, c003e12c, c000f3dc, c000f7b8, c06c3374
<3>[1122.694093] (0)[52:xxx]==========================================
<3>[1122.694118] (0)[52:xxx]CPU2: stopping by FIQ
<3>[1122.694124] (0)[52:xxx]preempt=1, softirq=0, hardirq=0
<3>[1122.694131] (0)[52:xxx]pc : c04e1fbc, lr : c04e35d8, cpsr : 60030013
<3>[1122.694138] (0)[52:xxx]sp : dfb9bbc0, ip : dfb9bc60, fp : dfb9bc5c
<3>[1122.694145] (0)[52:xxx]r10 : f1230000, r9 : defe82c0, r8 : dfb9bd70
<3>[1122.694153] (0)[52:xxx]r7 : 00000000, r6 : dfb9bdb8, r5 : 00000000
<3>[1122.694160] (0)[52:xxx]r4 : defe8000, r3 : 00000000, r2 : 00000000
<3>[1122.694167] (0)[52:xxx]r1 : dfb9bd70, r0 : defe8000
......
<3>[1122.694230] (0)[52:xxx]Backtrace : c04e35d8, c04e35d8, c0309850, c03098b0, c0309944, c03160f0, c0317808, c0309aa4, c0316838, c03171c4, c0318190, c0065aa0, c004e5dc
<3>[1122.694284] (0)[52:xxx]==========================================
<3>[1122.694309] (0)[52:xxx]CPU3: stopping by FIQ
<3>[1122.694316] (0)[52:xxx]preempt=1, softirq=0, hardirq=0
<3>[1122.694322] (0)[52:xxx]pc : c003e160, lr : c003e158, cpsr : 60030093
<3>[1122.694330] (0)[52:xxx]sp : df873f58, ip : df873f58, fp : df873f6c
<3>[1122.694337] (0)[52:xxx]r10 : c0990b1c, r9 : 410fc073, r8 : df872000
<3>[1122.694344] (0)[52:xxx]r7 : c06db414, r6 : c06dae88, r5 : 00000003
<3>[1122.694351] (0)[52:xxx]r4 : 00000003, r3 : 00000000, r2 : 018a14d7
<3>[1122.694359] (0)[52:xxx]r1 : 00000003, r0 : 00000003
......
<3>[1122.694422] (0)[52:xxx]Backtrace : c003e158, c003d1cc, c003e12c, c000f3dc, c000f7b8, c06c3374
<3>[1122.694452] (0)[52:xxx]==========================================
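<3>[1122.694469] (0)[52:xxx] kick=0x0000000e,check=0x0000000f <= This is very important! Each bit of check indicates whether a CPU is online: 0xF means CPU0, 1, 2 and 3 are online. Each bit of kick indicates whether that CPU has kicked the watchdog: 0xE means CPU1, 2 and 3 did, but CPU0 did not. So the problem lies with CPU0, and you need to analyze CPU0's call stack to see why it did not kick the watchdog.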
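The checks described in the log annotations can be automated. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the regular expressions assume the exact line shapes of the sample log above (other kernel versions may print these lines differently).

```python
import re

# Patterns assumed from the sample log above; adjust for other kernels.
PC_RE = re.compile(r"PC is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
KICK_RE = re.compile(r"kick=0x([0-9a-fA-F]+),\s*check=0x([0-9a-fA-F]+)")
CPU_RE = re.compile(r"CPU\s*(\d+)\s*(?:FIQ: Watchdog time out|: stopping by FIQ)")
BT_RE = re.compile(r"Backtrace\s*:\s*(.*)")


def is_watchdog_panic(log):
    """True if the oops PC lies inside aee_wdt_irq_info()."""
    m = PC_RE.search(log)
    return m is not None and m.group(1) == "aee_wdt_irq_info"


def parse_kick_check(log):
    """Return (kick, check) as integers, or None if the line is absent."""
    m = KICK_RE.search(log)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


def stalled_cpus(kick, check):
    """CPUs that are online (bit set in check) but did not kick."""
    missing = check & ~kick
    return [cpu for cpu in range(missing.bit_length())
            if (missing >> cpu) & 1]


def cpu_backtraces(log):
    """Map CPU number -> list of backtrace addresses from the FIQ dump."""
    result = {}
    cpu = None
    for line in log.splitlines():
        m = CPU_RE.search(line)
        if m:
            cpu = int(m.group(1))  # start of a per-CPU section
            continue
        m = BT_RE.search(line)
        if m and cpu is not None:
            addrs = re.findall(r"\b[0-9a-f]{8}\b", m.group(1))
            if addrs:
                result[cpu] = addrs
                cpu = None  # section consumed
    return result
```

With the values from the sample log, `stalled_cpus(0xE, 0xF)` returns `[0]`, which matches the conclusion that CPU0 is the one to investigate. The addresses returned by `cpu_backtraces()` for that CPU still have to be resolved against the symbols of the matching kernel build.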
Related FAQs: