Category: LINUX

2011-07-06 10:37:01

Machine Check Exception (MCE) is a type of computer hardware error that occurs when a computer's central processing unit detects a hardware problem.

Microsoft Windows reports the error on a blue screen of death with the following message (the parameters inside the parentheses vary):

STOP: 0x0000009C (0x00000004, 0x00000000, 0xB2000000, 0x00020151) "MACHINE_CHECK_EXCEPTION"

On Linux, a process (such as klogd[1]) writes a message to the kernel log and/or the console screen (usually only to the console when the error is non-recoverable and the machine crashes as a result):

CPU 0: Machine Check Exception: 0000000000000004
Bank 2: f200200000000863
Kernel panic: CPU context corrupt
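
To work with such a message programmatically, the numeric fields can be pulled out with a regular expression. The following is a minimal sketch in Python, assuming the message has exactly the shape of the sample above; the pattern and the function name `parse_mce_line` are illustrative, not part of any kernel or tool interface, and other kernel versions format machine-check messages differently.

```python
import re

# Pattern modeled on the sample message above (an assumption); it accepts
# the fields on one line or split across lines, since \s+ matches newlines.
MCE_PATTERN = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check Exception: (?P<mcg>[0-9a-fA-F]+)"
    r"\s+Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)


def parse_mce_line(text):
    """Return the numeric fields of an MCE message, or None if no match."""
    match = MCE_PATTERN.search(text)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        # The two hexadecimal values are kept as plain integers.
        "mcg_status": int(match.group("mcg"), 16),
        "bank": int(match.group("bank")),
        "bank_status": int(match.group("status"), 16),
    }


if __name__ == "__main__":
    sample = (
        "CPU 0: Machine Check Exception: 0000000000000004 "
        "Bank 2: f200200000000863 Kernel panic: CPU context corrupt"
    )
    print(parse_mce_line(sample))
```

Keeping the values as integers rather than strings makes it straightforward to test individual bits later.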

The error usually results from failed or overstressed hardware components, in cases where no more specific error message identifies the fault. Diagnosing the error can be difficult, although Intel Pentium processors generate more specific codes, which the manufacturer can help decode.

MCEs require a restart of the system before users can continue normal operation: they often indicate a long-term problem of a general nature.

Problem types

Most of these errors relate specifically to the Pentium processor family. Similar errors may occur on other processors and will cause similar problems.

Some of the main hardware problems that cause MCEs include:

  • System bus errors (errors in communication between the processor and the motherboard).
  • Memory errors, which may include parity or error-correcting code (ECC) problems. Error checking ensures that data is stored correctly in RAM; if information is corrupted, random errors occur.
  • Cache errors in the processor. The cache stores important data and code; if it is corrupted, errors often occur.
Causes

Common causes of MCEs include overheating and incorrect hardware installation. Specific user-induced causes include:

  • overclocking (which normally increases heat-output)
  • poorly fitted heatsink/computer fans (the same problem can happen with excessive dust in the CPU fan)
  • an overloaded internal or external power supply (fixable by upgrading)

Computer software can also cause MCEs (normally by corrupting data that programs read or write). For example, software that reads from or writes to non-existent memory regions can confuse the processor and/or the system bus.

Decoding MCEs

As noted previously, decoding MCEs can prove difficult. Manufacturers, processor manufacturers in particular, can normally provide information about specific codes. Consult the Intel 64 and IA-32 Architectures Software Developer's Manual[2] Chapter 15 (Machine-Check Architecture), or the Microsoft KB Article on Windows Exceptions[3].
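
As a rough illustration of what decoding involves, the sketch below splits a 64-bit bank status value into the architectural flag bits and error-code fields described in the Machine-Check Architecture chapter of the Intel manual. It assumes that the first value in the Linux message is the global status register (IA32_MCG_STATUS) and that the "Bank" value is that bank's IA32_MCi_STATUS register; the helper names are hypothetical, and the meaning of the model-specific bits depends on the processor, so they are left uninterpreted.

```python
# Architectural flag bits (bit position -> name), per the Intel manual's
# Machine-Check Architecture chapter. Helper names are illustrative only.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # the bank's MISC register holds valid data
    58: "ADDRV",  # the bank's ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}


def set_flags(value, names):
    """List the names of the flag bits set in value, highest bit first."""
    return [
        name
        for bit, name in sorted(names.items(), reverse=True)
        if (value >> bit) & 1
    ]


def decode_bank_status(status):
    """Split a bank status value into flags and error-code fields."""
    return {
        "flags": set_flags(status, MCI_FLAGS),
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    print(set_flags(0x0000000000000004, MCG_FLAGS))
    print(decode_bank_status(0xF200200000000863))
```

Applied to the values from the Linux example, this reports MCIP in the global status, and VAL, OVER, UC, EN and PCC in the bank status. PCC (processor context corrupt) is consistent with the "CPU context corrupt" panic text. Interpreting the remaining error-code fields still requires the manufacturer's documentation or a tool such as those listed in this article.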

Programs to Decode MCEs

  • mcat: a Windows command-line program from AMD that decodes MCEs from AMD K8, Family 0x10, and Family 0x11 processors.
  • mcelog: a Linux daemon by Andi Kleen that handles MCEs for modern x86 processors. mcelog can also decode machine checks.
  • parsemce: a Linux program by Dave Jones that decodes MCEs from AMD K7 processors.
  • mced: a Linux program by Tim Hockin that gathers MCEs from the kernel and alerts interested applications. The primary difference between this program and the others is that it is a daemon (it is always running), so it can receive MCE notifications as soon as the kernel finds them. It does not try to interpret the MCE data; it only alerts other applications.

References
  1. ^ "KLOGD(8)". UNIX man pages. 1999-08-21. Retrieved 2008-07-29. "klogd is a system daemon which intercepts and logs Linux kernel messages."
  2. ^ "Intel 64 and IA-32 Architectures Software Developer's Manual".
  3. ^ "Microsoft KB 329284 - Stop error '0x0000009C (0x00000004, 0x00000000, 0xb2000000, 0x00020151)'".