ST350-2 Diagnostic Tools - penguinstorm - ChinaUnix Blog


2006-04-18 11:47:12

Upon completion of this module, you should be able to:
1. Differentiate watchdog resets, panics, and system hangs
2. Differentiate hardware and software problems
3. Provide examples of fatal and non-fatal error conditions
4. Identify a comprehensive set of Solaris commands and utilities that are useful in fault analysis
5. Describe the syntax, function, and relevance of each command or system file
6. Use Solaris commands and files to determine system configuration and status information
7. Solve workshop problems using Solaris utilities and system files

Error categories: software, hardware-corrected, recoverable, fatal, and critical

Error reporting mechanisms: bus errors, interrupts, and resets

Recoverable errors caused by hardware are usually signaled by a bus error posted to the requesting device and a specified interrupt, which could broadcast the error. Error recovery in such cases is normally handled by the trap routines, while error logging is done by the interrupt handler.

Critical errors require immediate attention, system shutdown, and power-off. They are signaled through a high-level broadcast interrupt whenever possible.

A fatal error is a hardware error after which proper system operation cannot be guaranteed. All fatal errors initiate a system watchdog reset. A parity error on the backplane is one example of a fatal error.

Bus errors are one of the system's error reporting mechanisms. A bus error is issued to the processor when it references a virtual or physical location and the reference cannot be satisfied for hardware reasons. Typical causes and contexts of bus errors are:
Illegal address or internal hardware failure
Instruction fetch or data load
Direct virtual memory access (DVMA) operations on an SBus
Synchronous or asynchronous data store
Memory management unit (MMU) operations

System Watchdog Reset
When a fatal error is detected on a multiprocessor machine, a system watchdog reset is initiated. A system watchdog reset affects all CPUs and I/O devices. Writes in progress may be lost, but the state of main memory is not altered and continues to be refreshed after a system watchdog reset. In most cases, the system watchdog reset condition is hardware related.

The modinfo utility displays information about loaded kernel modules. With no options, it lists every loaded module with its module identification number and module name.
# modinfo

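The modload utility loads a kernel module into a running system. The -p option searches the kernel's module path for the file.
# modload -p misc/obpsym
To load the module at every boot instead, add this line to the /etc/system file:
forceload: misc/obpsym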
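
To confirm that a forceload entry is actually in place, a small check can help. This is a minimal sketch, assuming a POSIX shell and grep; has_forceload and its arguments are hypothetical names invented for this example, not Solaris utilities.

```shell
# Sketch: succeed if an /etc/system-style file has a forceload entry
# for the given module. has_forceload is a hypothetical helper name.
has_forceload() {
    module=$1
    file=${2:-/etc/system}
    # Match "forceload: <module>" at the start of a line. Lines that
    # begin with "*" are comments in /etc/system, so they never match.
    grep "^forceload:[[:space:]]*${module}[[:space:]]*\$" "$file" >/dev/null
}

# Usage:
#   has_forceload misc/obpsym && echo "obpsym is force-loaded at boot"
```

The file argument is optional so the same function can be pointed at a backup copy of /etc/system before a change is made.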

The modunload utility unloads a kernel module from a running system. Look up the module ID with modinfo, then pass it to modunload -i (89 is the ID in this example):
# modinfo | grep obpsym
# modunload -i 89
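
The lookup step can be scripted so the ID does not have to be read off the screen. This is a rough sketch, assuming modinfo prints the module ID in the first column and the module name in the sixth, and that a POSIX awk is available (nawk or /usr/xpg4/bin/awk on Solaris); module_id is a hypothetical helper name.

```shell
# Sketch: print the ID of a named module, reading modinfo output on stdin.
# Assumed column layout: Id Loadaddr Size Info Rev Module-Name (description)
module_id() {
    awk -v name="$1" '$6 == name { print $1 }'
}

# Usage on a live system:
#   id=$(modinfo | module_id obpsym)
#   [ -n "$id" ] && modunload -i "$id"
```

Matching the name column exactly avoids the false hits that a plain grep can produce when the pattern also appears in another module's description.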

netstat -i: lists statistics per interface
netstat -r: lists routing table statistics
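
For fault analysis, the error counters are usually the interesting part of the per-interface statistics. The filter below is a sketch under one assumption: that netstat -i prints the columns Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue in that order. iface_errors is a hypothetical helper name.

```shell
# Sketch: from netstat -i output on stdin, print only the interfaces
# whose input or output error counters are non-zero.
# Assumed columns: Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
iface_errors() {
    awk '$1 != "Name" && NF >= 8 && ($6 + 0 > 0 || $8 + 0 > 0) {
        print $1, "Ierrs=" $6, "Oerrs=" $8
    }'
}

# Usage on a live system:
#   netstat -i | iface_errors
```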

The truss utility, known as trace on Sun's Berkeley Software Distribution (BSD) based systems, traces system calls, library calls, and signal activity for the program passed to it as a command-line argument. It is extremely helpful for seeing how a program executes and for finding the point of failure in a program that returns an error condition.

There are two main categories of errors that truss reports:
System call errors, often caused by an invalid argument passed to the system call. The man pages for the system calls are a helpful resource, as is the header file /usr/include/sys/errno.h.
Missing-file errors, which usually show up in open() system call lines. Typically the executing program needs to open a file that cannot be found, or whose contents are invalid or corrupt.
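
Both categories can be picked out of a saved trace with simple filters. This is a minimal sketch, assuming truss marks a failed call with Err#<number> followed by the errno name at the end of the line, and that ENOENT is errno 2; truss_errors and truss_missing_files are hypothetical helper names.

```shell
# Sketch: filters for truss output read from stdin.

# Keep every system call that returned an error.
truss_errors() {
    grep 'Err#[0-9]'
}

# Keep only "no such file or directory" failures (ENOENT, errno 2).
truss_missing_files() {
    grep 'Err#2 ENOENT'
}

# Usage on a live system (truss -o writes the trace to a file):
#   truss -o /tmp/truss.out ls /no/such/dir
#   truss_errors < /tmp/truss.out
#   truss_missing_files < /tmp/truss.out
```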

The header file that defines the error codes truss reports can be examined on the system in the /usr/include/sys directory:
# cat /usr/include/sys/errno.h
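
Rather than reading the whole header, a single error name from a truss line can be looked up directly. This is a sketch, assuming the header uses one "#define NAME number /* description */" line per error and that grep supports POSIX character classes (/usr/xpg4/bin/grep on Solaris); errno_lookup is a hypothetical helper name.

```shell
# Sketch: print the #define line for one errno symbol.
# The header path is optional and defaults to the file named in the text.
errno_lookup() {
    sym=$1
    header=${2:-/usr/include/sys/errno.h}
    # Require whitespace after the symbol so ENOENT does not match a
    # longer name that merely starts with the same letters.
    grep "^#define[[:space:]][[:space:]]*${sym}[[:space:]]" "$header"
}

# Usage:
#   errno_lookup ENOENT
```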
