ERROR_ID TIMESTAMP  T C RESOURCE_NAME ERROR_DESCRIPTION
192AC071 0723100300 T O errdemon      Error logging turned off
0E017ED1 0720131000 P H mem2          Memory failure
9DBCFDEE 0701000000 T O errdemon      Error logging turned on
038F2580 0624131000 U H scdisk0       UNDETERMINED ERROR
AA8AB241 0405130900 T O OPERATOR      OPERATOR NOTIFICATION
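TIMESTAMP: MMDDHHMMYY (month, day, hour, minute, year)
T (type): P = permanent; T = temporary; U = unknown (permanent errors deserve attention)
C (class): H = hardware; S = software; O = operator (user); U = unknown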
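When the error log is long, it can help to filter the one-line summary for permanent hardware errors (type P, class H). The sketch below is a hypothetical Python helper, not an AIX tool: it assumes you have saved `errpt` summary output to text (for example with `errpt > /tmp/errpt.1`) and that each line has the six columns shown above. It decodes TIMESTAMP using the MMDDHHMMYY layout.

```python
from datetime import datetime
from typing import List, NamedTuple


class ErrptEntry(NamedTuple):
    error_id: str
    timestamp: datetime
    err_type: str    # T column: P, T or U
    err_class: str   # C column: H, S, O or U
    resource: str
    description: str


def parse_errpt_line(line: str) -> ErrptEntry:
    """Parse one data line of errpt summary output (assumed six-column layout)."""
    # maxsplit=5 keeps the free-text description, which contains spaces, in one field
    error_id, ts, err_type, err_class, resource, description = line.split(maxsplit=5)
    # TIMESTAMP is MMDDHHMMYY; Python's %y maps 00-68 to 2000-2068
    when = datetime.strptime(ts, "%m%d%H%M%y")
    return ErrptEntry(error_id, when, err_type, err_class, resource, description)


def permanent_hardware_errors(lines: List[str]) -> List[ErrptEntry]:
    """Return only entries with type P (permanent) and class H (hardware)."""
    entries = [
        parse_errpt_line(line)
        for line in lines
        if line.strip() and not line.startswith("ERROR_ID")  # skip blanks and header
    ]
    return [e for e in entries if e.err_type == "P" and e.err_class == "H"]


if __name__ == "__main__":
    sample = """\
ERROR_ID TIMESTAMP  T C RESOURCE_NAME ERROR_DESCRIPTION
192AC071 0723100300 T O errdemon      Error logging turned off
0E017ED1 0720131000 P H mem2          Memory failure
038F2580 0624131000 U H scdisk0       UNDETERMINED ERROR
""".splitlines()
    for entry in permanent_hardware_errors(sample):
        print(entry.error_id, entry.timestamp, entry.resource, entry.description)
```

The ERROR_ID of any entry found this way can then be passed to `errpt -aj` for the detailed report.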
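# diag > select Advanced Diagnostics > select Problem Determination or System Verification (choosing Problem Determination analyzes the system error log).
After it runs, diag reports an SRN (Service Request Number), the name of the suspected failing device with a percentage, the location code, and so on.
On PCI machines, run diag within 7 days of the error being logged so that it can analyze the sense data in the error log entries.
7) Other commands for collecting system information
lsdev -C     system device information
# lsdev -Cc disk
hdisk0 Available 00-06-00-2,0 4.5 GB 16 Bit SCSI Disk Drive
hdisk1 Available 00-06-00-1,0 4.5 GB 16 Bit SCSI Disk Drive
hdisk2 Defined   00-06-00-4,0 16 Bit SCSI Disk Drive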
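In the `lsdev -Cc disk` output above, hdisk2 is Defined rather than Available: it is known to the device configuration database but not currently configured. The hypothetical Python helper below picks such devices out of saved `lsdev -C` style output; it assumes the device name and state are the first two columns and ignores any line whose second column is not Available or Defined (such as a `-H` header).

```python
from typing import Dict, List

KNOWN_STATES = ("Available", "Defined")  # other states are ignored by this sketch


def parse_lsdev(output: str) -> Dict[str, str]:
    """Map device name -> state from lsdev -C style text."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumed layout: name, state, [location code], description...
        if len(fields) >= 2 and fields[1] in KNOWN_STATES:
            states[fields[0]] = fields[1]
    return states


def not_available(output: str) -> List[str]:
    """Return devices that are Defined but not Available, in input order."""
    return [name for name, state in parse_lsdev(output).items() if state != "Available"]


if __name__ == "__main__":
    sample = """\
hdisk0 Available 00-06-00-2,0 4.5 GB 16 Bit SCSI Disk Drive
hdisk1 Available 00-06-00-1,0 4.5 GB 16 Bit SCSI Disk Drive
hdisk2 Defined   00-06-00-4,0 16 Bit SCSI Disk Drive
"""
    print(not_available(sample))
```

A device reported this way is a candidate for `cfgmgr`, or for `rmdev -l ... -d` if it has been physically removed.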
After booting, select option 3 "Start Maintenance Mode for System Recovery" > "Access a Root Volume Group" > "Access this volume group and start a shell before mounting the file systems".
Format the file system log (jfslog):
# /usr/sbin/logform /dev/hd8
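The following are some guidelines for HA troubleshooting:
. Save the relevant log files immediately, especially those that will be overwritten.
. Try to reproduce the problem. Do not be misled by the problem as the user describes it.
. Reproduce the problem step by step: if several factors could cause it, reproduce them one at a time rather than several at once.
. Do not judge the problem from experience; draw conclusions from the results of your tests.
. Isolate the source of the problem, diagnosing from the top down according to the layered relationships described above.
. Test from simple to complex: start in a simple environment; do not try to test in a complex one.
. Make one change at a time; otherwise you cannot know which change solved the problem.
. Do not overlook any possibility, since a small oversight can cause a big loss; pay attention to every detail of the system, including power, plugs, and cabling.
. Keep records of all tests and resolution steps as a reference for future troubleshooting.
. Call the IBM service hotline and tell the IBM engineers the symptoms and the results of your tests. They will reproduce the problem in the CALL CENTER test center and, if necessary, send an engineer on site.

III Administration and Maintenance of the IBM HACMP Two-Node System
This section describes some basic administration and maintenance commands for the HACMP two-node cluster software. These commands are used frequently in the day-to-day operation of an HACMP two-node system.

1 Starting the HACMP two-node system
To start the HACMP two-node system, log in to each node as root and run the following command:
# smit clstart
or
# /usr/sbin/cluster/etc/rc.cluster -boot -N -I
Note that in the two-node system, the node on which HACMP is started first becomes the primary node: it owns the resources and provides the critical services. The node started later becomes the standby node.
In addition, the INFORMIX and SCP applications on both nodes must be started before HACMP is started.

2 Stopping the HACMP two-node system
To stop the HACMP software on a node, log in to that node as root and run the following command:
# smit clstop
or
# clstop -gr
If this node is the primary node and HACMP is also running normally on the standby node, note the differences among the three clstop shutdown modes:
1 forced: stops the cluster software immediately without calling any of the customer application's cleanup routines.
2 graceful: calls the customer application's predefined cleanup routines while stopping the cluster software.
3 takeover: the node stops the cluster software, releases its resources, and asks the standby node to take them over.
If this node is the standby node, the shutdown mode makes little difference.
In addition, stopping HACMP also stops manager and informix.

3 Querying the status of the HACMP two-node system
While the cluster is running, operators often need to know its current state so that they can recover from abnormal conditions and keep the system highly available and fault tolerant. To query the status, log in as root to the node you want to check and do the following.
First check whether HACMP has started on this node:
# lssrc -g cluster
If the system displays information similar to the following, HACMP has started normally.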
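Subsystem  Group    PID    Status
clstrmgr   cluster  22500  active
clsmuxpd   cluster  23674  active
clinfo     cluster  28674  active

Once HACMP is confirmed to be running, run the following command to view the current state of the cluster:
# /usr/sbin/cluster/clstat -a
If everything is working normally, the system displays information similar to the following:

clstat - HACMP for AIX Cluster Status Monitor
---------------------------------------------
Cluster: scp_cluster (80)        Thu Jan 20 08:45:17 TAIST 2000
State: UP                        Nodes: 2
SubState: STABLE

Node: mscp1                      State: UP
   Interface: mscp1_svc (0)      Address:      State: UP
   Interface: mscp1_tty (1)      Address:      State: UP

Node: mscp2                      State: UP
   Interface: mscp2_svc (0)      Address:      State: UP
   Interface: mscp2_tty (1)      Address:      State: UP

VII Common system status query commands:
# lsdev -C -s scsi                  lists all information about each SCSI device, such as logical unit number, hardware address, and device file name.
# ps -ef                            lists information about all running processes, such as process ID and process name.
# netstat -rn                       lists adapter status and routing information.
# netstat -in                       lists adapter status and network configuration.
# df -k                             lists mounted logical volumes and their sizes.
# mount                             lists mounted logical volumes and their mount points.
# uname -a                          lists the system ID, system name, OS version, etc.
# hostname                          lists the system's network name.
# lsvg -l rootvg, lsvg -p rootvg    display volume group information, such as which physical disks and logical volumes it contains.
# lslv -l datalv, lslv -p datalv    display logical volume information, such as which disks it uses and whether it is mirrored.

VIII Locating network faults
Diagnosing a network that cannot communicate:
ifconfig      check whether the adapter is up
netstat -i    check adapter status: does Ierrs/Ipkts or Oerrs/Opkts exceed 1%?
ping the adapter's own address (IP address).
ping other machines' addresses; if they do not respond, run diag on that machine to check whether its adapter is faulty.
Within the same network, the subnet mask must be the same on all hosts.
Basic network configuration:
(1) To change the network address, host name, etc., always use the chdev command:
# chdev -l inet0 -a hostname=myhost
# chdev -l en0 -a netaddr='' -a netmask=''
(2) Check adapter status: # lsdev -Cc if
(3) Verify the network address: # ifconfig en0
(4) Bring up the adapter: # ifconfig en0 up
(5) Configure routes. A route can be added in two ways:
Permanent route: # chdev -l inet0 -a route='',''
Temporary route: # route add
Use netstat -rn to view the routing table.

Appendix: Common command list
Any XXXX, ####, ****, or X is to be substituted by a name, resource name or #; fn = filename, DIR = directory, | = pipe symbol.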
bosboot -a -d /dev/hdiskx   -rebuilds the boot record/image on the boot device (hdiskx)
cat                         -views the contents of a file
cat /tmp/****.1             -views a file, e.g. saved command output
cat fn fn > newfile         -combines two files into a single file
cd                          -returns you to your default (home) DIR
cd /                        -puts you in the root DIR
cd /xxxx                    -changes to a DIR anywhere in the system
cd ..                       -moves you up one DIR at a time
cd xxxxx                    -changes to a DIR within the current DIR
cfgmgr                      -automatically configures devices
cfgmgr -v &                 -(-v) shows progress; (&) runs it in the background
chps -s xx hd#              -increases paging space (xx = # of additional PPs)
cp oldfn newfn              -copies a file
cp oldfn Dirn               -copies a file to another directory
crontab -l                  -lists crontab entries for the current user
ctrl + v                    -pages down 1 page
ctrl + 6                    -pages up 1 page
del fn                      -same as rm -i; prompts before removing fn
df -I                       -shows status of file systems (without inode information)
df -Ik                      -(k) shows status in 1024-byte (1 KB) blocks (AIX 4 only)
diag -a                     -updates diagnostics with changes in the hardware configuration
diag *****                  -(***** = a device type such as tape, disk...; fastpath)
diag -cd rmtX               -resets the tape drive
dosformat                   -formats a diskette for DOS
dosdir                      -lists files on a DOS-formatted diskette
dosread XX YY               -copies DOS file XX to AIX file YY
doswrite YY XX              -copies AIX file YY to DOS file XX
errpt                       -generates a one-line synopsis of each logged error
errpt | pg                  -lists the error log 1 page at a time (1st column is the ID)
errpt -a                    -displays detailed information on logged errors
errpt -s mmddhhmmyy         -selects entries posted later than the given date
errpt -aj XXXXXXX           -lists error details by ID number (XXXXXXX = 1st column)
errpt -d S                  -lists software errors
errpt -j XXXXXXX            -lists a summary report by ID number
errpt -aN XXXXXX            -lists a detailed report by resource name column
errpt -N XXXXXXX            -lists a summary report by resource name column
errclear 0                  -clears the error log
errclear -N XXXXX 0         -clears the error log by resource name (0 = all entries)
errclear -j XXXXX 0         -clears the error log by ID number
finger                      -same as who but with more details
flcopy                      -copies a diskette to another diskette
format                      -formats a diskette in the default diskette drive
format -l                   -formats at lower density: 1.44 MB in a 2.88 MB drive / 720 KB in a 1.44 MB drive
hostname                    -responds with the host system name
host (hostname)             -responds with the internet address
instfix -ik APAR#           -reports whether an APAR fix is completely installed
lppchk -v                   -checks the install status of LPPs
lppchk -v 2> /dev/lpX       -sends the error output of lppchk to printer lpX
lpstat -a all               -views all printer queues
lptest 80 5 > /dev/lp0      -sends a test pattern to lp0
ls                          -lists names of files & directories in the current DIR
ls -lia                     -lists details of all files in the current DIR, with inode numbers
ls -al                      -lists details of all files and DIRs in the current DIR, including hidden ones
lsattr -El xxxxxx           -lists specific settings of a device
lsdev -C | sort -d -f       -lists system hardware (devices)
lsdev -C | grep 00-0X       -lists resources for an adapter
lsdev -Cc xxxxx -H          -lists devices (xxxxx = tty, printer, disk, memory, adapter)
lsdev -Cs scsi              -lists SCSI devices (not serial or RAID)
lsdev -Cc tape              -lists tape devices
lsdev -Cs pci               -lists PCI devices
lsdev -Cs isa               -lists ISA devices
lscons                      -lists the assigned console
lscfg                       -lists hardware (same as the diag list)
lscfg -rl mem* | pg         -lists the memory on PCI bus machines
lscfg -vl XXXXX             -lists config info for a device (rmt0, hdisk, etc.)
lscfg -vl sysplanar0        -lists the machine type, model, and s/n on SMP machines
lsfs                        -lists all file systems plus data from the "df" cmd
lslpp -l | grep BROKEN      -lists incompletely installed PTFs
lslv -m hd5                 -finds the boot drive (under the PV1 column)
lsps -a                     -lists all paging spaces and their usage
lsps -s                     -summarizes total paging space and usage
lspv                        -lists information about the physical volumes
lspv hdisk#                 -lists drive info
lspv -l hdisk#              -lists the logical volumes on the disk
lsuser -f ALL               -lists all attributes for all users
lsvg                        -lists volume groups
lsvg -p XXXXXX              -lists disks in the volume group (XXXXXX = volume group name)
more                        -reads files and displays the text one screen at a time
mpcfg -df                   -lists all settings of the machine (SMP)
mpcfg -cf 11 1              -changes to fast IPL on SMP machines
mv fn (path fn)             -moves and/or renames a file
oslevel                     -shows the AIX version (3.2.4 and above)
pg                          -reads and displays text one screen at a time
pdisable                    -makes ttys unavailable or shows all disabled ttys
pdisable tty#               -disables a tty
penable                     -makes ttys available or shows all enabled ttys
penable tty#                -enables a tty
ps -el | pg                 -looks at processes running on the system
pwd                         -shows what DIR you are currently in
r                           -repeats the last command
rm -i *******               -removes a file, prompting you to confirm
rmdev -l XXXXX              -unconfigures a device, leaving it Defined in the database
rmdev -l XXXXX -d           -removes a device and deletes it from the database
set -o vi                   -enables vi-style editing to recall commands that have been run
:wq                         -(vi) writes (saves) and quits the file
Esc + k                     -with set -o vi, recalls the last command
k, l                        -k = recalls the next earlier command, l = steps you through the command
I                           -with set -o vi, inserts characters
j                           -steps you back toward more recent commands
cw                          -removes a word; just type in the new word (use with Esc)
a, x, r                     -a = append text, x = delete text, r = replace text (r + letter)
R                           -lets you type over letters or words
smit *****                  -(***** = tape, disk, tty, etc. fastpath)
su                          -stands for switch user (NOT super user)
su                          -switches to the root ID, prompting you for the password
su XXXXXX                   -switches to XXXXXX's ID
tar -cvf /dev/rmtX /etc     -copies /etc to a tape drive
tar -tvf /dev/rmtX          -reads (lists) a tape
tctl -f /dev/rmtX rewoffl   -rewinds & ejects the tape
tctl -f /dev/rmtX.1 fsf 3   -advances the tape 3 file marks so tar can read the next file (.1 = no rewind on close)
tctl -F                     -lists available subcommands (-F is not a valid flag, so tctl prints its usage)
tctl retension              -retensions the tape in the tape drive
&                           -puts any command in the background and shows its process ID
uptime                      -shows how long since the last IPL and how many users are on the system
vmstat # #                  -reports virtual memory statistics and more
iostat # #                  -reports CPU, disk & CD-ROM statistics
                             (with vmstat & iostat: 1st # = interval in seconds, 2nd # = number of reports)
who                         -shows users on the system
who am i                    -shows the user ID and tty number of your terminal

USE the following with other commands.
---------------------------------------------------
>/tmp/****.1                -creates a file (used with lsXXX commands)
>/dev/lp#                   -redirects output to a printer (use with a command)
|grep                       -searches the output for text (grep also searches files)
|pg                         -use after any command to view one page at a time
|                           -pipe sign: takes the output of one command and feeds it to the input of another
>                           -redirect sign (greater than sign)
/                           -slash sign
\                           -backslash sign
>>                          -double redirect: appends output to the end of a file
&                           -puts any command in the background and shows its process ID

You MUST unmount the file system first to run fsck or dfsck; use them only when there is a problem.
---------------------------------------------------
fsck XXXXXXX                -checks a file system for errors and prompts before repairs
dfsck /XXXX /XXXX           -checks 2 different file systems at the same time

The FOLLOWING command lines delete a group of devices at once; replace each # with the number of an hdisk you want to delete. (This is an example.)
---------------------------------------------------
for disk in # # # #         -this line and the next 3 lines work together
do                          -the shell shows the secondary prompt > (REMEMBER to hit Enter)
rmdev -l hdisk${disk} -d    -the prompt is still > (note the braces around disk)
done                        -after done and Enter, the loop runs and the normal prompt returns

SSA RELATED COMMANDS
---------------------------------------------------
lsattr -El ssaX             -lists attributes of SSA adapters
lscfg -vl ssaX              -lists VPD of SSA adapters
lsdev -C | grep SSA         -lists all SSA devices
lslpp -L | grep SSA         -lists SSA device drivers
maymap -ap                  -maymap display of the SSA loop
maymap -alph                -maymap display of the SSA loop
lscfg -vl pdisk*            -lists VPD of pdisks
ssaxlate -l hdiskX          -lists the hdisk to pdisk assignment
ssaxlate -l pdiskX          -lists the pdisk to hdisk assignment
ssa_rescheck -l hdiskX      -shows hdisk reservation status

The FOLLOWING commands list, copy, and restore with cpio, tar, dd, backup, and the DOS utilities.
NOTE: fd0 is just an example device, so you may use any media you like.
-----------------------------------------------------------------------------------
LIST                          COPY
------                        --------
cpio -itv < /dev/fd0          ls /tmp/fn | cpio -ov > /dev/fd0
tar -tvf /dev/fd0             tar -cvf /dev/fd0 fn
dd li -l | dd                 dd if=fn of=/dev/fd0
restore -Tf /dev/fd0          backup -0 -uf /dev/fd0 fn              (by inode)
restore -Tf /dev/fd0          find / -print | backup -i -f /dev/fd0  (by name)
dosdir                        doswrite -a (AIX fn) (fn.ext)

TO RESTORE
-------------------
cpio -iv fn < /dev/fd0
tar -xvf /dev/fd0
dd if=/dev/fd0 of=fn
restore -xvf /dev/fd0 fn      (by name or inode; restore recognizes the format unless special flags were used)
dosread -a (fn.ext) (AIX fn)

TO DOCUMENT THE SYSTEM
-------------------------------------------
lscfg -v > /dev/lpX           -to list sys config/VPD
lsuser -f ALL > /dev/lpX      -to list users
lsdev -Cc tty -H              -to list all ttys
lsdev -Cc lp -H               -to list all lps
lsattr -El ttyX > /dev/lpX    -to list ttyX parameters (do for each tty)
lsattr -El lpX > /dev/lpX     -to list lpX parameters (do for each lp)
lpstat > /dev/lpX             -to list queues
lsfs > /dev/lpX               -to list file systems
lspv > /dev/lpX               -to list hard drives
lspv hdiskX                   -to list hard drive config (do for each drive)
lspv -l hdiskX                -to list the logical volumes on the drive
lsvg rootvg                   -to list rootvg data

Plus print out or save to diskette:
------------------------------------------
/etc/inittab
/etc/objrepos/Cu*
/etc/passwd
/etc/filesystems
/etc/security/passwd
/etc/hosts
/sbin/rc.boot
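The checklist above ends with a set of configuration files to print out or save. As an illustration only, the Python sketch below bundles whichever of those files exist into a single tar archive, expanding `Cu*` the way the shell would. It assumes a Python 3 interpreter is available on the machine and that it runs as root (needed to read `/etc/security/passwd`). `SAVE_PATTERNS` and `save_config` are hypothetical names, and the resulting archive would still need to be copied to diskette or tape with the tools listed above, for example `tar` or `backup`.

```python
import glob
import os
import tarfile
from typing import Iterable, List

# Files from the checklist above; "Cu*" is a wildcard, expanded by glob.
SAVE_PATTERNS = (
    "/etc/inittab",
    "/etc/objrepos/Cu*",
    "/etc/passwd",
    "/etc/filesystems",
    "/etc/security/passwd",
    "/etc/hosts",
    "/sbin/rc.boot",
)


def expand(patterns: Iterable[str]) -> List[str]:
    """Expand wildcards and keep only regular files that exist, in pattern order."""
    found: List[str] = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):
                found.append(path)
    return found


def save_config(archive: str, patterns: Iterable[str] = SAVE_PATTERNS) -> List[str]:
    """Write the matching files into a tar archive and return the paths saved."""
    files = expand(patterns)  # expand before creating the archive
    with tarfile.open(archive, "w") as tar:
        for path in files:
            # tarfile strips the leading "/" from member names
            tar.add(path)
    return files


if __name__ == "__main__":
    for saved in save_config("/tmp/sysconfig.tar"):
        print("saved", saved)
```

Missing files are skipped silently, so compare the printed list with the checklist before relying on the archive.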