just do it
Category: System Operations
2012-08-07 09:57:51
Memory error report
dhcpA# dmesg
Tue Jun 12 10:34:25 CST 2012
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 675464 kern.info] [AFT2] errID 0x000a0d49.2ebf89a0 PA=0x000000d0.ff451180
Jun 10 16:38:40 dhcpA E$tag 0x00000343.fd249240 E$state_6 Shared
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x94100013.90042200 0x4000c94b.01000000 ECC 0x1b3
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x10bffffe.90042200 0xc400a0c8.80a0a000 ECC 0x1e2
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x22480002.a00c3fef 0x808c2010.0248000e ECC 0x06c
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x05005131.050050e7 0x070050ac.c458a230 ECC 0x183
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 414030 kern.info] [AFT2] errID 0x000a0d49.2ebf89a0 E$tag PA=0x000000d0.ff851180 does not match AFAR=0x000000c0.6cc51180
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 675464 kern.info] [AFT2] errID 0x000a0d49.2ebf89a0 PA=0x000000d0.ff851180
Jun 10 16:38:40 dhcpA E$tag 0x00000343.fe001000 E$state_6 Invalid
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x94100013.90042200 0x4000c94b.01000000 ECC 0x1b3
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x10bffffe.90042200 0xc400a0c8.80a0a000 ECC 0x1e2
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x22480002.a00c3fef 0x808c2010.0248000e ECC 0x06c
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x05005131.050050e7 0x070050ac.c458a230 ECC 0x183
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 510108 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2f8cb3e4
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 317267 kern.info] [AFT0] errID 0x000a0d49.2f8cb3e4 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 371784 kern.info] [AFT0] errID 0x000a0d49.2f8cb3e4 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cc62000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 874116 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2f90e234
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 699108 kern.info] [AFT0] errID 0x000a0d49.2f90e234 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 692025 kern.info] [AFT0] errID 0x000a0d49.2f90e234 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cc68000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 443725 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2fa1112c
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 566558 kern.info] [AFT0] errID 0x000a0d49.2fa1112c Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 957076 kern.info] [AFT0] errID 0x000a0d49.2fa1112c Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cc9e000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 616516 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2faf5778
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 713086 kern.info] [AFT0] errID 0x000a0d49.2faf5778 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 236667 kern.info] [AFT0] errID 0x000a0d49.2faf5778 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cccc000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 742067 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2fd4be78
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 749194 kern.info] [AFT0] errID 0x000a0d49.2fd4be78 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 766156 kern.info] [AFT0] errID 0x000a0d49.2fd4be78 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cd56000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 257905 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.2fdc8c20
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 603840 kern.info] [AFT0] errID 0x000a0d49.2fdc8c20 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 665630 kern.info] [AFT0] errID 0x000a0d49.2fdc8c20 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cd6a000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 708239 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.303f1138
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 116827 kern.info] [AFT0] errID 0x000a0d49.303f1138 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 624564 kern.info] [AFT0] errID 0x000a0d49.303f1138 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cee4000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 194182 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.3042c6d4
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 227389 kern.info] [AFT0] errID 0x000a0d49.3042c6d4 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 968584 kern.info] [AFT0] errID 0x000a0d49.3042c6d4 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cee8000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 539156 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.304b8580
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 639356 kern.info] [AFT0] errID 0x000a0d49.304b8580 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 322275 kern.info] [AFT0] errID 0x000a0d49.304b8580 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cf02000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 987447 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.3061f220
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 505254 kern.info] [AFT0] errID 0x000a0d49.3061f220 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 512576 kern.info] [AFT0] errID 0x000a0d49.3061f220 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cf50000
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 194541 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d49.308e9b54
Jun 10 16:38:40 dhcpA AFSR 0x00000002
Jun 10 16:38:40 dhcpA Fault_PC
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 252823 kern.info] [AFT0] errID 0x000a0d49.308e9b54 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 729983 kern.info] [AFT0] errID 0x000a0d49.308e9b54 Data Bit 103 was in error and corrected
Jun 10 16:38:40 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:38:40 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6cff6000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 900571 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.bd40c5d0
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC 0x102812c Esynd 0x00b0 Slot C: J3201
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 384081 kern.info] [AFT0] errID 0x000a0d4f.bd40c5d0 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 888714 kern.info] [AFT0] errID 0x000a0d4f.bd40c5d0 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d486000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 130916 kern.info] [AFT2] errID 0x000a0d4f.bd40c5d0 E$tag PA=0x000000a0.fd486080 does not match AFAR=0x000000c0.6d486080
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 977261 kern.info] [AFT2] errID 0x000a0d4f.bd40c5d0 PA=0x000000a0.fd486080
Jun 10 16:39:08 dhcpA E$tag 0x00000283.f5000009 E$state_2 Invalid
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000300.01a80f88 0x00000000.014739e0 ECC 0x19d
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000300.37ecbca8 0x00000300.37eca000 ECC 0x088
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x01000000.00000000 0x00000308.26834000 ECC 0x01a
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000308.2684e000 0x00000300.37ecbce0 ECC 0x144
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 130916 kern.info] [AFT2] errID 0x000a0d4f.bd40c5d0 E$tag PA=0x000000c0.b7086080 does not match AFAR=0x000000c0.6d486080
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 977261 kern.info] [AFT2] errID 0x000a0d4f.bd40c5d0 PA=0x000000c0.b7086080
Jun 10 16:39:08 dhcpA E$tag 0x00000302.dc24c800 E$state_2 Invalid
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x000a3464.0009f440 0x00000016.000a93d8 ECC 0x07f
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x0009f444.00000016 0x000a93e0.0009f448 ECC 0x15e
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000016.000a93e4 0x0009f44c.00000016 ECC 0x0ff
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x000a97e4.0009f450 0x00000016.000a97e8 ECC 0x000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 927125 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be058758
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 905640 kern.info] [AFT0] errID 0x000a0d4f.be058758 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 318480 kern.info] [AFT0] errID 0x000a0d4f.be058758 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d49c000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 286390 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be0c2644
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 548041 kern.info] [AFT0] errID 0x000a0d4f.be0c2644 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 170844 kern.info] [AFT0] errID 0x000a0d4f.be0c2644 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d4ac000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 241656 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be148b18
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 304848 kern.info] [AFT0] errID 0x000a0d4f.be148b18 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 752554 kern.info] [AFT0] errID 0x000a0d4f.be148b18 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d4c4000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 561150 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be376f20
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 899458 kern.info] [AFT0] errID 0x000a0d4f.be376f20 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 696120 kern.info] [AFT0] errID 0x000a0d4f.be376f20 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d540000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 323001 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be40cd2c
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 394215 kern.info] [AFT0] errID 0x000a0d4f.be40cd2c Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 927989 kern.info] [AFT0] errID 0x000a0d4f.be40cd2c Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d55a000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 906698 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be6c30fc
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 703160 kern.info] [AFT0] errID 0x000a0d4f.be6c30fc Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 237798 kern.info] [AFT0] errID 0x000a0d4f.be6c30fc Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d5f6000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 871642 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.be84eef8
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 442920 kern.info] [AFT0] errID 0x000a0d4f.be84eef8 Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 374087 kern.info] [AFT0] errID 0x000a0d4f.be84eef8 Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d64c000
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 680595 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU4 at TL=0, errID 0x000a0d4f.bef0ecac
Jun 10 16:39:08 dhcpA AFSR 0x00000002
Jun 10 16:39:08 dhcpA Fault_PC
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 767282 kern.info] [AFT0] errID 0x000a0d4f.bef0ecac Corrected Memory Error on Slot C: J3201 is Persistent
Jun 10 16:39:08 dhcpA SUNW,UltraSPARC-III+: [ID 583065 kern.info] [AFT0] errID 0x000a0d4f.bef0ecac Data Bit 103 was in error and corrected
Jun 10 16:39:08 dhcpA unix: [ID 566906 kern.warning] WARNING: [AFT0] Most recent 3 soft errors from Memory Module Slot C: J3201 exceed threshold (N=2, T=24h:00m) triggering page retire
Jun 10 16:39:08 dhcpA unix: [ID 618185 kern.notice] NOTICE: Scheduling removal of page 0x000000c0.6d7e8000
Jun 10 16:44:25 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.c1702000 cleared
Jun 10 17:05:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.ce81c000 cleared
Jun 10 17:07:41 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.e070a000 cleared
Jun 10 17:20:00 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.c1f0a000 cleared
Jun 10 17:20:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.fa70a000 cleared
Jun 10 17:40:00 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f4faa000 cleared
Jun 10 17:43:42 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.fbf02000 cleared
Jun 10 17:55:42 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f075e000 cleared
Jun 10 18:31:43 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.d581c000 cleared
Jun 10 19:10:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.c171a000 cleared
Jun 10 19:16:44 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.e8e4e000 cleared
Jun 10 19:19:44 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f87ee000 cleared
Jun 10 19:32:00 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.dffee000 cleared
Jun 10 19:52:45 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.d86ae000 cleared
Jun 10 20:04:46 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f5eb6000 cleared
Jun 10 20:45:00 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f476a000 cleared
Jun 10 21:05:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.fb01e000 cleared
Jun 10 22:01:48 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.fae7a000 cleared
Jun 10 22:02:33 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.d5f16000 cleared
Jun 10 22:04:49 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.f301e000 cleared
Jun 10 22:10:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.dd5fe000 cleared
Jun 11 01:55:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.b8a7e000 cleared
Jun 11 02:12:23 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.dd67e000 cleared
Jun 11 02:38:39 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.d6812000 cleared
Jun 11 04:50:00 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.d4e7a000 cleared
Jun 11 08:10:11 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.fae2e000 cleared
Jun 11 09:55:18 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.dd65e000 cleared
Jun 11 16:10:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.c0e1e000 cleared
Jun 12 09:16:01 dhcpA unix: [ID 221039 kern.notice] NOTICE: Previously reported error on page 0x000000c0.ce012000 cleared
dhcpA#
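Analysis:
When the processor (CPU) detects a correctable error (CE) while reading data from memory, it corrects the data and continues operating. The error is recorded in the CPU's AFSR (asynchronous fault status register), the physical address where the error occurred is recorded in the CPU's AFAR (asynchronous fault address register), and the CPU takes a trap so that the error information can be logged.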
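To make the link between the AFAR and the "Scheduling removal of page" messages in the dmesg output above more concrete, here is a minimal sketch that masks an AFAR down to the base address of its page. It assumes the 8 KB base page size of UltraSPARC systems, and the helper names (`parse_dotted_hex`, `format_dotted_hex`, `page_base`) are invented for this illustration; they are not part of any Solaris tool.

```python
PAGE_SIZE = 8 * 1024  # assumption: 8 KB base page size on UltraSPARC


def parse_dotted_hex(text):
    """Convert the log's 'high32.low32' hex notation into one integer."""
    high, low = text.lower().replace("0x", "").split(".")
    return (int(high, 16) << 32) | int(low, 16)


def format_dotted_hex(value):
    """Format an integer back into the log's 'high32.low32' notation."""
    return "0x%08x.%08x" % (value >> 32, value & 0xFFFFFFFF)


def page_base(address):
    """Clear the in-page offset bits to get the page's base address."""
    return address & ~(PAGE_SIZE - 1)


# AFAR taken from the 16:39:08 event in the dmesg output above
afar = parse_dotted_hex("0x000000c0.6d486080")
print(format_dotted_hex(page_base(afar)))
```

For that AFAR the result is 0x000000c0.6d486000, which is the page the kernel scheduled for removal in the same event.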
As part of its error handling, the Solaris software generates diagnostic log messages, for example:
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 796192 kern.notice] NOTICE: [AFT0] Corrected system bus (CE) Event on CPU18 at TL=0, errID 0x0000c9b9.19d92690
Oct 25 09:06:25 wpc26 AFSR 0x00000002
Oct 25 09:06:25 wpc26 Fault_PC 0x10024a74 Esynd 0x0097 /N0/SB5/P3/B0/D2 J16500
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 154767 kern.notice][AFT0] errID 0x0000c9b9.19d92690 Corrected Memory Error on /N0/SB5/P3/B0/D2 J16500 is Persistent
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 682217 kern.notice][AFT0] errID 0x0000c9b9.19d92690 Data Bit 3 was in error and corrected
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 422650 kern.info][AFT2] errID 0x0000c9b9.19d92690 E$tag PA=0x00000000.00bdf7c0 does not match AFAR=0x00000001.04bdf7c0
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 904800 kern.info] [AFT2] errID 0x0000c9b9.19d92690 PA=0x00000000.00bdf7c0
Oct 25 09:06:25 wpc26 E$tag 0x00000000.01000001 E$state_7 Invalid
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x5a8d0016.00000a20 0x20202020.37333231 ECC 0x128
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x39062c00.5a8d0010 0x00000a20.20202020 ECC 0x03d
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x37333330.32062c00 0x5a8f000c.00000a20 ECC 0x1f6
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x20202020.37333330 0x34062c00.5a8f000d ECC 0x1fc
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 929717 kern.info] [AFT2] D$ data not available
Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 335345 kern.info] [AFT2] I$ data not available
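All of the output above was produced by a single CE event. Every message carries an AFT (asynchronous fault tag) label, and from the fourth message onward the tag takes a different value. The tag values are:
·AFT0: used for correctable errors
·AFT1: used for uncorrectable errors as well as for errors that result in a panic
·AFT2: used for diagnostic log data
·AFT3: other information related to the error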
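As a quick way to see how the tags are distributed in a log, the sketch below counts messages per AFT tag. The regular expression is an assumption based only on the message layout shown in this post, and `count_aft_tags` is a hypothetical helper name. The sample lines are copied from the wpc26 example above.

```python
import re
from collections import Counter

# Tag meanings as listed above
AFT_MEANING = {
    "AFT0": "correctable error",
    "AFT1": "uncorrectable error or panic",
    "AFT2": "diagnostic log data",
    "AFT3": "other error-related information",
}

# Assumption: the tag always appears in square brackets, e.g. "[AFT0]"
AFT_RE = re.compile(r"\[(AFT[0-3])\]")


def count_aft_tags(lines):
    """Count how many log lines carry each AFT tag; untagged lines are skipped."""
    counts = Counter()
    for line in lines:
        match = AFT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample_lines = [
    "Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 682217 kern.notice][AFT0] errID 0x0000c9b9.19d92690 Data Bit 3 was in error and corrected",
    "Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 422650 kern.info][AFT2] errID 0x0000c9b9.19d92690 E$tag PA=0x00000000.00bdf7c0 does not match AFAR=0x00000001.04bdf7c0",
    "Oct 25 09:06:25 wpc26 E$tag 0x00000000.01000001 E$state_7 Invalid",
    "Oct 25 09:06:25 wpc26 SUNW,UltraSPARC-III: [ID 929717 kern.info] [AFT2] D$ data not available",
]

for tag, count in sorted(count_aft_tags(sample_lines).items()):
    print(tag, AFT_MEANING[tag], count)
```

Continuation lines such as the E$tag line carry no tag of their own, so the sketch simply ignores them.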
The error message breaks down as follows:
[AFT0] Corrected system bus (CE) Event on CPU18 at TL=0, errID 0x0000c9b9.19d92690
AFSR 0x00000002
Fault_PC 0x10024a74 Esynd 0x0097 /N0/SB5/P3/B0/D2 J16500
[AFT0] errID 0x0000c9b9.19d92690 Corrected Memory Error on /N0/SB5/P3/B0/D2 J16500 is Persistent
[AFT0] errID 0x0000c9b9.19d92690 Data Bit 3 was in error and corrected
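·errID: the timestamp of the event, which also serves as the event's identifier. It matters when several faults occur at the same time, because the errID ties together all the messages that belong to one fault.
·AFSR and AFAR: the asynchronous fault status register and the asynchronous fault address register.
·Fault_PC: the value of the PC at the time of the fault. Whether this value is valid depends on the fault type.
·Esynd: the ECC syndrome that was captured.
·/N0/SB5/P3/B0/D2: the location of the faulty memory module.
·J16500: the J number of the memory module.
·Persistent: the operating system's (Sun Solaris) classification of the fault. There are three possible classifications, Intermittent, Persistent, or Sticky, explained below:
  ·Intermittent: when the data at that memory location was read again, the error did not recur (in other words, the data in memory corrected itself).
  ·Persistent: when the data at that memory location was read again, the error recurred, and the system had to act to correct it.
  ·Sticky: the error was still present after the system corrected it. In this case further testing is recommended to decide whether the memory module needs to be replaced, because this classification indicates a hardware fault.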
Dec 2 19:30:42 mail4.371.net unix: [AFT0] Multiple Softerrors:
Dec 2 19:30:42 mail4.371.net unix: 106 Intermittent, 144 Persistent, and 6 Sticky Softerrors accumulated
Dec 2 19:30:42 mail4.371.net unix: from Memory Module 1803
Dec 2 19:30:42 mail4.371.net unix: [AFT0] CONSIDER REPLACING THE MEMORY MODULE.
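The kernel summary above counts Intermittent, Persistent, and Sticky soft errors per memory module. A similar tally can be sketched from the raw [AFT0] messages. The regular expression below is an assumption derived from the message wording seen in this post, `tally_by_module` is a hypothetical helper name, and counting each errID only once is a choice made for this illustration rather than a description of how Solaris does its own accounting.

```python
import re
from collections import Counter, defaultdict

# Assumption: message wording as it appears in the logs in this post
CE_RE = re.compile(
    r"errID (0x[0-9a-f]+\.[0-9a-f]+) Corrected Memory Error on "
    r"(.+?) is (Intermittent|Persistent|Sticky)"
)


def tally_by_module(lines):
    """Count CE classifications per memory module, one count per errID."""
    seen_err_ids = set()
    tally = defaultdict(Counter)
    for line in lines:
        match = CE_RE.search(line)
        if not match:
            continue
        err_id, module, kind = match.groups()
        if err_id in seen_err_ids:
            continue  # the same event was already counted
        seen_err_ids.add(err_id)
        tally[module][kind] += 1
    return tally


# Lines copied from the dhcpA logs in this post
sample_lines = [
    "Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 317267 kern.info] [AFT0] errID 0x000a0d49.2f8cb3e4 Corrected Memory Error on Slot C: J3201 is Persistent",
    "Jun 10 16:38:40 dhcpA SUNW,UltraSPARC-III+: [ID 699108 kern.info] [AFT0] errID 0x000a0d49.2f90e234 Corrected Memory Error on Slot C: J3201 is Persistent",
    "Jul 5 06:16:45 dhcpA SUNW,UltraSPARC-III+: [ID 863297 kern.info] [AFT0] errID 0x0000266d.753a5b5c Corrected Memory Error on Slot C: J2900 is Intermittent",
]

for module, kinds in tally_by_module(sample_lines).items():
    print(module, dict(kinds))
```

Feeding the whole messages file through a function like this makes it easy to see which module accumulates Persistent or Sticky errors and is therefore a candidate for further testing.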
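Note: as of Solaris 8 KU-9, all SunFire and Ultra Enterprise hosts report correctable memory errors on the console and record them in the messages file. In versions before Solaris 8 KU-9, Ultra Enterprise midrange servers (with the exception of the Enterprise 10000) do not log correctable errors on a single memory module when there are fewer than 5 of them.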
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 264084 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU0 at TL=0, errID 0x000506a5.7ffa9d00
Jun 30 03:10:00 dhcpA AFSR 0x00000002
Jun 30 03:10:00 dhcpA Fault_PC
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 893642 kern.info] [AFT0] errID 0x000506a5.7ffa9d00 Corrected Memory Error on Slot C: J8000 is Intermittent
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 926454 kern.info] [AFT0] errID 0x000506a5.7ffa9d00 Check Bit 7 was in error and corrected
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 358666 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU1 at TL=0, errID 0x000506a5.84d68b18
Jun 30 03:10:00 dhcpA AFSR 0x00000002
Jun 30 03:10:00 dhcpA Fault_PC 0x1037544 Esynd 0x0080 Slot C: J8000
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 707584 kern.info] [AFT0] errID 0x000506a5.84d68b18 Corrected Memory Error on Slot C: J8000 is Intermittent
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 878472 kern.info] [AFT0] errID 0x000506a5.84d68b18 Check Bit 7 was in error and corrected
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 466323 kern.info] [AFT2] errID 0x000506a5.84d68b18 PA=0x000000c0.fecbea40
Jun 30 03:10:00 dhcpA E$tag 0x00000303.fb600023 E$state_1 Modified
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000300.09476998 0x00000300.057a60b0 ECC 0x11e
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x00000300.0037f620 0x00000000.0113a160 ECC 0x0e3
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x00000300.00205a18 0xbaddcafe.baddcafe ECC 0x0d5
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000000.00000000 0x00000300.09477aa8 ECC 0x1e9
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jun 30 03:10:00 dhcpA SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Jul 5 06:16:45 dhcpA SUNW,UltraSPARC-III+: [ID 590756 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU3 at TL=0, errID 0x0000266d.753a5b5c
Jul 5 06:16:45 dhcpA AFSR 0x00000002
Jul 5 06:16:45 dhcpA Fault_PC
Jul 5 06:16:45 dhcpA SUNW,UltraSPARC-III+: [ID 863297 kern.info] [AFT0] errID 0x0000266d.753a5b5c Corrected Memory Error on Slot C: J2900 is Intermittent
Jul 5 06:16:45 dhcpA SUNW,UltraSPARC-III+: [ID 170368 kern.info] [AFT0] errID 0x0000266d.753a5b5c Data Bit 21 was in error and corrected
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 775088 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU3 at TL=0, errID 0x00002676.8b12c618
Jul 5 06:17:24 dhcpA AFSR 0x00000002
Jul 5 06:17:24 dhcpA Fault_PC 0x100c7cc Esynd 0x008c Slot C: J2900
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 208345 kern.info] [AFT0] errID 0x00002676.8b12c618 Corrected Memory Error on Slot C: J2900 is Intermittent
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 500377 kern.info] [AFT0] errID 0x00002676.8b12c618 Data Bit 21 was in error and corrected
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 109863 kern.info] [AFT2] errID 0x00002676.8b12c618 PA=0x000000c0.f526aa00
Jul 5 06:17:24 dhcpA E$tag 0x00000303.d4800924 E$state_0 Modified
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0x00000000.00000000 0x00000300.049648d0 ECC 0x15d
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0x0000002d.b922e228 0x00000000.000002a0 ECC 0x063
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x20) 0x6d64002a.83fe8294 0x00000000.010204a0 ECC 0x01f
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x30) 0x00000300.0496c000 0x00000000.01048400 ECC 0x155
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 929717 kern.info] [AFT2] D$ data not available
Jul 5 06:17:24 dhcpA SUNW,UltraSPARC-III+: [ID 335345 kern.info] [AFT2] I$ data not available
Jul 5 06:17:30 d