Informix IDS 11 System Administration (Exam 918) Certification Guide, Part 3: Troubleshooting (3) - sdccf - ChinaUnix blog

Category: DB2/Informix

2008-05-31 16:44:54

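The onstat utility

onstat is a powerful utility that provides much of the information needed for troubleshooting, including memory usage, network usage, session activity, buffer pool usage, and disk usage. This section discusses how to use onstat to troubleshoot each of these areas.

Several commands can be used to track memory usage:

The onstat -g mem command reports memory usage for the different pools in shared memory.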

                    
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:28:21
 -- 39936 Kbytes

Pool Summary:
name         class addr             totalsize        freesize         #allocfrag #freefrag
afpool       V     10ad28040        8192             2488             5          3
tpcpool      V     10b034040        40960            4776             32         3
seqpool      V     10b06a040        4096             768              2          1
pnlpool      V     10b037040        77824            4344             69         5
sbtlist      V     10ae10040        20480            7232             4          3
dstpool      V     10b033040        8192             3320             2          2
sqcrypto     V     10b21b040        4096             504              2          1
ampool       V     10b061040        8192             3088             22         1
Blkpool Summary:
name         class addr             size             #blks
mt           V     10ad2b450        1527808          21
global       V     10ad26290        0                0

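Table 1 describes the pool summary columns.

Name	Description
name	Pool name
class	Shared memory class (R = Resident, V = Virtual)
addr	Memory address of the pool header
totalsize	Total size of the pool, in bytes
freesize	Free memory in the pool
#allocfrag	Number of allocated memory fragments
#freefrag	Number of free memory fragments

The Blkpool summary has the following columns.

Name	Description
name	Pool name
class	Shared memory class (R = Resident, V = Virtual)
addr	Memory address of the pool header
size	Total size of the pool, in bytes
#blks	Number of blocks in the pool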
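As a rough illustration of how the pool summary could be scanned for pools that are running low on free memory, the following Python sketch parses a few rows copied from the onstat -g mem listing above. The function name parse_pools and the dictionary keys are illustrative choices, not part of any Informix tool, and the column order is assumed to be the one given in Table 1.

```python
# Hypothetical helper: parse data rows from the "Pool Summary" section of `onstat -g mem`.
# Assumed column order: name, class, addr, totalsize, freesize, #allocfrag, #freefrag.
SAMPLE = """\
afpool       V     10ad28040        8192             2488             5          3
tpcpool      V     10b034040        40960            4776             32         3
pnlpool      V     10b037040        77824            4344             69         5
"""


def parse_pools(text):
    """Return one dict per pool row; header, blank, and Blkpool rows are skipped."""
    pools = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # not a pool summary data row
        name, cls, addr, total, free, allocfrag, freefrag = fields
        if not (total.isdigit() and free.isdigit()):
            continue  # header row
        total, free = int(total), int(free)
        pools.append({
            "name": name,
            "class": cls,
            "totalsize": total,
            "freesize": free,
            "free_pct": 100.0 * free / total if total else 0.0,
        })
    return pools


# List the pools with the smallest share of free memory first.
for pool in sorted(parse_pools(SAMPLE), key=lambda p: p["free_pct"]):
    print(f'{pool["name"]:<10} {pool["totalsize"]:>8} {pool["free_pct"]:6.1f}% free')
```

A pool with little free memory is not a problem in itself; the sketch only makes it easier to compare successive snapshots.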

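onstat -g ses displays memory usage per session. This is handy for tracking down problems such as memory leaks: if you run onstat -g ses -r repeatedly and see a session's total (allocated) memory keep growing while its used memory does not, a memory leak is the likely cause.

First, look at the general output: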

                    
$  onstat -g ses

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:52:52
 -- 39936 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
35       informix -        0        -        0        12288      11592      off
18       informix -        0        -        1        425984     338128     off
17       informix -        0        -        1        434176     337408     off
16       informix -        0        -        1        282624     227560     off
5        informix -        0        -        0        12288      11592      off
3        informix -        0        -        0        16384      13176      off
2        informix -        0        -        0        12288      11592      off
35       informix -        0        -        0        81920      76360      off

The most important columns are total memory and used memory. You can drill further into memory usage by running onstat -g ses with a specific session id.

                    
$ onstat -g ses 35

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:55:28 
-- 39936 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
35       informix 53       15046    ryleh    1        81920      76360      off

tid      name     rstcb            flags    curstk   status
59       sqlexec  10afaa7d0        Y--P---  7791     cond wait(sm_read)

Memory pools    count 2
name         class addr              totalsize  freesize   #allocfrag #freefrag
35           V     10bf99040        77824      4720       110        5
35*O0        V     10c072040        4096       840        1          1

name           free       used           name           free       used
overhead       0          6512           scb            0          144
opentable      0          2568           filetable      0          496
log            0          12096          temprec        0          1696
keys           0          800            ralloc         0          18992
gentcb         0          1640           ostcb          0          2864
sqscb          0          18880          sql            0          7
rdahead        0          160            hashfiletab    0          552
osenv          0          2880           buft_buffer    0          2168
sqtcb          0          3216           fragman        0          488
sapi           0          64

sqscb info
scb              sqscb            optofc   pdqpriority sqlstats optcompind  directives
10bde6400        10bf63028        0        0           0        2           1


Sess  SQL            Current            Iso Lock       SQL  ISAM F.E.
Id    Stmt type      Database           Lvl Mode       ERR  ERR  Vers Explain

35    SELECT         sysmaster          CR  Not Wait   0    0    9.24 Off


Current statement name : slctcur

Current SQL statement :
  select * from systables

Last parsed SQL statement :
  select * from systables

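The onstat -g afr option displays the allocated memory fragments for a specified session or shared memory pool. Each session gets its own shared memory pool.

Use this command to see which allocations within a pool are growing. For example, if you run it repeatedly (onstat -g afr -r) and see the ralloc entries keep growing, you have probably found the allocation responsible for the excessive memory growth.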

                    
$ onstat -g afr 35
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 02:12:18 
-- 39936 Kbytes

Allocations for pool name 35:
addr             size       memid
10bf99000        3256       overhead
10bf99cb8        80         scb
10bf99d08        64         scb
10bf99d48        64         ostcb
10bf99d88        552        opentable
10bf99fb0        80         osenv
10bf63000        6856       sqscb
10bf64ac8        64         sqscb
10bf64b08        72         sql
10bf64b50        72         filetable
10bf64b98        80         fragman
10bf64be8        80         sqscb
10bf64c38        64         sqscb
.
.
.
.
.
.
10bf57f78        136        fragman
10c071000        2744       ralloc
10c071ab8        1024       ralloc
10c073000        2168       buft_buffer

Output	Description
addr	Memory address of the pool fragment
size	Size of the pool fragment, in bytes
memid	Memory ID of the pool fragment
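To compare successive onstat -g afr snapshots, it can help to total the fragment sizes per memid. The following Python sketch does that for a few rows copied from the listing above; the function name bytes_by_memid is an illustrative choice, and the three-column layout (addr, size, memid) is assumed from that listing.

```python
from collections import Counter

# A few rows copied from the `onstat -g afr 35` listing.
SAMPLE = """\
addr             size       memid
10bf99000        3256       overhead
10bf99cb8        80         scb
10bf99d08        64         scb
10c071000        2744       ralloc
10c071ab8        1024       ralloc
"""


def bytes_by_memid(text):
    """Sum the size column per memid; lines that are not data rows are ignored."""
    totals = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].isdigit():
            totals[fields[2]] += int(fields[1])
    return totals


# Print the memids that hold the most allocated bytes first.
for memid, size in bytes_by_memid(SAMPLE).most_common():
    print(f"{memid:<12} {size}")
```

Running the same aggregation on two snapshots and subtracting the totals shows which memid is responsible for the growth.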

A similar option to afr is ffr, which displays the free fragments of a shared memory pool.

                    
$ onstat -g ffr 35
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 02:15:57
 -- 39936 Kbytes

Free lists for pool name 35:
addr             size        idx
10c073878        1928        1
10bf9ba40        104         11
10c071eb8        328         39
10bf9cdd8        552         66
10bf548a8        1808        99

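Output	Description
addr	Memory address of the pool fragment
size	Size of the pool fragment, in bytes

Suppose the system administrator tells you that IDS is using too much memory and that its memory usage keeps growing. If you have not changed any configuration recently, the first thing to determine is whether any session is holding a large amount of memory.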

                    
$  onstat -g ses

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:52:52
 -- 39936 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
35       informix -        0        -        0        12288      11592      off
18       informix -        0        -        1        425984     338128     off
17       informix -        0        -        1        434176     337408     off
.
.
.
2        informix -        0        -        0        12288      11592      off
2301     bad_app  -        0        -        1      3203072      16384      off
8220     bad_app  -        0        -        1      3194880      16384      off
1704     bad_app  -        0        -        1      3203072      16384      off
430      bad_app  -        0        -        1     19169280      16384      off
1991     bad_app  -        0        -        1      3203072      16384      off

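Notice that one application uses only about 16 KB of memory per session, yet has about 3 MB allocated in most of its sessions and more than 18 MB (19,169,280 bytes) in session 430.

Let's take a closer look at that session.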

                    
430      bad_app  -        0        -        1     19202048      16384 	    off

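It is still growing: the total has risen from 19,169,280 to 19,202,048 bytes while used memory stays at 16,384. This looks like a memory leak. Now you can call your developers and ask them what they changed.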
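The comparison just described can be sketched in Python: take two onstat -g ses snapshots and flag sessions whose total memory grew while used memory did not. The function names and the two abbreviated snapshots are illustrative; the nine-column row layout is assumed from the listings above.

```python
def parse_sessions(text):
    """Return {session_id: (total_memory, used_memory)} from `onstat -g ses` summary rows.

    Assumed row layout: id user tty pid hostname threads total used explain.
    """
    sessions = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) == 9 and f[0].isdigit() and f[6].isdigit() and f[7].isdigit():
            sessions[int(f[0])] = (int(f[6]), int(f[7]))
    return sessions


def suspect_leaks(before, after):
    """Sessions whose total memory grew between snapshots while used memory did not."""
    suspects = []
    for sid, (total2, used2) in sorted(after.items()):
        if sid in before:
            total1, used1 = before[sid]
            if total2 > total1 and used2 <= used1:
                suspects.append((sid, total2 - total1))
    return suspects


# Two abbreviated snapshots built from the rows shown above.
FIRST = """\
2        informix -        0        -        0        12288      11592      off
430      bad_app  -        0        -        1     19169280      16384      off
"""
SECOND = """\
2        informix -        0        -        0        12288      11592      off
430      bad_app  -        0        -        1     19202048      16384      off
"""

for sid, growth in suspect_leaks(parse_sessions(FIRST), parse_sessions(SECOND)):
    print(f"session {sid}: total memory grew by {growth} bytes, used memory did not")
```

A single pair of snapshots only hints at a leak; growth that persists over many samples is the stronger signal.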


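Use onstat -g nta to display the combined network statistics from -g ntd, -g ntm, -g ntt, and -g ntu. If MaxConnect is installed, the statistics from this command can also be used to tune MaxConnect performance.

onstat -g ntd displays global network information, broken down by client type: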

                    
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:00:44 
-- 38912 Kbytes

global network information:
#netscb connects     read    write    q-free  q-limits  q-exceed alloc/max
6/   6        1        8        8    0/   0  135/  10    0/   0    1/   1

Client Type     Calls   Accepted   Rejected       Read      Write
sqlexec         yes            1          0          7          8
srvinfx         yes            0          0          0          0
onspace         yes            0          0          0          0
onlog           yes            0          0          0          0
onparam         yes            0          0          0          0
oncheck         yes            0          0          0          0
onload          yes            0          0          0          0
onunload        yes            0          0          0          0
onmonitor       yes            0          0          0          0
dr_accept       yes            0          0          0          0
cdraccept       no             0          0          0          0
ontape          yes            0          0          0          0
srvstat         yes            0          0          0          0
asfecho         yes            0          0          0          0
listener        yes            0          0          1          0
crsamexec       yes            0          0          0          0
onutil          yes            0          0          0          0
safe            yes            0          0          0          0
drdaexec        no             0          0          0          0
smx             yes            0          0          0          0
Totals                         1          0          8          8

onstat -g ntm displays network mailbox statistics.

                    
$ onstat -g ntm
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:03:35 
-- 38912 Kbytes

global network information:
#netscb connects     read    write    q-free  q-limits  q-exceed alloc/max
6/   6        1        8        8    0/   0  135/  10    0/   0    1/   1

Network mailbox information:
box           netscb thread name     max received   in box   max in box full signal
5        10b239928 tlitcppoll       10        4        0        2        0	yes
6        10b250928 tlitcplst        10        0        0        0        0	no

onstat -g ntt displays the network times for each thread.

                    
$ onstat -g ntt
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:04:55
 -- 38912 Kbytes

global network information:
#netscb connects     read    write    q-free  q-limits  q-exceed alloc/max
6/   6        1        8        8    0/   0  135/  10    0/   0    1/   1

Individual thread network information (times):
   netscb thread name    sid     open     read    write address
10b4bf368 sqlexec          4 07:34:44 07:34:55 07:34:55
10c10ccd0                 17 07:34:28
10b270790                 16 07:34:28
10b3abd18                 15 07:34:28
10b250928 tlitcplst        3 07:34:22 07:34:44          ryleh|1537|tlitcp
10b239928 tlitcppoll       2 07:34:22

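Note that the address column carries a lot of information: it shows the host | port | protocol combination from the sqlhosts file. When there are multiple listener threads, this helps in tracking down problems.

onstat -g ntu displays network user statistics.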

                    
$ onstat -g ntu
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:08:28 
-- 38912 Kbytes

global network information:
#netscb connects     read    write    q-free  q-limits  q-exceed alloc/max
6/   6        1        8        8    0/   0  135/  10    0/   0    1/   1

Individual thread network information (basic):
netscb type   thread name    sid   fd poll    reads   writes q-nrm q-pvt q-exp
10b4bf368 tlitcp sqlexec          4    2    5        8        8  0/ 1  1/1  0/ 0
10c10ccd0 tlitcp unknown         17    0    0        0        0  0/ 0  0/ 0  0/ 0
10b270790 tlitcp unknown         16    0    0        0        0  0/ 0  0/0  0/ 0
10b3abd18 tlitcp unknown         15    0    0        0        0  0/ 0  0/ 0  0/ 0
10b250928 tlitcp tlitcplst        3    1    5        1        0  0/ 0  0/0  0/ 0
10b239928 tlitcp tlitcppoll       2    0    5        7        0  0/ 0  0/ 0  0/ 0

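Suppose that you, as the DBA, get calls from users saying that they cannot connect to the server over the network.

The online log shows nothing unusual, so you run onstat -g nta to look for anything out of the ordinary.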

                    

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 01:16:37 -- 38912 Kbytes

global network information:
  #netscb connects     read    write    q-free  q-limits  q-exceed alloc/max
   6/   7       15      936      714    1/   1  135/  10    0/   0    1/   1

Individual thread network information (basic):
          netscb type   thread name    sid   fd poll    reads   writes q-nrm q-pvt q-exp
       10b434b88 tlitcp unknown         19    0    0        0        0  0/ 0  0/ 0  0/ 0
       10b294848 tlitcp unknown         18    0    0        0        0  0/ 0  0/ 0  0/ 0
       10b270ae0 tlitcp unknown         17    0    0        0        0  0/ 0  0/ 0  0/ 0
       10b274928 tlitcp tlitcplst        4    2    5        0        0  0/ 0  0/ 0  0/ 0
       10b250928 tlitcp tlitcplst        3    1    5       15        0  0/ 0  0/ 0  0/ 0
       10b239928 tlitcp tlitcppoll       2    0    5      936        0  0/ 0  0/ 0  0/ 0

Individual thread network information (times):
          netscb thread name    sid     open     read    write address
       10b434b88                 19 09:54:00
       10b294848                 18 09:54:00
       10b270ae0                 17 09:54:00
       10b274928 tlitcplst        4 09:53:54 10:04:03          ryleh|1538|tlitcp
       10b250928 tlitcplst        3 09:53:54 11:54:38          ryleh|1537|tlitcp
       10b239928 tlitcppoll       2 09:53:54

Network mailbox information:
 box           netscb thread name     max received   in box   max in box full signal
   5        10b239928 tlitcppoll       10       49        0        4        0    yes
   6        10b250928 tlitcplst        10        0        0        0        0     no
   7        10b274928 tlitcplst        10        0        0        0        0     no

Client Type     Calls   Accepted   Rejected       Read      Write
sqlexec         yes           15          0        921        714
srvinfx         yes            0          0          0          0
onspace         yes            0          0          0          0
onlog           yes            0          0          0          0
onparam         yes            0          0          0          0
oncheck         yes            0          0          0          0
onload          yes            0          0          0          0
onunload        yes            0          0          0          0
onmonitor       yes            0          0          0          0
dr_accept       yes            0          0          0          0
cdraccept       no             0          0          0          0
ontape          yes            0          0          0          0
srvstat         yes            0          0          0          0
asfecho         yes            0          0          0          0
listener        yes            0          0         15          0
crsamexec       yes            0          0          0          0
onutil          yes            0          0          0          0
safe            yes            0          0          0          0
drdaexec        no             0          0          0          0
smx             yes            0          0          0          0
Totals                        15          0        936        714


No MaxConnect instances connected

IO statistics for each MaxConnect instance:
   IMCid    header      data   partial   blocked      data   partial   blocked
             reads     reads     reads     reads    writes    writes    writes
       -         -         -         -         -         -         -         -

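In the thread times section of the output (the part that onstat -g ntt reports), notice that one of the listener threads has not performed a read for a long time: its last read was at 10:04:03, while the other listener last read at 11:54:38.

To tie that listener thread to a specific port/DBSERVERALIAS combination, find the sqlhosts entry with the matching host and port; the first column of that entry is the server name or alias.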

                    
       10b274928 tlitcplst        4 09:53:54 10:04:03          ryleh|1538|tlitcp

                    
demo_alias	ontlitcp	ryleh	1538

                           	
DBSERVERALIASES	demo_alias

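So in a production environment, the quick fix is to switch users to another DBSERVERALIAS or to the DBSERVERNAME. Finding the root cause of the problem is beyond the scope of this tutorial.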
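The check used in this scenario, spotting a listener whose last read is far behind the others, could be automated along the following lines. This is only a sketch: the function names are made up for illustration, the one-hour threshold is an arbitrary assumption, and the code assumes all timestamps fall on the same day, as they do in the listing above.

```python
def to_seconds(hhmmss):
    """Convert an HH:MM:SS timestamp to seconds since midnight."""
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def listener_reads(text):
    """Map each listener address (host|port|protocol) to its last read time in seconds.

    Listener rows in the times section are assumed to split into six fields:
    netscb, thread name, sid, open, read, address (the write column is empty).
    """
    reads = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) == 6 and "|" in f[5]:
            reads[f[5]] = to_seconds(f[4])
    return reads


def stale_listeners(text, max_lag_seconds=3600):
    """Listeners whose last read trails the most recent listener read by more than the threshold."""
    reads = listener_reads(text)
    if not reads:
        return []
    newest = max(reads.values())
    return sorted(addr for addr, t in reads.items() if newest - t > max_lag_seconds)


# Rows copied from the times section of the onstat -g nta listing.
SAMPLE = """\
10b434b88                 19 09:54:00
10b274928 tlitcplst        4 09:53:54 10:04:03          ryleh|1538|tlitcp
10b250928 tlitcplst        3 09:53:54 11:54:38          ryleh|1537|tlitcp
10b239928 tlitcppoll       2 09:53:54
"""

print(stale_listeners(SAMPLE))
```

A quiet listener is not necessarily broken; a rarely used alias would look the same, so the result is a lead to follow up on rather than a diagnosis.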


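Running onstat -g ses without a session id produces a one-line summary for each session currently active on the system.

As mentioned earlier, this command is very useful for finding memory leaks in client applications.

Run it with the -r option (onstat -g ses -r) to watch whether total memory keeps growing while used memory stays steady.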

                    
$ onstat -g ses
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:14:47 
-- 38912 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
18       informix -        0        -        0        12288      11592      off
17       informix -        0        -        1        303104     260672     off
16       informix -        0        -        1        307200     265480     off
15       informix -        0        -        1        278528     227640     off
4        informix 53       25457    ryleh    1        98304      91928      off
3        informix -        0        -        0        16384      13176      off
2        informix -        0        -        0        12288      11592      off

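The output shows the pid of the application process, the number of threads the session owns, and its total memory.

For more information about a particular session, run onstat -g ses with that session id: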

                    
$ onstat -g ses 35

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:55:28
 -- 39936 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
35       informix 53       15046    ryleh    1        81920      76360      off

tid      name     rstcb            flags    curstk   status
59       sqlexec  10afaa7d0        Y--P---  7791     cond wait(sm_read)

Memory pools    count 2
name         class addr              totalsize  freesize   #allocfrag #freefrag
35           V     10bf99040        77824      4720       110        5
35*O0        V     10c072040        4096       840        1          1

name           free       used           name           free       used
overhead       0          6512           scb            0          144
opentable      0          2568           filetable      0          496
log            0          12096          temprec        0          1696
keys           0          800            ralloc         0          18992
gentcb         0          1640           ostcb          0          2864
sqscb          0          18880          sql            0          7
rdahead        0          160            hashfiletab    0          552
osenv          0          2880           buft_buffer    0          2168
sqtcb          0          3216           fragman        0          488
sapi           0          64

sqscb info
scb              sqscb            optofc   pdqpriority sqlstats optcompind  directives
10bde6400        10bf63028        0        0           0        2           1


Sess  SQL            Current            Iso Lock       SQL  ISAM F.E.
Id    Stmt type      Database           Lvl Mode       ERR  ERR  Vers Explain

35    SELECT         sysmaster          CR  Not Wait   0    0    9.24 Off


Current statement name : slctcur

Current SQL statement :
  select * from systables

Last parsed SQL statement :
  select * from systables

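As you can see, this provides a great deal of information about the session, including the isolation level, the current SQL statement, the front-end version, and the memory usage discussed earlier.

Take note of the rstcb value. It corresponds to the first column of the onstat -u output.

Suppose that after the DBSERVERALIAS problem has been resolved, one user still complains that his application is not responding (his user name is bad_user).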

                    
2257        bad_user -        0        -        0        122288      121592      off

Nothing looks abnormal here: used memory is very close to allocated memory. Use onstat -g ses with the session id to see what this user is doing:

                    
$ onstat -g ses 2257

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 2 days 01:55:28
 -- 39936 Kbytes

session                                      #RSAM    total      used       dynamic
id       user     tty      pid      hostname threads  memory     memory     explain
2257     bad_user        15098      ryleh    1        122288      121592     off

tid      name     rstcb            flags    curstk   status
2070     sqlexec  10afaa7d0        Y--P---  7791     cond wait(sm_read)

Memory pools    count 2
name         class addr              totalsize  freesize   #allocfrag #freefrag
35           V     10bf99040        77824      4720       110        5
35*O0        V     10c072040        4096       840        1          1

name           free       used           name           free       used
overhead       0          6512           scb            0          144
opentable      0          2568           filetable      0          496
log            0          12096          temprec        0          1696
keys           0          800            ralloc         0          18992
gentcb         0          1640           ostcb          0          2864
sqscb          0          18880          sql            0          7
rdahead        0          160            hashfiletab    0          552
osenv          0          2880           buft_buffer    0          2168
sqtcb          0          3216           fragman        0          488
sapi           0          64

sqscb info
scb              sqscb            optofc   pdqpriority sqlstats optcompind  directives
10bde6400        10bf63028        0        0           0        2           1


Sess  SQL            Current            Iso Lock       SQL  ISAM F.E.
Id    Stmt type      Database           Lvl Mode       ERR  ERR  Vers Explain

2257  INSERT         sysmaster          CR  Not Wait   0    0    9.24 Off


Current statement name : slctcur

Current SQL statement :
  INSERT INTO COMP_PREP_1_6   (contrct_id,cstomer_id,
    rep_id,run_mode,canvass_code, canvass_issue_year,
    channel_code,sort_ind)         SELECT  {+ ORDERED}
    CONTRACT.contract_id,         DPOP_CST_TMP.customer_id,
    ASSIGNMENT.rep_id,            'B',
    ASSIGNMENT.canvass_code,
    ASSIGNMENT.canvass_issue_year,ASSIGNMENT.channel_code,      4
                   FROM DPOP_CST_TMP,          CONTRACT,
    ASSIGNMENT,                   CLOSE_CANVASS_TMP             WHERE
    CONTRACT.cstomer_id =  DPOP_CST_TMP.cstomer_id    AND
    CONTRACT.contract_status = 'R'                          AND
    ASSIGNMENT.assignment_id = CONTRACT.assignment_id       AND
    ASSIGNMENT.canvass_code = CLOSE_CANVASS_TMP.canvass_code AND
    ASSIGNMENT.canvass_issue_year =
    CLOSE_CANVASS_TMP.canvass_issue_year                        AND
    (ASSIGNMENT.channel_code =CLOSE_CANVASS_TMP.channel_code OR
    CLOSE_CANVASS_TMP.channel_code = '**')                  AND EXISTS (
                SELECT 1                     FROM CTR_TRACKING CTRTRK1,
     CTR_TRACKING CTRTRK2     WHERE CTRTRK1.contract_id =
    CONTRACT.contract_id          AND CTRTRK1.contract_status = 'R'
                   AND CTRTRK1.ctap_event_id > ?
    AND CTRTRK2.contract_id =     CONTRACT.contract_id          AND
    CTRTRK2.contract_status = 'O'                           AND
    CTRTRK2.ctap_event_id = ( SELECT
    MAX(CTRTRK3.ctap_event_id)    FROM CTR_TRACKING CTRTRK3     WHERE
    CTRTRK3.contract_id =   CTRTRK1.contract_id           AND
    CTRTRK3.ctap_event_id = CTRTRK1.ctap_event_id  ))

Last parsed SQL statement :
  select * from CONTRACTS

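This is a complex insert. But wait: look at the thread's flags and status.

The flags are Y--P---. According to the manual, Y in the first position means the thread is waiting on a condition. The thread status shows which one: cond wait(sm_read).

So you ask the user to take a close look at the client application, because sqlexec is waiting for the client application to tell it what to do next.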
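Looking up flag characters by hand gets tedious, so a small lookup table can help. The Python sketch below decodes only the first flag position (what the thread is waiting for) and the P in the fourth position. The code table is a subset assumed from the onstat -u flag descriptions in the Informix documentation and should be checked against the manual for your version; the function name describe_flags is an illustrative choice.

```python
# Subset of first-position flag codes, assumed from the onstat -u documentation.
# Verify against the manual for your server version before relying on it.
WAIT_CODES = {
    "B": "waiting for a buffer",
    "C": "waiting for a checkpoint",
    "G": "waiting for a logical-log buffer write",
    "L": "waiting for a lock",
    "S": "waiting for a mutex",
    "T": "waiting for a transaction",
    "X": "waiting for a transaction cleanup (rollback)",
    "Y": "waiting for a condition",
}


def describe_flags(flags):
    """Return human-readable notes for the flag positions this sketch knows about."""
    notes = []
    first = flags[0] if flags else "-"
    if first in WAIT_CODES:
        notes.append(WAIT_CODES[first])
    # Fourth position: P marks the primary thread of a session.
    if len(flags) > 3 and flags[3] == "P":
        notes.append("primary thread of the session")
    return notes


print(describe_flags("Y--P---"))
```

For the thread in this scenario the sketch reports a condition wait on the session's primary thread, which matches the cond wait(sm_read) status in the listing.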


onstat -p can be used to check buffer pool usage:

                    
$ onstat -p
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:17:03 
-- 3812 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
933      961      101742   99.08   173      441      6221     97.22


isamtot  open     start    read     write    rewrite  delete   commit     rollbk
19993    1533     2346     6077     685      37       346      807	 0

gp_read    gp_write   gp_rewrt   gp_del     gp_alloc   gp_free    gp_curs
0          0          0          0          0          0          0

ovlock     ovuserthread ovbuff     usercpu  syscpu   numckpts   flushes
0          0            0          3.55     0.52     2          2

bufwaits lokwaits lockreqs deadlks  dltouts  ckpwaits compress seq scans
240      0        9130     0        0        0        385      137 0

ixda-RA    idx-RA     da-RA      RA-pgsused lchwaits
38         5          162        203        3

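The first thing to look at is ovbuff, the number of times the engine ran out of buffers. If ovbuff is increasing, consider increasing the size of the buffer pool.

Note: The SMI table sysprofile contains the same information.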
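The two %cached columns in the onstat -p listing can be reproduced from the neighboring counters, which is a useful sanity check when reading the profile. The Python sketch below assumes the usual definition, the share of buffer operations that did not require a disk operation; the function name cached_pct is an illustrative choice, and the numbers come from the listing above.

```python
def cached_pct(buffer_ops, disk_ops):
    """Percentage of buffer operations satisfied without a disk operation."""
    if buffer_ops == 0:
        return 0.0
    return 100.0 * (buffer_ops - disk_ops) / buffer_ops


# Counters from the onstat -p listing: bufreads/dskreads and bufwrits/dskwrits.
read_cached = cached_pct(101742, 933)
write_cached = cached_pct(6221, 173)

print(f"read %cached:  {read_cached:.2f}")
print(f"write %cached: {write_cached:.2f}")
```

Both results round to the values shown in the listing (99.08 and 97.22).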

onstat -F reports counts of the different types of writes performed.

                    
$ onstat -F
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:20:45 
-- 38912 Kbytes

Fg Writes     LRU Writes    Chunk Writes
0             0             98

address           flusher  state    data     # LRU    Chunk    Wakeups  Idle Time
10afa5820        0        I        0        0        1        1239     1237.670
states: Exit Idle Chunk Lru

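If the Fg Writes count is increasing, consider tuning the system. Foreground writes are performed by session threads themselves rather than by the page cleaners, so a rising count suggests that page cleaning is not keeping up.

onstat -R monitors the LRU queues. For each queue, onstat -R lists the number of buffers in the queue, along with the number and percentage of buffers that have been modified.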

                    
$ onstat -R
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:30:51
 -- 3812 Kbytes
Buffer pool page size: 2048

8 buffer LRU queue pairs              priority levels
# f/m   pair total     % of    length       LOW      HIGH
0 F        624     100.0%      624        624          0
1 m                  0.0%        0          0          0
2 f        625     100.0%      625        625          0
3 m                  0.0%        0          0          0
4 f        626     100.0%      626        626          0
5 m                  0.0%        0          0          0
6 f        625     100.0%      625        625          0
7 m                  0.0%        0          0          0
8 f        625     100.0%      625        625          0
9 m                  0.0%        0          0          0
10 f        625     100.0%      625        625          0
11 m                  0.0%        0          0          0
12 f        625     100.0%      625        625          0
13 m                  0.0%        0          0          0
14 f        625     100.0%      625        625          0
15 m                  0.0%        0          0          0
0 dirty, 5000 queued, 5000 total, 8192 hash buckets, 2048 buffer size
start clean at  60.000% (of pair total) dirty, or 374 buffs dirty, stop at  50.000%

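Column	Description
Buffer pool page size	Page size of the buffer pool, in bytes
#	Queue number. Each LRU queue consists of two subqueues: an FLRU queue and an MLRU queue. (For definitions of FLRU and MLRU queues, see the discussion of LRU queues in the shared memory chapter of the IBM Informix Administrator's Guide.) Queues 0 and 1 belong to the first LRU queue, queues 2 and 3 belong to the second LRU queue, and so on.
f/m	Queue type. This field has four possible values. f: a free LRU queue, where free means unmodified. Although nearly all buffers in an LRU queue are available for use, the database server tries to use buffers from the FLRU queue rather than the MLRU queue, because a modified buffer must be written to disk before the database server can reuse it. F: the free LRU queue with the fewest elements; the database server uses this estimate to decide where to put unmodified (free) buffers. m: an MLRU queue. M: an MLRU queue that a flusher is cleaning.
length	Length of the queue (number of buffers)
% of	Percentage of the LRU queue that the subqueue represents. For example, if an LRU queue has 50 buffers, with 30 in the MLRU queue and 20 in the FLRU queue, the % of column lists 60.00 and 40.00, respectively.
pair total	Total number of buffers in this LRU queue
priority levels	Priority levels: LOW, MED_LOW, MED_HIGH, HIGH

The per-queue information is followed by summary information, which is interpreted as follows:

Column	Description
dirty	Total number of modified buffers across all LRU queues
queued	Total number of buffers in the LRU queues
total	Total number of buffers
hash buckets	Number of hash buckets
buffer size	Size of each buffer
start clean	Value of LRU_MAX_DIRTY
stop at	Value of LRU_MIN_DIRTY
priority downgrades	Number of LRU queues downgraded to a lower priority
priority upgrades	Number of LRU queues upgraded to a higher priority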
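The "start clean" line of the onstat -R summary expresses LRU_MAX_DIRTY both as a percentage and as a buffer count. The Python sketch below shows the arithmetic for the first queue pair in the listing (pair total 624, start at 60%, stop at 50%). The function name is illustrative, and truncating to whole buffers is an assumption that happens to match the 374 shown in the output.

```python
def dirty_thresholds(pair_total, lru_max_dirty, lru_min_dirty):
    """Buffer counts at which cleaning of one LRU queue pair starts and stops.

    Percentages are taken against the pair total; fractions are truncated
    (an assumption made to match the listing).
    """
    start = int(pair_total * lru_max_dirty / 100)
    stop = int(pair_total * lru_min_dirty / 100)
    return start, stop


start, stop = dirty_thresholds(624, 60.0, 50.0)
print(f"start cleaning at {start} dirty buffers, stop at {stop}")
```

With 624 buffers in the pair, cleaning starts at 374 dirty buffers, as reported in the summary line.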

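Suppose you are a DBA and users report that performance is degrading.

One way to check for a performance bottleneck is to look at the onstat -p output. Running the command once may not be enough, because it provides only a snapshot. With the -r option, however, you can watch for whatever is causing the slowdown.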

                    
$ onstat -pr
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:17:03 
-- 3812 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
2007     1000     201742   99.08   173      441      6221     97.22


isamtot  open     start    read     write    rewrite  delete   commit     rollbk
19993    1533     2346     6077     685      37       346      807	 0

gp_read    gp_write   gp_rewrt   gp_del     gp_alloc   gp_free    gp_curs
0          0          0          0          0          0          0

ovlock     ovuserthread ovbuff     usercpu  syscpu   numckpts   flushes
0          0            0          3.53     0.52     2          2

bufwaits lokwaits lockreqs deadlks  dltouts  ckpwaits compress seq scans
2400      0        9130     0        0        0        385      137 0

ixda-RA    idx-RA     da-RA      RA-pgsused lchwaits
38         5          162        203        3
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:17:03
-- 3812 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
2100     1005      203742   99.08   173      441      6221     97.22


isamtot  open     start    read     write    rewrite  delete   commit     rollbk
19993    1533     2346     6077     685      37       346      807	 0

gp_read    gp_write   gp_rewrt   gp_del     gp_alloc   gp_free    gp_curs
0          0          0          0          0          0          0

ovlock     ovuserthread ovbuff     usercpu  syscpu   numckpts   flushes
0          0            0          3.55     0.54     2          2

bufwaits lokwaits lockreqs deadlks  dltouts  ckpwaits compress seq scans
2706      0        9130     0        0        0        385      137 0

ixda-RA    idx-RA     da-RA      RA-pgsused lchwaits
38         5          162        203        3
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:17:03
-- 3812 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
2103      1020      206641   99.08   173      441      6221     97.22


isamtot  open     start    read     write    rewrite  delete   commit     rollbk
19993    1533     2346     6077     685      37       346      807	 0

gp_read    gp_write   gp_rewrt   gp_del     gp_alloc   gp_free    gp_curs
0          0          0          0          0          0          0

ovlock     ovuserthread ovbuff     usercpu  syscpu   numckpts   flushes
0          0            0          3.57     0.54     2          2

bufwaits lokwaits lockreqs deadlks  dltouts  ckpwaits compress seq scans
3002      0        9130     0        0        0        385      137 0

ixda-RA    idx-RA     da-RA      RA-pgsused lchwaits
38         5          162        203        3

The output above shows bufwaits growing very quickly, from 2400 to 2706 to 3002 across three samples. Some page (or pages) must be hot.
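Because onstat -p counters are cumulative, it is the change between samples that matters. The Python sketch below computes those per-interval deltas for the bufwaits and bufreads values taken from the three samples above; the helper name deltas is an illustrative choice.

```python
def deltas(samples):
    """Differences between consecutive samples of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


# Values read from the three onstat -p samples above.
bufwaits = [2400, 2706, 3002]
bufreads = [201742, 203742, 206641]

wait_deltas = deltas(bufwaits)
read_deltas = deltas(bufreads)

for waits, reads in zip(wait_deltas, read_deltas):
    print(f"{waits} buffer waits for {reads} buffer reads in this interval")
```

Roughly 300 new buffer waits per interval, against two to three thousand buffer reads, is what points toward contention on a small number of pages.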

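There are two ways to investigate further. I personally prefer to look at the onstat -X output to find out which buffer is hot. onstat -X displays buffer access information.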

                    
$ onstat -X

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 01:38:47 -- 38912 Kbytes

Buffers (Access)
address          owner            flags pagenum          memaddr          nslots
10a21ede0        0                103  11:5893         10a627800        9
pgflgs scount   waiter
80e    90     	10afa5028
 		10afa5820
 		10afa6028
 		.
 		.
 		.
 		10afaa028

Buffer pool page size: 2048
 200 modified, 5000 total, 8192 hash buckets, 2048 buffer size

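Notice that many threads are trying to access one page. From here things get simpler: map the page to the onstat -u / onstat -g ses level to find out what the page contains and who is using it.

The pgflgs output holds a major clue. 0x80e means that this page comes from a large chunk and that it is a blob bitmap page.

The waiter and owner columns correspond to the first column of the onstat -u output.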


onstat -d displays disk usage for dbspaces and chunks.

                    
$ onstat -d
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:39:25 
-- 3812 Kbytes

Dbspaces
address       number   flags    fchunk   nchunks  pgsize   flags    owner	    name
10aedee78     1        0x40001  1        1        2048     N  B     informix rootdbs
1 active, 2047 maximum

Chunks
address   chunk/dbs offset size  free   bpages  flags  pathname
10aedf028 1     1   0      15000 1238   PO-B           /testing/prod/1110FC1B5/SERVER
                                                       /chunks/rootchunk
1 active, 32766 maximum

NOTE: The values in the "size" and "free" columns for DBspace chunks are displayed in terms of "pgsize" of the DBspace to which they belong.

Expanded chunk capacity mode: always

Column	Description
Size	Size of the chunk, in pages
Free	Number of free pages in the chunk
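Because size and free are reported in pages, converting them to bytes means multiplying by the dbspace pgsize. The short Python sketch below does this for the rootdbs chunk in the listing (pgsize 2048, size 15000, free 1238); the variable names are illustrative only.

```python
PAGE_SIZE = 2048    # pgsize of rootdbs in the onstat -d listing
size_pages = 15000  # "size" column of the root chunk
free_pages = 1238   # "free" column of the root chunk

size_bytes = size_pages * PAGE_SIZE
free_bytes = free_pages * PAGE_SIZE
pct_free = 100.0 * free_pages / size_pages

print(f"chunk size: {size_bytes} bytes")
print(f"free space: {free_bytes} bytes ({pct_free:.2f}% of the chunk)")
```

For this chunk the result is 30,720,000 bytes in total with about 8.25% free.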

The onstat -D command is very similar to onstat -d.

                    
$ onstat -D
IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 00:42:00 
-- 38912 Kbytes

Dbspaces
address      number   flags     fchunk   nchunks  pgsize   flags   owner     name
10aedee78    1        0x40001   1        1        2048     N  B    informix  rootdbs
1 active, 2047 maximum

Chunks
address      chunk/dbs  offset    page Rd  page Wr  pathname
10aedf028    1     1    0         964      492      /testing/prod/1110FC1B5/SERVER/
					      chunks/rootchunk
1 active, 32766 maximum

NOTE: The values in the "page Rd" and "page Wr" columns for DBspace chunks are displayed in terms of system base page size.

Expanded chunk capacity mode: always

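Instead of the chunk size, this output shows the number of pages read from and written to each chunk.

You can run this command repeatedly to monitor I/O at the chunk level. This check is less fine-grained than monitoring I/O at the partition level (onstat -P), but it is a good way to watch I/O activity from a high level.

The onstat -g iof option displays the number of reads and writes for each chunk. If one chunk shows a disproportionate amount of I/O activity, that chunk may be a bottleneck for the system. This option can also be used to monitor how I/O requests are distributed across the fragments of a fragmented table.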

                    
$ onstat -g iof

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 01:21:55 
-- 38912 Kbytes

AIO global files:
gfd pathname         bytes read     page reads  bytes write    page writes io/s
3   rootchunk        2697216        1317        2418688        1181        224.8
op type     count          avg. time
seeks       0              N/A
reads       904            0.0010
writes      454            0.0112
kaio_reads  0              N/A
kaio_writes 0              N/A
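One quick way to check that the onstat -g iof columns are being read correctly is to divide the byte counts by the page counts. The Python sketch below does this for the rootchunk row from the listing; the variable names are illustrative, and the field order is taken from the header line of that listing.

```python
# The rootchunk row from the onstat -g iof listing.
iof_row = "3   rootchunk        2697216        1317        2418688        1181        224.8"

gfd, name, bytes_read, page_reads, bytes_written, page_writes, io_per_s = iof_row.split()

# Bytes divided by pages should give the page size if the columns line up.
read_page_size, read_rest = divmod(int(bytes_read), int(page_reads))
write_page_size, write_rest = divmod(int(bytes_written), int(page_writes))

print(f"{name}: {read_page_size} bytes per page read (remainder {read_rest})")
print(f"{name}: {write_page_size} bytes per page written (remainder {write_rest})")
```

Both divisions come out to exactly 2048 bytes, which matches the 2048 pgsize reported for rootdbs by onstat -d.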

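The onstat -g iob option displays a summary of big buffer usage.

The onstat -g iov option displays asynchronous I/O statistics for each virtual processor.

onstat -g ioa combines the information from onstat -g iob, onstat -g iof, and onstat -g iov in one output.

Disk troubleshooting is a major part of performance tuning. A typical case is a process, such as a load or batch job, that normally takes a known amount of time but for some reason runs far longer than usual.

Suppose a user calls to say that he is updating a few million rows in a table and wants to know why it is taking so long.

One way to look for the cause is to examine activity at the chunk level.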

                    

IBM Informix Dynamic Server Version 11.10.FB5TL -- On-Line -- Up 06:23:43 -- 38912 Kbytes

Dbspaces
address      number   flags      fchunk   nchunks  pgsize   flags   owner     name
10aeded78    1        0x40001    1        1        2048     N B     informix  rootdbs
.
.
10aedee88    2        0x40001    12       1        2048     N B     informix  datadbs
.
.
10aedee98    3        0x40001    17       1        2048     N B     informix  idxdbs
 1 active, 2047 maximum

Chunks
address      chunk/dbs  offset     page Rd  page Wr   pathname
10aedf028    12     2   0          157000   42790     /dev/chunks/datachunk
.
.
.
10aeee038    17     3   0          223414   1324123   /dev/chunks/indexdbschunk
 1 active, 32766 maximum

NOTE: The values in the "page Rd" and "page Wr" columns for DBspace chunks
      are displayed in terms of system base  page size.

Expanded chunk capacity mode: always

Note: Due to space constraints, I am just including the relevant chunk information


Chunks
address          chunk/dbs  offset     page Rd  page Wr  pathname
10aedf028        12     2   0          157000   42790    /dev/chunksdatadbschunk
.
.
10aeee038        17     3   0          223414   1324123  /dev/chunks/indexdbschunk
.
.
Chunks
address          chunk/dbs  offset     page Rd  page Wr  pathname
10aedf028        12     2   0          157103   50320    /dev/chunksdatadbschunk
.
.
.
10aeee038        17     3   0          343413   1924131  /dev/chunks/indexdbschunk
.
.
Chunks
address          chunk/dbs  offset     page Rd  page Wr  pathname
10aedf028        12     2   0          157195   51020    /dev/chunksdatadbschunk
.
.
.
10aeee038        17     3   0          386616   2242207  /dev/chunks/indexdbschunk

The output shows that while the counters for the datadbs dbspace are growing slowly, idxdbs is very busy.
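To make the comparison concrete, the following Python sketch computes the change in page Rd and page Wr between consecutive snapshots. The numbers are copied from the three Chunks listings above; the function name `deltas` and the dictionary layout are illustrative choices, not part of onstat.

```python
# (page Rd, page Wr) per dbspace, copied from the three snapshots above.
snapshots = [
    {"datadbs": (157000, 42790), "idxdbs": (223414, 1324123)},
    {"datadbs": (157103, 50320), "idxdbs": (343413, 1924131)},
    {"datadbs": (157195, 51020), "idxdbs": (386616, 2242207)},
]


def deltas(snaps):
    """Return per-dbspace (reads, writes) differences between consecutive snapshots."""
    result = []
    for prev, cur in zip(snaps, snaps[1:]):
        result.append({
            name: (cur[name][0] - prev[name][0], cur[name][1] - prev[name][1])
            for name in cur
        })
    return result


for i, interval in enumerate(deltas(snapshots), start=1):
    for name, (reads, writes) in interval.items():
        print(f"interval {i}: {name} +{reads} page reads, +{writes} page writes")
```

In the first interval datadbs gains 7,530 page writes while idxdbs gains 600,008, and in the second the figures are 700 and 318,076, so nearly all of the write activity lands on the index chunk.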

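Why is idxdbs so busy? At this point you can run a simple dbschema to check whether the user has indexes on this table. If so, a simple solution is to disable the indexes and re-enable them after all the updates are complete.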
