Measuring I/O wait – Part 1: Oracle By Eyal Markovich-gflei-ChinaUnix博客

学而时习之gflei.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

gflei

博客访问： 2364129
博文数量： 310
博客积分： 6853
博客等级：准将
技术积分： 2833
用户组：普通用户
注册时间： 2005-08-04 16:41

文章分类

全部博文（310）

net（7）
工具（4）
Windows（26）

windows阀值（5）

硬件配置（3）

批处理（8）
UNIX&LINUX（52）

问题排查（8）

Security（1）

LINUX（31）

AIX（5）

HP-UX 学习（2）
JAVA（2）

RMI（1）
邮件服务器（1）
Mysql（20）

sql_reference（7）

优化（2）

数据库管理（6）

JDBC（2）

用户管理（1）
SQLSERVER（62）

性能（12）

SQL-REFERENCE（8）

maintain（15）

Concept（4）

lock（15）
Oracle（108）

SQL_REFERENCE（4）

Wait（3）

基础概念（3）

安全（11）

Client（5）

日常维护（7）

常见错误（27）

性能（19）

内存（9）

存储（11）

Lock（3）
LoadRunner（4）
应用服务器（4）

TOMCAT（2）

WEBSPHERE（1）
Apache（17）

虚拟主机（5）

与tomcat集成（5）

FAQ（0）
未分配的博文（3）

文章存档

2013年（17）

2012年（42）

2011年（76）

2010年（71）

2009年（99）

2007年（2）

2006年（1）

2005年（2）

我的朋友

相关博文

Measuring I/O wait – Part 1: Oracle By Eyal Markovich

分类： Oracle

2012-02-28 23:15:34

Measuring I/O wait – Part 1: OracleBy Eyal Markovich

Storage performance 101 starts with three measurements: IOPS, latency and throughput. While these measurements are important, they are secondary to the I/O wait measurement. The IOPS, latency and throughput should be measured and analyzed only after you have a clear picture of the I/O wait time.

Several weeks ago, I discussed the concept of I/O wait and why it so important to measure. I/O wait is the total duration that the working processes of your session are blocked while waiting for I/O operations to complete. When monitoring your database storage performance, you need to break the elapsed time of the application into two measurements: processing time (CPU) vs. waits, and a breakdown of the waits to different events. You accumulate the wait times of the I/O events and then assess the overall performance of your storage. Note that this method monitors not the performance of the storage itself, but rather the performance of the storage for the database application being analyzed. It can definitely be the case that your storage offers “unacceptable” services for many applications; but for this specific application, such services are sufficient.

In the next two posts, I will describe in a nutshell how I measure and calculate the information above. I will start with Oracle (in this post) and follow with SQL Server (in the coming post).

Why start with Oracle? Simply because with Oracle this calculation is very easy, and Oracle is doing most of the work for us. We use the Oracle Automatic Workload Repository (AWR) report as the main tool for the analysis. In an AWR report, one should examine theForeground Wait Class table and the Foreground Wait Events table (or Top 5 Timed Foreground Events). These tables list the class (events) and the portion of each wait out of the total database time.

The Foreground Wait Class is a good place to start. It groups the different wait events into classes and lists the different classes and the contribution to the total processing time.

Foreground Wait Class

Wait Class	Waits	% Time-outs	Total Wait Time(s)	Avg. Wait (ms)	% DB Time
User I/O	3,168,402	0	24,497	8	90.27
DB CPU			2,652		9.77
Commit	54,188	0	57	1	0.21
Network	7,533,123	0	9	0	0.03
Concurrency	1,258	70	6	4	0.02
System I/O	6,776	0	2	0	0.01
Configuration	11	9	1	64	0.00

The column Total Wait Time(s) shows the accumulated time for the group, and the column % DB Time shows the contributions of the group to the total service time (database time).

Obviously, we are interested in the User I/O group, but there are other groups that are important to us as well. First, the DB CPU shows the total time and percentage of the processing (CPU). In addition, there are other groups that might be I/O related (such as Commit). Remember that each class is a group of different wait events, so in order to know if the other groups have events that are I/O related, we need to investigate the individual events.

We learned, from the table above, several important points:

The portion of processing (CPU) time
The portion of User I/O
The top classes that contributed to the database time

In this specific case, it is easy to see that User I/O is by far the highest contributor to the database time, at over 90%.

The next table in the AWR report that we will analyze is the Foreground Wait Events table. It shows the same data as the first table, but for the individual events.

We can see the number of waits for each event, the total and average times, the waits per transaction and the percentage of database time.

Foreground Wait Events

Event	Waits	% Time-outs	Total Wait Time(s)	Avg. Wait (ms)	Waits/Txn	% DB Time
db file sequential read	3,148,897	0	24,127	8	59.01	88.91
db file scattered read	13,473	0	317	24	0.25	1.17
log file sync	54,188	0	57	1	1.02	0.21
direct path read temp	1,893	0	45	24	0.04	0.17
SQL*Net message to client	7,487,223	0	8	0	140.32	0.03
read by other session	264	0	5	20	0.00	0.02
cursor: pin S wait on X	245	98	3	11	0.00	0.01
row cache lock	2	0	2	907	0.00	0.01
control file sequential read	6,776	0	2	0	0.13	0.01
direct path write	2,597	0	1	0	0.05	0.00

As you can see above, the report doesn’t list the corresponding class next to each event. You need to know which of the events above will map to the storage (or you can use dba_hist_event_name or V$EVENT_NAME to map the events to the group).

From this table you can identify the top events and whether they are related to the storage. For example, the “db file sequential read,” which contributed 88.9% of the time, is the main event in the User I/O group we saw before.

As I mentioned earlier, there are I/O events that are not in the User I/O group. In a future post I will go over the different wait events and describe their relationships to the storage. Make sure you understand the top events and if they are I/O related.

By accumulating the total time of all I/O events (from User I/O group as well as from other groups), you get the I/O wait measurement. That time is the total time of I/O activities that caused user sessions to wait – exactly what we wanted.

Now, we have the three important measurements:

DB time – total service time
CPU time – total processing time
I/O wait – total time user sessions waited for I/O to complete

We can easily calculate the percentage of CPU and I/O wait from the total database time.

Finally, it is worth mentioning that when reviewing an AWR report, it is good practice to start the analysis by looking at the Top 5 Timed Foreground Events table. This view is nice because the events are displayed with their class names (beside the fact that it is one of the top tables in the AWR report and therefore easily accessed). If the sum of the percentages (% DB Time) of the top five events is close to 100, then there is no need to examine the other tables. This is obviously the case with the example below.

Top 5 Timed Foreground Events

Event	Waits	Time(s)	Avg. Wait (ms)	% DB Time	Wait Class
db file sequential read	3,148,897	24,127	8	88.91	User I/O
DB CPU		2,652		9.77
db file scattered read	13,473	317	24	1.17	User I/O
log file sync	54,188	57	1	0.21	Commit
direct path read temp	1,893	45	24	0.17	User I/O

Consider the following AWR report:

Top 5 Timed Foreground Events

Event	Waits	Time(s)	Avg. Wait (ms)	% DB Time	Wait Class
DB CPU		103,621		91.07
log file sync	4,026,32	3,729	10	3.28	Commit
db file sequential read	1,306,51	2,412	12	2.12	User I/O
direct path read	47,672	896	19	0.79	User I/O
log file switch (checkpoint incomplete)	880	819	930	0.72	Configuration

Both log file sync and db file sequential read show poor latencies (10 and 12 ms). But, should I even start measuring the latencies, throughput and IOPS? Probably not, since the I/O wait is a mere 6% in this case. So, if there is a problem, it is the CPU processing time.

As you can see, calculating I/O wait in Oracle is simple. This is the first thing to do when analyzing storage performance.

Tags: , , , , ,

阅读(2401) | 评论(0) | 转发(0) |

上一篇：Invalid DBSNMP password的可能原因

下一篇：cpu wa引起的进程阻塞现象

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6