Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1924091
  • 博文数量: 310
  • 博客积分: 6853
  • 博客等级: 准将
  • 技术积分: 2833
  • 用 户 组: 普通用户
  • 注册时间: 2005-08-04 16:41











分类: Oracle

2012-02-28 23:15:34

Measuring I/O wait – Part 1: OracleBy Eyal Markovich

Storage performance 101 starts with three measurements: IOPS, latency and throughput. While these measurements are important, they are secondary to the I/O wait measurement. The IOPS, latency and throughput should be measured and analyzed only after you have a clear picture of the I/O wait time.

Several weeks ago, I discussed the concept of I/O wait and why it so important to measure. I/O wait is the total duration that the working processes of your session are blocked while waiting for I/O operations to complete. When monitoring your database storage performance, you need to break the elapsed time of the application into two measurements: processing time (CPU) vs. waits, and a breakdown of the waits to different events. You accumulate the wait times of the I/O events and then assess the overall performance of your storage. Note that this method monitors not the performance of the storage itself, but rather the performance of the storage for the database application being analyzed. It can definitely be the case that your storage offers “unacceptable” services for many applications; but for this specific application, such services are sufficient.

In the next two posts, I will describe in a nutshell how I measure and calculate the information above. I will start with Oracle (in this post) and follow with SQL Server (in the coming post).

Why start with Oracle? Simply because with Oracle this calculation is very easy, and Oracle is doing most of the work for us. We use the Oracle Automatic Workload Repository (AWR) report as the main tool for the analysis. In an AWR report, one should examine theForeground Wait Class table and the Foreground Wait Events table (or Top 5 Timed Foreground Events). These tables list the class (events) and the portion of each wait out of the total database time.

The Foreground Wait Class is a good place to start. It groups the different wait events into classes and lists the different classes and the contribution to the total processing time.

Foreground Wait Class

Wait ClassWaits% Time-outsTotal Wait Time(s)Avg. Wait (ms)% DB Time
User I/O3,168,402024,497890.27
DB CPU2,6529.77
System I/O6,7760200.01

The column Total Wait Time(s) shows the accumulated time for the group, and the column % DB Time shows the contributions of the group to the total service time (database time).

Obviously, we are interested in the User I/O group, but there are other groups that are important to us as well. First, the DB CPU shows the total time and percentage of the processing (CPU). In addition, there are other groups that might be I/O related (such as Commit). Remember that each class is a group of different wait events, so in order to know if the other groups have events that are I/O related, we need to investigate the individual events.

We learned, from the table above, several important points:

  • The portion of processing (CPU) time
  • The portion of User I/O
  • The top classes that contributed to the database time

In this specific case, it is easy to see that User I/O is by far the highest contributor to the database time, at over 90%.

The next table in the AWR report that we will analyze is the Foreground Wait Events table. It shows the same data as the first table, but for the individual events.

We can see the number of waits for each event, the total and average times, the waits per transaction and the percentage of database time.

Foreground Wait Events

EventWaits% Time-outsTotal Wait Time(s)Avg. Wait (ms)Waits/Txn% DB Time
db file sequential read3,148,897024,127859.0188.91
db file scattered read13,4730317240.251.17
log file sync54,18805711.020.21
direct path read temp1,893045240.040.17
SQL*Net message to client7,487,223080140.320.03
read by other session26405200.000.02
cursor: pin S wait on X245983110.000.01
row cache lock2029070.000.01
control file sequential read6,7760200.130.01
direct path write2,5970100.050.00

As you can see above, the report doesn’t list the corresponding class next to each event. You need to know which of the events above will map to the storage (or you can use dba_hist_event_name or V$EVENT_NAME to map the events to the group).

From this table you can identify the top events and whether they are related to the storage. For example, the “db file sequential read,” which contributed 88.9% of the time, is the main event in the User I/O group we saw before.

As I mentioned earlier, there are I/O events that are not in the User I/O group. In a future post I will go over the different wait events and describe their relationships to the storage. Make sure you understand the top events and if they are I/O related.

By accumulating the total time of all I/O events (from User I/O group as well as from other groups), you get the I/O wait measurement. That time is the total time of I/O activities that caused user sessions to wait – exactly what we wanted.

Now, we have the three important measurements:

  1. DB time – total service time
  2. CPU time – total processing time
  3. I/O wait – total time user sessions waited for I/O to complete

We can easily calculate the percentage of CPU and I/O wait from the total database time.

Finally, it is worth mentioning that when reviewing an AWR report, it is good practice to start the analysis by looking at the Top 5 Timed Foreground Events table. This view is nice because the events are displayed with their class names (beside the fact that it is one of the top tables in the AWR report and therefore easily accessed). If the sum of the percentages (% DB Time) of the top five events is close to 100, then there is no need to examine the other tables. This is obviously the case with the example below.

Top 5 Timed Foreground Events

EventWaitsTime(s)Avg. Wait (ms)% DB TimeWait Class
db file sequential read3,148,89724,127888.91User I/O
DB CPU2,6529.77
db file scattered read13,473317241.17User I/O
log file sync54,1885710.21Commit
direct path read temp1,89345240.17User I/O

Consider the following AWR report:

Top 5 Timed Foreground Events

EventWaitsTime(s)Avg. Wait (ms)% DB TimeWait Class
DB CPU103,62191.07
log file sync4,026,323,729103.28Commit
db file sequential read1,306,512,412122.12User I/O
direct path read47,672896190.79User I/O
log file switch (checkpoint incomplete)8808199300.72Configuration

Both log file sync and db file sequential read show poor latencies (10 and 12 ms). But, should I even start measuring the latencies, throughput and IOPS? Probably not, since the I/O wait is a mere 6% in this case. So, if there is a problem, it is the CPU processing time.

As you can see, calculating I/O wait in Oracle is simple. This is the first thing to do when analyzing storage performance.

Tags: , , , , , 

阅读(1866) | 评论(0) | 转发(0) |