Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1428305
  • 博文数量: 556
  • 博客积分: 12626
  • 博客等级: 上将
  • 技术积分: 5799
  • 用 户 组: 普通用户
  • 注册时间: 2006-01-11 15:56
个人简介

从事IT基础架构多年,发现自己原来更合适去当老师……喜欢关注新鲜事物,不仅限于IT领域。

文章分类

全部博文(556)

文章存档

2019年(6)

2018年(15)

2017年(17)

2016年(11)

2015年(2)

2014年(2)

2013年(36)

2012年(54)

2011年(100)

2010年(41)

2009年(72)

2008年(14)

2007年(82)

2006年(104)

分类: Oracle

2007-02-05 15:08:44

转发ORACLE技术网的两篇关于CBO的文章(第一部分只找到英文),虽然作者写的比较早,不过文章还是不错的。对于优化器ORACLE的联机文档上有比较详细的说明,这篇文章可以当作一个很好的补充。当然文章主要是围绕ORACLE 9I的,10G的优化器又有一定的增强。
 

Oracle's cost-based SQL optimizer (CBO) is an extremely sophisticated component of Oracle that governs the execution for every Oracle query. The CBO has evolved into one of the world's most sophisticated software components, and it has the challenging job of evaluating any SQL statement and generating the "best" execution plan for the statement.

Because the CBO determines the execution speed for every Oracle query, the Oracle professional must understand how the CBO is influenced by Oracle external issues, internal statistics, and data distribution.

In this first installment of a two-part article, we will cover the following CBO topics:

  • CBO parameters. We will start by reviewing the basic optimizer modes within the CBO and then drill down and examine specific parameters that influence the behavior of the CBO.

  • CBO statistics. We will examine the importance of gathering proper CBO statistics with dbms_stats and review techniques for ensuring that execution plans remain stable. We will also look at techniques for migrating statistics between systems and examine how developers can optimize their SQL in a test environment and confidently migrate SQL into production without fear of changing execution plans.
Part 2 will explains the use of histograms, external costing features, SQL hints for changing execution plans, and techniques for locating and tuning suboptimal SQL.

So let's start by examining the CBO optimizer modes and the Oracle parameters that influence the CBO.

CBO Parameters

The CBO is influenced by many configuration settings. Your settings for important CBO parameters can have a dramatic impact of CBO performance, and this is the place to start when exploring the CBO. Let's starting by choosing your CBO optimizer_mode and then examine other important CBO parameters.

The CBO and optimizer modes. In Oracle9i Database there are four optimizer modes, all determined by the value of the optimizer_mode parameter: rule, choose, all_rows, and first_rows. The rule and choose modes reflect the obsolete rule-based optimizer, so we will focus on the CBO modes here.

The optimizer mode can be set at the system-wide level, for an individual session, or for a specific SQL statement:

alter system set optimizer_mode=first_rows_10;
alter session set optimizer_goal = all_rows; 
select /*+ first_rows(100) */ from student;

We need to start by defining what is the "best" execution plan for a SQL statement. Is the best execution plan the one that begins to return rows the fastest, or is the best execution plan the one that executes with the smallest amount of computing resources? Of course, the answer depends on the processing needs of your database.

Let's take a simple example. Assume the following query:

select customer_name
from
   customer
where

   region = 'south'
order by
   customer_name;

If the best execution plan is the one that starts to return rows the fastest, a concatenated index on region and customer_name could be used to immediately start delivering table rows in their proper order, even though excess I/O will be required to read the nonadjacent data blocks (see ).

table rows in proper order

Let's assume that this execution plan starts delivering results in .0001 seconds and requires 10,000 db_block_gets. But what if your goal is to minimize computing resources? If this SQL is inside a batch program, then it is not important to start returning rows quickly, and a different execution plan would take fewer resources. In this example, a parallel full-table scan followed by a back-end sort will require less machine resources and less I/O because blocks do not have to be reread to pull the data in sorted order (see ). In this example, we expect the result to take longer to deliver (no rows until the sort is complete), but we will see far less I/O because blocks will not have to be reaccessed to deliver the rows in presorted order. Let's assume that this execution plan delivers the result in 10 seconds with 5,000 db_block_gets.

parallel full-table scan

Oracle offers several optimizer modes that allow you to choose your definition of the "best" execution plan for you:

  • optimizer_mode=first_rows_ This CBO mode will return rows as soon as possible, even if the overall query runs longer or consumes more computing resources than other plans. The first_rows optimizer_mode usually involves choosing an index scan over a full-table scan because index access will return rows quickly. Because the first_rows mode favors index scans over full-table scans, the first_rows mode is more appropriate for OLTP systems where the end user needs to see small result sets as quickly as possible.

  • optimizer_mode=all_rows_ This CBO mode ensures that the overall computing resources are minimized, even if no rows are available until the entire query has completed. The all_rows access method often favors a parallel full-table scan over a full-index scan, and sorting over presorted retrieval via an index. Because the all_rows mode favors full-table scans, it is best suited for data warehouses, decision-support systems, and batch-oriented databases where intermediate rows are not required for real-time viewing.

  • optimizer_mode=first_rows_n This Oracle9i Database optimizer mode enhancement optimizes queries for a small, expected return set. The values are first_rows_1, first_rows_10, first_rows_100, and first_rows_1000. The CBO uses the n in first_rows_n as an important driver in determining cardinalities for query result sets. By telling the CBO, a priori, that we only expect a certain number of rows back from the query, the CBO will be able to make a better decision about whether to use an index to access the table rows.

  • Optimizer_mode=rule The rule-based optimizer (RBO) is the archaic optimizer mode from the earliest releases of Oracle Database. The rule-based optimizer has not been updated in nearly a decade and is not recommended for production use because the RBO does not support any new features of Oracle since 1994 (such as bitmap indexes, table partitions, and function-based indexes).

While the optimizer_mode is the single most important factor in invoking the cost-based optimizer, there are other parameters that influence the CBO behavior. Let's take a quick look at these parameters.

Oracle parameters that influence the CBO. While the optimizer_mode parameter governs the global behavior of the CBO, there are many other Oracle parameters that have a great impact on CBO behavior. Because of the power of the CBO, Oracle provides several system-level parameters that can adjust the overall behavior of the CBO. These adjustment parameters generally involve the choice of using an index versus doing a full-table scan, and the CBO's choice of table join methods.

However, Oracle does not recommend changing the default values for many of these CBO setting because the changes can affect the execution plans for thousands of SQL statements. Here are the major optimizer parameters:

  • optimizer_index_cost_adj_ This parameter alters the costing algorithm for access paths involving indexes. The smaller the value, the lower the cost of index access.

  • optimizer_index_caching_ This parameter tells Oracle how much of your index is likely to be in the RAM data buffer cache. The setting for optimizer_index_caching affects the CBO's decision to use an index for a table join (nested loops) or to favor a full-table scan.

  • db_file_multiblock_read_count_ When this parameter is set to a high value, the CBO recognizes that scattered (multiblock) reads may be less expensive than sequential reads. This makes the CBO friendlier to full-table scans.

  • parallel_automatic_tuning_ When set to "on", this parameter parallelizes full-table scans . Because parallel full-table scans are very fast, the CBO will give a higher cost to index access and be friendlier to full-table scans.

  • hash_area_size (if not using pga_aggregate_target) _ The setting for hash_area_size parameter governs the propensity of the CBO to favor hash joins over nested loop and sort merge table joins.

  • sort_area_size (if not using pga_aggregate_target) _ The sort_area_size influences the CBO when deciding whether to perform an index access or a sort of the result set. The higher the value for sort_area_size, the more likely that a sort will be performed in RAM, and the more likely that the CBO will favor a sort over presorted index retrieval.

The idea optimizer settings depend on your environment and are heavily influenced by your system's costs for scattered disk reads versus sequential disk reads. contains a great script you can use to measure these I/O costs on your database.

Now that we understand the CBO parameters, let's look at how we can help the CBO make good execution-plan decisions by providing the CBO with information about our schema.

CBO Statistics

The most important key to success with the CBO is to carefully define and manage your statistics. In order for the CBO to make an intelligent decision about the best execution plan for your SQL, it must have information about the table and indexes that participate in the query. When the CBO knows the size of the tables and the distribution, cardinality, and selectivity of column values, the CBO can make an informed decision and almost always generates the best execution plan.

Let's examine the following areas of CBO statistics and see how to gather top-quality statistics for the CBO and how to create an appropriate CBO environment for your database.

Getting top-quality statistics for the CBO. The choices of executions plans made by the CBO are only as good as the statistics available to it. The old-fashioned analyze table and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance. As we may know, the CBO uses object statistics to choose the best execution plan for all SQL statements.

The dbms_stats utility does a far better job in estimating statistics, especially for large partitioned tables, and the better statistics result in faster SQL execution plans. Here is a sample execution of dbms_stats with the OPTIONS clause:

exec dbms_stats.gather_schema_stats( - 
  ownname          => 'SCOTT', - 
  options          => 'GATHER AUTO', - 
  estimate_percent => dbms_stats.auto_sample_size, - 
  method_opt       => 'for all columns size repeat', - 
  degree           => 34 - 
   )

There are several values for the OPTIONS parameter that we need to know about:

  • GATHER_ reanalyzes the whole schema

  • GATHER EMPTY_ only analyzes tables that have no existing statistics

  • GATHER STALE_ only reanalyzes tables with more than 10 percent modifications (inserts, updates, deletes)

  • GATHER AUTO_ will reanalyze objects that currently have no statistics and objects with stale statistics. Using GATHER AUTO is like combining GATHER STALE and GATHER EMPTY.
Note that both GATHER STALE and GATHER AUTO require monitoring. If you issue the ALTER TABLE XXX MONITORING command, Oracle tracks changed tables with the dba_tab_modifications view. Below we see that the exact number of inserts, updates and deletes are tracked since the last analysis of statistics:

SQL> desc dba_tab_modifications;

 Name                Type
 --------------------------------
 TABLE_OWNER         VARCHAR2(30)
 TABLE_NAME          VARCHAR2(30)
 PARTITION_NAME      VARCHAR2(30)
 SUBPARTITION_NAME   VARCHAR2(30)
 INSERTS             NUMBER
 UPDATES             NUMBER
 DELETES             NUMBER
 TIMESTAMP           DATE

 TRUNCATED           VARCHAR2(3)

The most interesting of these options is the GATHER STALE option. Because all statistics will become stale quickly in a robust OLTP database, we must remember the rule for GATHER STALE is > 10% row change (based on num_rows at statistics collection time). Hence, almost every table except read-only tables will be reanalyzed with the GATHER STALE option, making the GATHER STALE option best for systems that are largely read-only. For example, if only five percent of the database tables get significant updates, then only five percent of the tables will be reanalyzed with the GATHER STALE option.

Automating sample size with dbms_stats.The better the quality of the statistics, the better the job that the CBO will do when determining your execution plans. Unfortunately, doing a complete analysis on a large database could take days, and most shops must sample your database to get CBO statistics. The goal is to take a large enough sample of the database to provide top-quality data for the CBO.

Now that we see how the dbms_stats option works, let's see how to specify an adequate sample size for dbms_stats.

In earlier releases, the DBA had to guess what percentage of the database provided the best sample size and sometimes underanalyzed the schema. Starting with Oracle9i Database, the estimate_percent argument is a great way to allow Oracle's dbms_stats to automatically estimate the "best" percentage of a segment to sample when gathering statistics:

estimate_percent => dbms_stats.auto_sample_size

After collecting automatic sample sizes, you can verify the accuracy of the automatic statistics sampling by looking at the sample_size column on any of these data dictionary views:

  • DBA_ALL_TABLES
  • DBA_INDEXES
  • DBA_IND_PARTITIONS
  • DBA_IND_SUBPARTITIONS
  • DBA_OBJECT_TABLES
  • DBA_PART_COL_STATISTICS
  • DBA_SUBPART_COL_STATISTICS
  • DBA_TABLES
  • DBA_TAB_COLS
  • DBA_TAB_COLUMNS
  • DBA_TAB_COL_STATISTICS
  • DBA_TAB_PARTITIONS
  • DBA_TAB_SUBPARTITIONS
Note that Oracle generally chooses a sample_size from 5 to 20 percent when using automatic sampling, depending on the size of the tables and the distribution of column values. Remember, the better the quality of your statistics, the better the decision of the CBO.

Now that we understand the value of CBO statistics, let's look at ways that the CBO statistics are managed in a successful Oracle shop.

 
LISTING 1: optimizer_index_cost_adj.sql

col c1 heading 'Average Waits|forFull| Scan Read I/O'        format 9999.999
col c2 heading 'Average Waits|for Index|Read I/O'            format 9999.999
col c3 heading 'Percent of| I/O Waits|for Full Scans'        format 9.99
col c4 heading 'Percent of| I/O Waits|for Index Scans'       format 9.99
col c5 heading 'Starting|Value|for|optimizer|index|cost|adj' format 999

select
   a.average_wait                                  c1,
   b.average_wait                                  c2,
   a.total_waits /(a.total_waits + b.total_waits)  c3,
   b.total_waits /(a.total_waits + b.total_waits)  c4,
   (b.average_wait / a.average_wait)*100           c5
from
   v$system_event  a,
   v$system_event  b
where
   a.event = 'db file scattered read'
and
   b.event = 'db file sequential read'
;
阅读(1964) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~