Oracle's cost-based SQL optimizer (CBO) is an extremely sophisticated component of Oracle that governs the execution of every Oracle query. The CBO has evolved into one of the world's most complex software components, and it has the challenging job of evaluating any SQL statement and generating the "best" execution plan for the statement.
Because the CBO determines the execution speed for every Oracle query, the Oracle professional must understand how the CBO is influenced by Oracle external issues, internal statistics, and data distribution.
In this first installment of a two-part article, we will cover the following CBO topics:

- CBO parameters: the optimizer modes and the Oracle parameters that influence the CBO
- CBO statistics: how to gather top-quality statistics for the CBO
So let's start by examining the CBO optimizer modes and the Oracle parameters that influence the CBO.
CBO Parameters
The CBO is influenced by many configuration settings. Your settings for important CBO parameters can have a dramatic impact on CBO performance, and this is the place to start when exploring the CBO. Let's start by choosing your CBO optimizer_mode and then examine the other important CBO parameters.
The CBO and optimizer modes. In Oracle9i Database there are four optimizer modes, all determined by the value of the optimizer_mode parameter: rule, choose, all_rows, and first_rows. The rule and choose modes reflect the obsolete rule-based optimizer, so we will focus on the CBO modes here.
The optimizer mode can be set at the system-wide level, for an individual session, or for a specific SQL statement:
alter system set optimizer_mode=first_rows_10;

alter session set optimizer_goal = all_rows;

select /*+ first_rows(100) */ * from student;
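Before and after making a system-wide change, it is worth confirming which mode is actually in effect. A quick check from SQL*Plus:

show parameter optimizer_mode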
We need to start by defining the "best" execution plan for a SQL statement. Is the best execution plan the one that begins to return rows the fastest, or the one that executes with the smallest amount of computing resources? Of course, the answer depends on the processing needs of your database.
Let's take a simple example. Assume the following query:
select customer_name from customer where region = 'south' order by customer_name;
If the best execution plan is the one that starts to return rows the fastest, a concatenated index on region and customer_name could be used to immediately start delivering table rows in their proper order, even though excess I/O will be required to read the nonadjacent data blocks.
Let's assume that this execution plan starts delivering results in .0001 seconds and requires 10,000 db_block_gets. But what if your goal is to minimize computing resources? If this SQL is inside a batch program, then it is not important to start returning rows quickly, and a different execution plan would take fewer resources. In this example, a parallel full-table scan followed by a back-end sort will require fewer machine resources and less I/O, because blocks do not have to be reread to deliver the rows in presorted order. We expect the result to take longer to deliver (no rows are returned until the sort is complete), but total I/O drops sharply. Let's assume that this execution plan delivers the result in 10 seconds with 5,000 db_block_gets.
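You can see this tradeoff directly on your own system by comparing the two plans. The following is a minimal sketch assuming a customer table like the one above; the index name cust_region_name_idx is hypothetical, and dbms_xplan.display is available from Oracle9i Release 2 onward:

-- Concatenated index that supports both the filter and the sort order
create index cust_region_name_idx
   on customer (region, customer_name);

-- Plan favoring fast initial response (index access, no sort)
explain plan for
select /*+ first_rows(10) */ customer_name
from   customer
where  region = 'south'
order  by customer_name;

select * from table(dbms_xplan.display);

-- Plan favoring minimal total resources (full scan plus sort)
explain plan for
select /*+ all_rows */ customer_name
from   customer
where  region = 'south'
order  by customer_name;

select * from table(dbms_xplan.display);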
Oracle offers several optimizer modes that let you choose your own definition of the "best" execution plan:

- all_rows: favors the plan with the lowest total resource cost, delivering the complete result set as cheaply as possible (typical for batch processing).
- first_rows: favors the plan that begins returning rows the soonest (typical for interactive, OLTP-style queries).
- first_rows_n: favors the fastest possible delivery of the first n rows, where n is 1, 10, 100, or 1,000.
While the optimizer_mode is the single most important factor in invoking the cost-based optimizer, there are other parameters that influence the CBO behavior. Let's take a quick look at these parameters.
Oracle parameters that influence the CBO. While the optimizer_mode parameter governs the global behavior of the CBO, there are many other Oracle parameters that have a great impact on CBO behavior. Because of the power of the CBO, Oracle provides several system-level parameters that can adjust the overall behavior of the CBO. These adjustment parameters generally involve the choice of using an index versus doing a full-table scan, and the CBO's choice of table join methods.
However, Oracle does not recommend changing the default values for many of these CBO settings, because the changes can affect the execution plans for thousands of SQL statements. Here are the major optimizer parameters:

- optimizer_index_cost_adj: adjusts the relative cost of index access versus full-table scans; lower values make index access look cheaper to the CBO.
- optimizer_index_caching: tells the CBO what percentage of index blocks it can expect to find in the buffer cache, influencing the costing of nested-loop joins.
- db_file_multiblock_read_count: when set to a high value, makes full-table scans look cheaper to the CBO.
- sort_area_size and hash_area_size: larger values make sorts and hash joins, respectively, more attractive to the CBO.
The ideal optimizer settings depend on your environment and are heavily influenced by your system's relative costs for scattered disk reads versus sequential disk reads. The script at the end of this article measures these I/O costs on your database.
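As an illustration only (the value 20 is hypothetical, not a recommendation), a system whose index I/O proves relatively cheap might trial a lower optimizer_index_cost_adj at the session level before committing to a system-wide change:

-- Test the effect on a single session first
alter session set optimizer_index_cost_adj = 20;

-- Only after verifying execution plans, consider the system level
alter system set optimizer_index_cost_adj = 20;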
Now that we understand the CBO parameters, let's look at how we can help the CBO make good execution-plan decisions by providing the CBO with information about our schema.
CBO Statistics
The most important key to success with the CBO is to carefully define and manage your statistics. In order for the CBO to make an intelligent decision about the best execution plan for your SQL, it must have information about the table and indexes that participate in the query. When the CBO knows the size of the tables and the distribution, cardinality, and selectivity of column values, the CBO can make an informed decision and almost always generates the best execution plan.
Let's examine these areas of CBO statistics: how to gather top-quality statistics for the CBO, and how to create an appropriate CBO environment for your database.
Getting top-quality statistics for the CBO. The choices of execution plans made by the CBO are only as good as the statistics available to it. The old-fashioned analyze table and dbms_utility methods for generating CBO statistics are obsolete and somewhat dangerous to SQL performance, because the CBO uses object statistics to choose the best execution plan for all SQL statements.
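For reference, the legacy approach looked like this (shown against the customer table from the earlier example); the syntax is still accepted, but it should be avoided for CBO statistics gathering:

-- Obsolete: gathering statistics with ANALYZE
analyze table customer compute statistics;
analyze table customer estimate statistics sample 10 percent;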
The dbms_stats utility does a far better job in estimating statistics, especially for large partitioned tables, and the better statistics result in faster SQL execution plans. Here is a sample execution of dbms_stats with the OPTIONS clause:
exec dbms_stats.gather_schema_stats( -
   ownname          => 'SCOTT', -
   options          => 'GATHER AUTO', -
   estimate_percent => dbms_stats.auto_sample_size, -
   method_opt       => 'for all columns size repeat', -
   degree           => 34 -
)
There are several values for the OPTIONS parameter that we need to know about:

- GATHER reanalyzes the whole schema.
- GATHER EMPTY only analyzes objects that currently have no statistics.
- GATHER STALE only reanalyzes objects with more than 10 percent modifications (inserts, updates, and deletes).
- GATHER AUTO reanalyzes objects that have no statistics plus objects with stale statistics (GATHER EMPTY plus GATHER STALE).

Note that GATHER STALE and GATHER AUTO require monitoring. If you issue the alter table xxx monitoring command, Oracle tracks changed tables in the dba_tab_modifications view, which records the number of inserts, updates, and deletes since the last statistics collection:
SQL> desc dba_tab_modifications;

 Name                             Type
 -------------------------------- ------------
 TABLE_OWNER                      VARCHAR2(30)
 TABLE_NAME                       VARCHAR2(30)
 PARTITION_NAME                   VARCHAR2(30)
 SUBPARTITION_NAME                VARCHAR2(30)
 INSERTS                          NUMBER
 UPDATES                          NUMBER
 DELETES                          NUMBER
 TIMESTAMP                        DATE
 TRUNCATED                        VARCHAR2(3)
The most interesting of these options is GATHER STALE. Because all statistics become stale quickly in a busy OLTP database, we must remember that the rule for GATHER STALE is more than a 10 percent row change (based on num_rows at statistics collection time). Hence, almost every table except read-only tables will be reanalyzed with the GATHER STALE option, which makes GATHER STALE best suited for systems that are largely read-only. For example, if only five percent of the database tables get significant updates, then only five percent of the tables will be reanalyzed with the GATHER STALE option.
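Here is a minimal sketch of the monitoring workflow, assuming the customer table and the SCOTT schema used earlier: enable monitoring, then review accumulated DML volume to see which tables are approaching the 10 percent threshold:

-- Enable DML monitoring so GATHER STALE / GATHER AUTO can find this table
alter table customer monitoring;

-- Review accumulated changes since the last statistics collection
select table_name, inserts, updates, deletes, timestamp
from   dba_tab_modifications
where  table_owner = 'SCOTT';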
Automating sample size with dbms_stats. The better the quality of the statistics, the better the job the CBO will do when determining your execution plans. Unfortunately, doing a complete analysis on a large database could take days, so most shops must sample their database to get CBO statistics. The goal is to take a large enough sample of the database to provide top-quality data for the CBO.
Now that we see how the dbms_stats option works, let's see how to specify an adequate sample size for dbms_stats.
In earlier releases, the DBA had to guess what percentage of the database provided the best sample size and sometimes underanalyzed the schema. Starting with Oracle9i Database, the estimate_percent argument is a great way to allow Oracle's dbms_stats to automatically estimate the "best" percentage of a segment to sample when gathering statistics:
estimate_percent => dbms_stats.auto_sample_size
After collecting automatic sample sizes, you can verify the accuracy of the automatic statistics sampling by looking at the sample_size column in data dictionary views such as dba_tables, dba_indexes, and dba_tab_col_statistics:
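For example, a quick check of the sampled row counts against the total row counts (a sketch, again assuming the SCOTT schema):

-- Compare the sample size to the total row count for each table
select table_name, num_rows, sample_size
from   dba_tables
where  owner = 'SCOTT';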
Now that we understand the value of CBO statistics, let's look at ways that the CBO statistics are managed in a successful Oracle shop.
Here is the script mentioned earlier. It compares the average wait times for scattered reads (full-table scans) and sequential reads (index access) in v$system_event, and the final column suggests a starting value for optimizer_index_cost_adj:

col c1 heading 'Average Waits|for Full| Scan Read I/O'        format 9999.999
col c2 heading 'Average Waits|for Index|Read I/O'             format 9999.999
col c3 heading 'Percent of| I/O Waits|for Full Scans'         format 9.99
col c4 heading 'Percent of| I/O Waits|for Index Scans'        format 9.99
col c5 heading 'Starting|Value|for|optimizer|index|cost|adj'  format 999

select
   a.average_wait                                   c1,
   b.average_wait                                   c2,
   a.total_waits / (a.total_waits + b.total_waits)  c3,
   b.total_waits / (a.total_waits + b.total_waits)  c4,
   (b.average_wait / a.average_wait) * 100          c5
from
   v$system_event a,
   v$system_event b
where
   a.event = 'db file scattered read'
and
   b.event = 'db file sequential read';