WINDOWS下的程序员出身,偶尔也写一些linux平台下小程序, 后转行数据库行业,专注于ORACLE和DB2的运维和优化。 同时也是ios移动开发者。欢迎志同道合的朋友一起研究技术。 数据库技术交流群:58308065,23618606
全部博文(599)
分类: Oracle
2010-07-27 17:49:55
很多时候,我们可能希望对某个列设置默认值来避免NULL值的存在,但是如果设置不当的话,将会带来性能的隐患。
SQL> conn ysp/ysp
Connected.
SQL> drop table test purge;
Table dropped.
SQL> CREATE TABLE TEST(ID INT,DATE_COL DATE DEFAULT '4000-01-01');
Table created.
SQL> INSERT INTO TEST SELECT ROWNUM,TO_DATE('2000-01-01','YYYY-MM-DD')+ROWNUM FROM ALL_OBJECTS WHERE ROWNUM<=3650;
3650 rows created.
SQL> COMMIT;
Commit complete.
SQL> CREATE INDEX TEST_IDX_01 ON TEST(DATE_COL);
Index created.
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'TEST',method_opt=>null,cascade=>true);
PL/SQL procedure successfully completed.
SQL> select min(date_col),max(date_col) from test;
MIN(DATE_COL) MAX(DATE_COL)
------------------- -------------------
2000-01-02 00:00:00 2009-12-29 00:00:00
SQL> set autot traceonly exp
SQL> select * from test where date_col between to_date('2002-01-01','yyyy-mm-dd') and to_date('2009-01-01','yyyy-mm-dd');
2558 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2560 | 28160 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 2560 | 28160 | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DATE_COL">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "DATE_COL"<=TO_DATE(' 2009-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
186 consistent gets
0 physical reads
0 redo size
65527 bytes sent via SQL*Net to client
2362 bytes received via SQL*Net from client
172 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2558 rows processed
我们选择了表中的大约80%的记录,此时ORACLE选择全表扫描是正确的,而且估算出的记录数(2560)和实际很接近(2560)。
下面我们看看插入默认值的情况。
SQL> insert into test(id) values(3651);
1 row created.
SQL> commit;
Commit complete.
SQL> select min(date_col),max(date_col) from test;
MIN(DATE_COL) MAX(DATE_COL)
------------------- -------------------
2000-01-02 00:00:00 4000-01-01 00:00:00
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'TEST',method_opt=>null,cascade=>true);
PL/SQL procedure successfully completed.
SQL> select * from test where date_col between to_date('2002-01-01','yyyy-mm-dd') and to_date('2009-01-01','yyyy-mm-dd');
2558 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2214617613
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 165 | 3 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST | 15 | 165 | 3 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | TEST_IDX_01 | 15 | | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("DATE_COL">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "DATE_COL"<=TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss'))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
186 consistent gets
0 physical reads
0 redo size
65527 bytes sent via SQL*Net to client
2362 bytes received via SQL*Net from client
172 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2558 rows processed
可以看到,插入默认之后,执行同样的查询ORACLE却选择了索引扫描,而且估算出的基数15和实际相差太多。
之所以造成这样的结果是因为默认值4000-01-01的原因。ORACLE认为我们要从2002-01-01到4000-01-01 选取2002-01-01到2009-01-01的记录。
所以选择率比较低,从而ORACLE走了索引扫描。
此时的选择率是:
(TO_DATE(' 2009-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')-TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))/
(TO_DATE(' 4000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')-TO_DATE(' 2000-01-02 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
+2/3651=0.004048214
选择基数为:3651*0.004048214=15
其实从2010年到4000年基本没有数据,但是ORACLE优化器并不知道,因此,我们需要告诉优化器我们的这种槽糕的数据间隔。
一种方法是使用直方图。
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER,'TEST',method_opt=>'for columns date_col size 254',cascade=>true);
PL/SQL procedure successfully completed.
SQL> set autot traceonly exp
SQL> select * from test where date_col between to_date('2002-01-01','yyyy-mm-dd') and to_date('2009-01-01','yyyy-mm-dd');
2558 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 217508114
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2578 | 28358 | 4 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 2578 | 28358 | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("DATE_COL">=TO_DATE(' 2002-01-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "DATE_COL"<=TO_DATE(' 2009-01-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
186 consistent gets
0 physical reads
0 redo size
65527 bytes sent via SQL*Net to client
2362 bytes received via SQL*Net from client
172 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2558 rows processed
至此ORACLE执行计划得以修正,而且估算出的基数也比较接近。