Optimization Principles for IN & EXISTS and NOT IN & NOT EXISTS: A Discussion - hb_li

Category: Oracle

2007-03-04 23:26:12

1. How EXISTS executes
select * from t1 where exists ( select null from t2 where y = x )
This can be understood as:
for x in ( select * from t1 )
loop
   if ( exists ( select null from t2 where y = x.x ) )
   then
      OUTPUT THE RECORD
   end if;
end loop;
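Performance difference between IN and EXISTS:
If the subquery returns few rows while the table in the main query is large and has an index, use IN. Conversely, if the outer query returns few rows while the table in the subquery is large and has an index, use EXISTS.
What really distinguishes IN from EXISTS is the driving order, and that is the key to the performance difference. With EXISTS the outer table is the driving table and is accessed first; with IN the subquery is executed first. The goal is for the driving side to return quickly, which is why indexes and the relative sizes of the result sets matter.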
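
The two driving orders can be mimicked outside the database. The following is only a rough sketch in Python, with made-up lists standing in for t1.x and t2.y; the function names exists_style and in_style are invented for illustration and say nothing about how Oracle implements either operator internally.

```python
# Toy rows standing in for t1.x and t2.y (hypothetical data).
t1_x = [1, 2, 3, 4, 5]
t2_y = [2, 4, 4, 6]


def exists_style(outer_rows, inner_rows):
    """Outer table drives: probe the inner rows once per outer row."""
    result = []
    for x in outer_rows:
        # any() stops at the first match, just as EXISTS needs only one row.
        if any(y == x for y in inner_rows):
            result.append(x)
    return result


def in_style(outer_rows, inner_rows):
    """Subquery runs first: collect its distinct values, then filter."""
    distinct_values = set(inner_rows)
    return [x for x in outer_rows if x in distinct_values]


print(exists_style(t1_x, t2_y))
print(in_style(t1_x, t2_y))
```

Both functions return the same rows; only the order of work differs. The first pays one probe of the inner side per outer row, the second pays once for building the distinct set, which is the trade-off described above.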

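Also note that IN never matches a NULL: comparing anything with NULL yields UNKNOWN rather than TRUE, even when the list itself contains NULL.
For example:
select 1 from dual where null  in (0,1,2,null)
returns no rows.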
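
The same behaviour can be checked without an Oracle instance. This is a small sketch that assumes SQLite (through Python's standard sqlite3 module) as a stand-in; SQLite has no dual table, so the FROM clause is dropped, but the three-valued logic for IN is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL IN (...) evaluates to UNKNOWN, so the WHERE clause filters the row out.
rows_in = conn.execute("SELECT 1 WHERE NULL IN (0, 1, 2, NULL)").fetchall()

# IS NULL is the test that does recognise a NULL.
rows_is_null = conn.execute("SELECT 1 WHERE NULL IS NULL").fetchall()

print(rows_in)
print(rows_is_null)
conn.close()
```

The first query comes back empty while the second returns one row, which is why a NULL has to be tested with IS NULL rather than listed inside IN.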

2. NOT IN and NOT EXISTS:
How NOT EXISTS executes
select .....
  from rollup R
where not exists ( select 'Found' from title T
                           where R.source_id = T.Title_ID);
This can be understood as:
for x in ( select * from rollup )
   loop
      if ( not exists ( that query ) ) then
               OUTPUT
      end if;
   end loop;

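Note: NOT EXISTS and NOT IN are not fully interchangeable; which one is correct depends on the requirement. If the column selected by the subquery can be NULL, one cannot be substituted for the other.

For example, compare the results of the following statements: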
select x,y from t;
x             y
------       ------
1             3
3             1
1             2
1             1
3             1
5             NULL
select * from t where  x not in (select y from t t2  )
no rows

select * from t where  not exists (select null from t t2
                                                where t2.y=t.x )
x       y
------  ------
5       NULL
So the choice has to follow the actual requirement.
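
The example can be reproduced end to end. The sketch below assumes SQLite via Python's sqlite3 module in place of Oracle and loads the six rows of t listed above; the third query, with an added IS NOT NULL filter, is not in the original example and is included only to show how NOT IN behaves once the NULL is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 3), (3, 1), (1, 2), (1, 1), (3, 1), (5, None)],
)

# The subquery returns a NULL, so x NOT IN (...) is never TRUE for any row.
not_in = conn.execute(
    "SELECT * FROM t WHERE x NOT IN (SELECT y FROM t t2)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL never matches.
not_exists = conn.execute(
    "SELECT * FROM t WHERE NOT EXISTS "
    "(SELECT NULL FROM t t2 WHERE t2.y = t.x)"
).fetchall()

# Added for illustration: filtering the NULL out makes NOT IN agree again.
not_in_filtered = conn.execute(
    "SELECT * FROM t WHERE x NOT IN "
    "(SELECT y FROM t t2 WHERE y IS NOT NULL)"
).fetchall()

print(not_in)
print(not_exists)
print(not_in_filtered)
conn.close()
```

Only the unfiltered NOT IN loses the row with x = 5; the other two queries return it.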

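Performance difference between NOT IN and NOT EXISTS:
Use NOT IN only when the column selected by the subquery has a NOT NULL constraint, or the query otherwise guarantees it is not null (for example with an IS NOT NULL predicate). If the table in the main query is large and the table in the subquery is smaller but still has many rows, use NOT IN and have it run as a hash anti-join.
If the main query returns few rows and the table in the subquery has many rows and is indexed, NOT EXISTS is a good choice. NOT IN is best combined with the /*+ HASH_AJ */ hint, or rewritten as an outer join plus IS NULL.
NOT IN performs better under the cost-based optimizer.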

For example:
select .....
from rollup R
where not exists ( select 'Found' from title T
                        where R.source_id = T.Title_ID);

can be rewritten as (better):

select ......
from title T, rollup R
where R.source_id = T.Title_id(+)
and T.Title_id is null;

or (better):
select ...
      from rollup R
      where R.source_id NOT IN ( select /*+ HASH_AJ */ T.Title_ID
                                   from title T
                                  where T.Title_ID IS NOT NULL )
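
To check that the three forms select the same rows, here is a minimal sketch using SQLite through Python's sqlite3 module. The table names rollup_tbl and title_tbl and their contents are hypothetical stand-ins for rollup and title, the (+) outer join is written as LEFT JOIN, and the HASH_AJ hint is left out because it is specific to Oracle. The sketch compares results only, not execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rollup_tbl (source_id INTEGER)")
conn.execute("CREATE TABLE title_tbl (title_id INTEGER)")
conn.executemany("INSERT INTO rollup_tbl VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO title_tbl VALUES (?)", [(2,), (4,), (None,)])

not_exists = conn.execute(
    "SELECT r.source_id FROM rollup_tbl r WHERE NOT EXISTS "
    "(SELECT 'Found' FROM title_tbl t WHERE r.source_id = t.title_id) "
    "ORDER BY r.source_id"
).fetchall()

# Outer join plus IS NULL: keep the rollup rows that found no partner.
outer_join = conn.execute(
    "SELECT r.source_id FROM rollup_tbl r "
    "LEFT JOIN title_tbl t ON r.source_id = t.title_id "
    "WHERE t.title_id IS NULL ORDER BY r.source_id"
).fetchall()

# NOT IN with the NULLs filtered out of the subquery.
not_in = conn.execute(
    "SELECT r.source_id FROM rollup_tbl r WHERE r.source_id NOT IN "
    "(SELECT t.title_id FROM title_tbl t WHERE t.title_id IS NOT NULL) "
    "ORDER BY r.source_id"
).fetchall()

print(not_exists, outer_join, not_in)
conn.close()
```

All three return source_id 1 and 3 here. The agreement relies on source_id itself containing no NULL; a NULL source_id would be returned by the NOT EXISTS and outer join forms but not by NOT IN.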

Create two tables: outer, the large table used in the outer query, and inner, the small table used in the subquery.

SQL> create table inner as select rownum as id from dba_tables;

Table created.

SQL> create table outer as select rownum*2 as id from dba_objects;

Table created.

SQL> select count(*) from inner;

COUNT(*)
----------
531

SQL> select count(*) from outer;

COUNT(*)
----------
27774

SQL> set timing on
SQL> set autotrace traceonly explain

SQL> select * from outer where id in (select id from inner);

265 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   MERGE JOIN
   2    1     SORT (JOIN)
   3    2       TABLE ACCESS (FULL) OF 'OUTER'
   4    1     SORT (JOIN)
   5    4       VIEW OF 'VW_NSO_1'
   6    5         SORT (UNIQUE)
   7    6           TABLE ACCESS (FULL) OF 'INNER'

SQL> select * from outer where exists (select 'x' from inner where inner.id=outer.id);

265 rows selected.

Elapsed: 00:00:08.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   FILTER
   2    1     TABLE ACCESS (FULL) OF 'OUTER'
   3    1     TABLE ACCESS (FULL) OF 'INNER'

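Before the tables are analyzed: as the timings above show, when (select id from inner) returns few rows, IN is faster (0.03 s versus 8.02 s for EXISTS).
The first statement is equivalent to select outer.* from outer, (select distinct id from inner) inner where inner.id = outer.id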
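
The DISTINCT in that equivalent join matters whenever the subquery can return duplicates. The sketch below illustrates this with SQLite via Python's sqlite3 module; the tables outer_tbl and inner_tbl and their rows are hypothetical (unlike the test tables in this post, inner_tbl deliberately holds a repeated id).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outer_tbl (id INTEGER)")
conn.execute("CREATE TABLE inner_tbl (id INTEGER)")
conn.executemany("INSERT INTO outer_tbl VALUES (?)", [(2,), (4,), (6,), (8,)])
conn.executemany("INSERT INTO inner_tbl VALUES (?)", [(2,), (2,), (4,), (5,)])

in_rows = conn.execute(
    "SELECT id FROM outer_tbl WHERE id IN (SELECT id FROM inner_tbl) "
    "ORDER BY id"
).fetchall()

# Join against the de-duplicated subquery: same rows as IN.
join_distinct = conn.execute(
    "SELECT o.id FROM outer_tbl o, (SELECT DISTINCT id FROM inner_tbl) i "
    "WHERE i.id = o.id ORDER BY o.id"
).fetchall()

# Plain join without DISTINCT: the repeated id 2 produces an extra row.
join_plain = conn.execute(
    "SELECT o.id FROM outer_tbl o, inner_tbl i "
    "WHERE i.id = o.id ORDER BY o.id"
).fetchall()

print(in_rows, join_distinct, join_plain)
conn.close()
```

This is the role of the SORT (UNIQUE) step under VW_NSO_1 in the plan above: the subquery result is de-duplicated before it is joined.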

Try it yourself: if (select id from inner) returns more rows than (select id from outer), EXISTS is faster.

Now analyze the tables and look at the effect. Both forms use a hash join and perform the same.
SQL> analyze table inner compute statistics;

Table analyzed.

Elapsed: 00:00:00.06
SQL> analyze table outer compute statistics;

Table analyzed.

Elapsed: 00:00:00.08

SQL> select * from outer where id in (select id from inner);

265 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=531 Bytes=3717)
   1    0   HASH JOIN (Cost=11 Card=531 Bytes=3717)
   2    1     SORT (UNIQUE)
   3    2       TABLE ACCESS (FULL) OF 'INNER' (Cost=2 Card=531 Bytes=1593)
   4    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=6 Card=27774 Bytes=111096)

SQL> select * from outer where exists (select 'x' from inner where inner.id=outer.id);

265 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=531 Bytes=3717)
   1    0   HASH JOIN (Cost=11 Card=531 Bytes=3717)
   2    1     SORT (UNIQUE)
   3    2       TABLE ACCESS (FULL) OF 'INNER' (Cost=2 Card=531 Bytes=1593)
   4    1     TABLE ACCESS (FULL) OF 'OUTER' (Cost=6 Card=27774 Bytes=111096)
