SQL优化（6）——nested loop驱动行源的选择-yuechaotian-ChinaUnix博客

CLEANERyuechaotian.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

yuechaotian

博客访问： 1713323
博文数量： 136
博客积分： 10021
博客等级：上将
技术积分： 3261
用户组：普通用户
注册时间： 2007-01-22 11:26

文章分类

全部博文（136）

SQL TUNING（4）
SQLSERVER（1）
IDEOLOGY（1）
HUMOURS（0）
LINUX（17）
LIFE（14）
ORACLE PROGRAMIN（72）
ORACLE MANAGING（26）
未分配的博文（1）

文章存档

2010年（1）

2009年（26）

2008年（109）

我的朋友

相关博文

SQL优化（6）——nested loop驱动行源的选择

分类： Oracle

2009-03-29 22:42:49

一般情况下，nested loop驱动行源应该选择小表，或者行数少的行源，但有时候也不一定。我们来看一个使用大表做驱动行源的例子。

1. 创建测试环境

大表test_big，小表test_small，二者通过test_big.object_id=test_small.object_id来关联。其中大表的object_id中会有重复记录，而小表中的object_id是唯一的，两个列上都有索引。

C:\Documents and Settings\yuechao.tianyc>sqlplus test/test

SQL*Plus: Release 10.2.0.1.0 - Production on 星期日 3月 29 22:37:13 2009

连接到:

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

SQL> create table test_big as select * from dba_objects;

表已创建。

SQL> insert into test_big select * from test_big;

已创建50000行。

SQL> /

已创建100000行。

...

SQL> /

已创建6400000行。

SQL> commit;

提交完成。

SQL> create table test_small as select * from dba_objects where rownum < 5001;

表已创建。

SQL> create index ind_test_big on test_big(object_id);

索引已创建。

SQL> create index ind_test_small on test_small(object_id);

索引已创建。

2. 获取执行计划

分别使用大表和小表作为驱动行源，先看一下它们的执行计划：

-- 1. 使用test_big作为驱动行源

-- 执行步骤：（1）全表扫描test_big，得到行源A；（2）将A作为驱动行源，嵌套循环连接索引ind_test_small，得到行源B；（3）将行源B通过索引ind_test_small与表test_small关联，得到结果集。

SQL> explain plan for

2 select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)

3 from test_big, test_small where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------

Plan hash value: 3591390207

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 158 | 15M (1)| 53:19:22 |

| 1 | SORT AGGREGATE | | 1 | 158 | | |

| 2 | TABLE ACCESS BY INDEX ROWID| TEST_SMALL | 1 | 79 | 2 (0)| 00:00:01 |

| 3 | NESTED LOOPS | | 1247K| 187M| 15M (1)| 53:19:22 |

| 4 | TABLE ACCESS FULL | TEST_BIG | 14M| 1106M| 39134 (2)| 00:07:50 |

|* 5 | INDEX RANGE SCAN | IND_TEST_SMALL | 1 | | 1 (0)| 00:00:01 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

5 - access("test_big"."OBJECT_ID"="test_small"."OBJECT_ID")

Note

-----

- dynamic sampling used for this statement

已选择21行。

-- 2. 使用test_small作为驱动行源

-- 执行步骤：（1）全表扫描test_small，得到行源A；（2）将A作为驱动行源，嵌套循环连接索引ind_test_big，得到行源B；（3）将行源B通过索引ind_test_big与表test_big关联，得到结果集。

SQL> explain plan for

2 select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)

3 from test_small, test_big where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------

Plan hash value: 1952886871

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 158 | 1098K (1)| 03:39:42 |

| 1 | SORT AGGREGATE | | 1 | 158 | | |

| 2 | TABLE ACCESS BY INDEX ROWID| TEST_BIG | 257 | 20303 | 226 (0)| 00:00:03 |

| 3 | NESTED LOOPS | | 1247K| 187M| 1098K (1)| 03:39:42 |

| 4 | TABLE ACCESS FULL | TEST_SMALL | 4854 | 374K| 17 (0)| 00:00:01 |

|* 5 | INDEX RANGE SCAN | IND_TEST_BIG | 257 | | 2 (0)| 00:00:01 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

5 - access("test_big"."OBJECT_ID"="test_small"."OBJECT_ID")

Note

-----

- dynamic sampling used for this statement

已选择21行。

3. 比较执行时间

比较上面两个不同的执行计划。看起来使用表test_small作为驱动行源更合理一些，它只有50001行，而表test_big有12800000行数据。而实际却不是这样，我们看一下他们各自的执行时间：

SQL> set timing on

-- 使用大表作为驱动行源

SQL> select/*+ordered use_nl(test_big,test_small)*/ count(test_big.object_name||test_small.object_name)

2 from test_big, test_small where test_big.object_id = test_small.object_id;

COUNT(TEST1.OBJECT_NAME||TEST2.OBJECT_NAME)

-------------------------------------------

12800000

已用时间: 00: 00: 36.70

-- 使用小表作为驱动行源

SQL> select/*+ordered use_nl(test1,test2)*/ count(test1.object_name||test2.object_name)

2 from test2, test1 where test1.object_id = test2.object_id;

COUNT(TEST1.OBJECT_NAME||TEST2.OBJECT_NAME)

-------------------------------------------

12800000

已用时间: 00: 02: 42.89

我们看到，使用大表作为驱动行源，耗时约36.7秒；而使用小表作为驱动行源，耗时达到2分42.89秒！

4. 原因

其实原因就在与大表的列object_id中有很多重复记录，当使用小表作为驱动行源时，全表扫描test_small的时间虽然很快，但对于test_small中的每一条记录，都对应test_big中的若干条记录，那么就需要根据查到的这些rowid，进行若干次的磁盘I/O来获得大表对应的数据。这里频繁的磁盘I/O就是问题的原因。因为test_big数据量很大，数据不可能存储在内存中。

而当使用大表作为驱动行源时，全表扫描test_big的时间虽然比较长，但每条记录都对应test_small中的一条记录，而且表test_small比较小，其数据及索引数据可以从内存中直接找到。这样耗费的时间就主要是全表扫描test_big的时间了，而这耗费不了太多的时间。在下面的测试中，只耗费了26.14秒：

SQL> select count(*) from test_big;

COUNT(*)

----------

12800000

已用时间: 00: 00: 26.14

SQL> set timing off

SQL> explain plan for

2 select count(*) from test_big;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

--------------------------------------------------------------------------

Plan hash value: 3224830981

-----------------------------------------------------------------------

-----------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 38982 (2)| 00:07:48 |

| 1 | SORT AGGREGATE | | 1 | | |

| 2 | TABLE ACCESS FULL| TEST_BIG | 14M| 38982 (2)| 00:07:48 |

-----------------------------------------------------------------------

Note

-----

- dynamic sampling used for this statement

已选择13行。

5. 备注

这里只是一个小测试，来说明nested loop中，驱动行源的选择要根据实际情况而定，不是一成不变的。其实这个SQL使用hash join是最快的：

SQL> explain plan for

2 select count(test_big.object_name||test_small.object_name)

3 from test_big, test_small where test_big.object_id = test_small.object_id;

已解释。

SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------

Plan hash value: 1810242240

----------------------------------------------------------------------------------

----------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 158 | 39404 (3)| 00:07:53 |

| 1 | SORT AGGREGATE | | 1 | 158 | | |

|* 2 | HASH JOIN | | 1247K| 187M| 39404 (3)| 00:07:53 |

| 3 | TABLE ACCESS FULL| TEST_SMALL | 4854 | 374K| 17 (0)| 00:00:01 |

| 4 | TABLE ACCESS FULL| TEST_BIG | 14M| 1106M| 39134 (2)| 00:07:50 |

----------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("TEST_BIG"."OBJECT_ID"="TEST_SMALL"."OBJECT_ID")

Note

-----

- dynamic sampling used for this statement

已选择20行。

SQL> set timing on

SQL> select count(test_big.object_name||test_small.object_name)

2 from test_big, test_small where test_big.object_id = test_small.object_id;

COUNT(TEST_BIG.OBJECT_NAME||TEST_SMALL.OBJECT_NAME)

---------------------------------------------------

1280000

已用时间: 00: 00: 21.42

阅读(3209) | 评论(3) | 转发(0) |

上一篇：SQL优化（4）——表的三种连接方式

下一篇：记录两个小测试

给主人留下些什么吧！~~

chinaunix网友2009-11-10 16:29:50

OMG,好聪明

回复 | 举报

chinaunix网友2009-05-13 22:01:24

呵呵，最近搞促销，霸王83.00买了两套，相对于59.00一套显得贼便宜。

回复 | 举报

chinaunix网友2009-04-21 22:48:30

nested join一般适用于根据outertable循环，对于outertable中的每一条数据，能快速定位到innertable中的表的连接，具体到真是的应用环境中，理论就是理论了。一个itput版主说过，脱离了真实得数据环境是无法谈优化的。田哥，最近过的如何？

回复 | 举报

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6