Postgresql使用left join 优化 not in-轨迹16-ChinaUnix博客

轨迹

首页　| 　博文目录　| 　关于我

轨迹16

博客访问： 1365393
博文数量： 244
博客积分： 1039
博客等级：少尉
技术积分： 1562
用户组：普通用户
注册时间： 2011-01-06 09:40

文章分类

全部博文（244）

PostgreSQL（6）
lua（0）
存储（5）
qt（9）
基础知识（0）
u-boot（7）
usb（6）
linux使用问题（27）
文件系统（1）
android应用（3）
软件设计（35）
软件大师（3）
eclipse（0）
C/C++（32）
Linux内核（43）
杂文（2）
tcp/ip网络编程（9）
linux/unix应用编（39）
SQL（3）
未分配的博文（14）

文章存档

2021年（2）

2019年（6）

2018年（2）

2014年（1）

2013年（187）

2012年（47）

我的朋友

Postgresql使用left join 优化 not in

场景：
查看在t1表中的数据，而这些数据不能出现在t2表。初学者容易想到 not in。这个问题可以使用left join 解决，如果两个表的结构一致，也可以采用差集来解决

使用例子

建立测试数据
新建两个表t1, t2, 每个表有100万的数据。有999990是相同的。现在要找到在t1的数据，而不再t2的数据。

create table t1(id int, info text);
create table t2(id int, info text);

insert into t1(id, info) select i, md5(i::text) from generate_series(1, 1000000) t(i);
insert into t2(id, info) select i, md5(i::text) from generate_series(11, 1000000 + 10) t(i);

create index on t1(id);
create index on t2(id);

postgres=# select count(*) from t1;
  count  
---------
 1000000
(1 row)

postgres=# select count(*) from t2;
  count  
---------
 1000000
(1 row)

使用not in 效果

postgres=# explain  select * from t1 where id not in (select id from t2);
                               QUERY PLAN                                   
--------------------------------------------------------------------------------
 Gather  (cost=1000.00..6196106209.00 rows=500000 width=37)
   Workers Planned: 2
   ->  Parallel Seq Scan on t1  (cost=0.00..6196055209.00 rows=208333 width=37)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..27241.00 rows=1000000 width=4)
                 ->  Seq Scan on t2  (cost=0.00..18334.00 rows=1000000 width=4)
(7 rows)

使用left join 的效果

postgres=# explain  select t1.* from t1 left join t2 on t1.id = t2.id where t2.* is  null;
                                  QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Gather  (cost=23592.00..51286.28 rows=5000 width=37)
   Workers Planned: 2
   ->  Parallel Hash Left Join  (cost=22592.00..49786.28 rows=2083 width=37)
         Hash Cond: (t1.id = t2.id)
         Filter: (t2.* IS NULL)
         ->  Parallel Seq Scan on t1  (cost=0.00..12500.67 rows=416667 width=37)
         ->  Parallel Hash  (cost=12500.67..12500.67 rows=416667 width=65)
               ->  Parallel Seq Scan on t2  (cost=0.00..12500.67 rows=416667 width=65)
(8 rows)

对比结果
6196106209 / 51286 = 120814
基本上相差12万倍

阅读(10469) | 评论(0) | 转发(0) |

上一篇：PostgreSQL 在没有备份的情况下_如何恢复误删的数据

下一篇：Postgresql查询成本计算初探

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6