PostgreSQL: data that exists in one table but not in another - laoliulaoliu - ChinaUnix blog

miraclemiracle.blog.chinaunix.net

I. Performance test

1. not in:
explain (analyze,verbose,costs,buffers) select ID from T1 where ID not in (select ID from T2 where finished=1);
Total runtime: 0.128 ms

2. not exists:
explain (analyze,verbose,costs,buffers) select ID from T1 where not exists (select 1 from T2 where T1.ID=T2.ID and T2.finished=1);
Total runtime: 0.105 ms

3. left join:
explain (analyze,verbose,costs,buffers) select T1.ID from T1 left join T2 on T1.ID=T2.ID and T2.finished=1 where T2.ID is null;
Total runtime: 0.096 ms

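4. I also came across a supposedly faster method online, but in my tests it does not give the correct result, so I will not discuss it further:
select ID from T2 where (select count(1) from T1 where T1.ID=T2.ID) = 0; this query returns an empty result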
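
One possible reason the query in item 4 comes back empty is that it scans T2 and counts matches in T1, which is the reverse of the other three queries, and that it drops the finished filter. This is only a guess, since the post does not show the data. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL; the rows are hypothetical and the "swapped" query is my own variant, not something from the original post.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, finished INTEGER);
    -- Hypothetical data: every t2.id also exists in t1.
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 1), (2, 0);
""")

# The query as quoted in item 4: rows of t2 that have no match in t1.
as_written = conn.execute("""
    SELECT id FROM t2
    WHERE (SELECT count(1) FROM t1 WHERE t1.id = t2.id) = 0
""").fetchall()

# Hypothetical variant with the tables swapped and the finished filter added.
swapped = conn.execute("""
    SELECT id FROM t1
    WHERE (SELECT count(1) FROM t2
           WHERE t2.id = t1.id AND t2.finished = 1) = 0
    ORDER BY id
""").fetchall()

# Reference result from the NOT EXISTS form used in item 2.
not_exists = conn.execute("""
    SELECT id FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2
                      WHERE t1.id = t2.id AND t2.finished = 1)
    ORDER BY id
""").fetchall()

print(as_written)
print(swapped)
print(not_exists)
```

With this data the query as written finds nothing, because no t2 row lacks a partner in t1, while the swapped variant agrees with NOT EXISTS. Whether the swapped variant is any faster is not tested here.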

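So on PostgreSQL 9.3 the statements rank by execution speed as left join > not exists > not in (fastest first).
When ID contains NULL in T1 or T2, the not in statement behaves differently from the other two, so I recommend always using not exists instead of not in.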
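
The NULL behaviour mentioned here can be seen on a tiny data set. The following is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL; the three-valued logic behind NOT IN is standard SQL, but the rows are hypothetical and the run is not from the original post.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, finished INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    -- Hypothetical data: one finished row in t2 has a NULL id.
    INSERT INTO t2 VALUES (1, 1), (NULL, 1);
""")

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, not TRUE,
# so rows 2 and 3 are filtered out along with row 1.
not_in = conn.execute("""
    SELECT id FROM t1
    WHERE id NOT IN (SELECT id FROM t2 WHERE finished = 1)
    ORDER BY id
""").fetchall()

# NOT EXISTS: the NULL row in t2 simply never matches.
not_exists = conn.execute("""
    SELECT id FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2
                      WHERE t1.id = t2.id AND t2.finished = 1)
    ORDER BY id
""").fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

A single NULL in the subquery result is enough to make NOT IN return nothing, which is rarely what the query author intended.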

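II. Performance test with a larger data set

With a large amount of data, not in suffers a severe performance drop. The tests below were run on my 13-inch MacBook Pro with a 2.4 GHz i5.
department (T1) has 59,280 rows with a data length of 29 characters; dept (T2) has 23,633 rows with a data length of 29 characters.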

1. explain analyze select department.id from department where department.id not in (select id from dept where finished=true);
Total runtime: 447073.065 ms

2. explain analyze select department.id from department where not exists (select 1 from dept where department.id=dept.id and finished=true);
Total runtime: 325.732 ms

3. explain analyze select department.id from department left join dept on department.id=dept.id and dept.finished=true where dept.id is null;
Total runtime: 319.869 ms
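
To put the three timings side by side, the short sketch below computes how many times slower each query is than the fastest one. The timings and row counts are copied from the runs above; the "pairs" figure is only an upper bound on ID comparisons if every department row were checked against every dept row, not something the post measured.

```python
# Timings in milliseconds, copied from the three EXPLAIN ANALYZE runs above.
runtime_ms = {
    "not in": 447073.065,
    "not exists": 325.732,
    "left join": 319.869,
}

# Express every timing as a multiple of the fastest query.
fastest = min(runtime_ms, key=runtime_ms.get)
slowdown = {name: ms / runtime_ms[fastest] for name, ms in runtime_ms.items()}

for name, factor in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name:<10} {runtime_ms[name]:>12.3f} ms  {factor:>8.1f}x the fastest")

# Upper bound on row pairs for a naive row-by-row check (assumption,
# not a measured figure): rows in department times rows in dept.
pairs = 59280 * 23633
print(f"row pairs: {pairs:,}")
```

On these numbers not in is well over a thousand times slower than either alternative, while not exists and left join differ by about two percent. The post does not show the query plans, so the cause is not confirmed here; a frequently given explanation is that PostgreSQL's planner does not turn NOT IN (subquery) into an anti-join because of its NULL semantics, whereas it can do so for NOT EXISTS and for the LEFT JOIN ... IS NULL form.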

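III. Summary:

On PostgreSQL 9.3:
not in not only performs poorly, its logic can also go wrong when NULLs are present.
not exists performs well and is fairly easy to reason about.
left join performs best, but overall it is only marginally faster than not exists, and it takes a little more thought to follow.

Below is a diagram of left join that I found online (I could not track down its source); it helps in understanding how a left join works: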
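
As a textual walk-through of the same idea, the sketch below prints the intermediate result of the left join before the IS NULL filter is applied. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, finished INTEGER);
    -- Hypothetical data: id 2 exists in t2 but is not finished.
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 1), (2, 0);
""")

# Step 1: the left join keeps every t1 row; t1 rows without a finished
# partner in t2 get NULL (None in Python) in the t2 columns.
joined = conn.execute("""
    SELECT t1.id, t2.id, t2.finished
    FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t2.finished = 1
    ORDER BY t1.id
""").fetchall()

# Step 2: keeping only the NULL-padded rows yields the "missing" ids.
missing = conn.execute("""
    SELECT t1.id
    FROM t1 LEFT JOIN t2 ON t1.id = t2.id AND t2.finished = 1
    WHERE t2.id IS NULL
    ORDER BY t1.id
""").fetchall()

# Pitfall: moving the finished test into WHERE discards the NULL-padded
# rows, because NULL = 1 is never true.
filter_in_where = conn.execute("""
    SELECT t1.id
    FROM t1 LEFT JOIN t2 ON t1.id = t2.id
    WHERE t2.finished = 1 AND t2.id IS NULL
""").fetchall()

print(joined)           # [(1, 1, 1), (2, None, None), (3, None, None)]
print(missing)          # [(2,), (3,)]
print(filter_in_where)  # []
```

This is why the queries above put the finished condition in the ON clause rather than in WHERE: the filter has to apply while matching, not after the unmatched rows have been padded with NULLs.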
