Category: LINUX

2016-09-16 10:53:02

Join versus lookup
Versions: 8.1.0, 8.5.0, 8.7.0, 9.1.0, 11.3.0, 11.5.0

InfoSphere DataStage does not know how large your data is, so it cannot make an informed choice whether to combine data using a join stage or a lookup stage. Here's how to decide which to use:

Two kinds of data sets are being combined. One is the primary, or driving, data set, sometimes called the left of the join. The others are the reference data sets, or the right of the join.

In all cases, what matters is the size of the reference data sets. If they take up a large amount of memory relative to the physical RAM of the computer you are running on, a lookup stage might thrash, because the reference data sets might not fit in RAM along with everything else that has to be there. The result is very slow performance, since each lookup operation can, and typically does, cause a page fault and an I/O operation.
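The sizing rule above can be written down as a rough check. This is only a sketch: the function name choose_stage and the ram_fraction budget of 0.5 are assumptions made for illustration, not values from the DataStage documentation, and the right budget depends on what else has to live in RAM on your machine.

```python
def choose_stage(reference_bytes, physical_ram_bytes, ram_fraction=0.5):
    """Suggest "lookup" or "join" from the reference data size.

    ram_fraction is a hypothetical budget: the share of physical RAM we
    are willing to let the reference data sets occupy. Tune it for the
    actual machine; 0.5 is only a placeholder.
    """
    if physical_ram_bytes <= 0:
        raise ValueError("physical_ram_bytes must be positive")
    if reference_bytes < 0:
        raise ValueError("reference_bytes must not be negative")

    budget = physical_ram_bytes * ram_fraction
    # Small reference data fits in memory, so a lookup is safe.
    # Large reference data risks paging, so prefer a join.
    return "lookup" if reference_bytes <= budget else "join"


GIB = 2 ** 30
print(choose_stage(1 * GIB, 16 * GIB))   # reference data well under the budget
print(choose_stage(12 * GIB, 16 * GIB))  # reference data over the budget
```

The point of the sketch is that only the reference side enters the decision; the size of the driving data set does not appear at all.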

So, if the reference data sets are big enough to cause trouble, use a join. A join does a high-speed sort on the driving and reference data sets. This can involve I/O if the data is big enough, but the I/O is all highly optimized and sequential. Once the sort is over, the join processing is very fast and never involves paging or other I/O.
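To make the difference concrete, here is a minimal Python sketch of the two strategies. It is not how DataStage implements its stages; lookup_join and sort_merge_join are hypothetical names, rows are plain dicts, and the sort is done in memory with sorted(), whereas a real join stage would spill to disk for large inputs. The lookup holds every reference row in a table that is probed at random, while the join, after sorting, only walks both inputs forward once.

```python
from operator import itemgetter


def lookup_join(driving, reference, key):
    """Lookup style: load all reference rows into an in-memory table."""
    table = {}
    for ref in reference:
        table.setdefault(ref[key], []).append(ref)
    out = []
    for row in driving:
        # Random access into the table; if it does not fit in RAM,
        # each probe can page.
        for ref in table.get(row[key], []):
            out.append({**row, **ref})
    return out


def sort_merge_join(driving, reference, key):
    """Join style: sort both inputs, then merge them sequentially."""
    left = sorted(driving, key=itemgetter(key))
    right = sorted(reference, key=itemgetter(key))
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of reference rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every driving row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for ref in right[j:j_end]:
                    out.append({**left[i], **ref})
                i += 1
            j = j_end
    return out


orders = [{"id": 2, "amount": 10}, {"id": 1, "amount": 5}, {"id": 3, "amount": 7}]
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

by_id = itemgetter("id")
same = sorted(lookup_join(orders, customers, "id"), key=by_id) == \
    sorted(sort_merge_join(orders, customers, "id"), key=by_id)
print(same)
```

Both functions produce the same inner-join rows; they differ only in memory access pattern, which is the trade-off the text describes.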
