Because of its shared-nothing architecture, Teradata frequently has to redistribute rows across AMPs during query execution.
Teradata's GROUP BY works in two phases: each AMP first aggregates its local rows, and a global aggregation then combines the partial results. Duplicates are therefore eliminated before redistribution, and when each value occurs many times, the volume of data that must be redistributed drops sharply, which greatly improves performance. Conversely, because GROUP BY performs two deduplication passes, it performs poorly when the data contains few duplicates.
DISTINCT, by contrast, deduplicates only after redistribution, so it is the better choice when there are few duplicates to remove.
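The effect of local pre-aggregation can be sketched with a small Python simulation. This is a deliberately simplified model of AMPs and redistribution (not Teradata internals): it only counts how many rows each strategy would ship across the interconnect for a single-column deduplication.

```python
import random
from typing import List, Tuple

def simulate(rows: List, n_amps: int = 4) -> Tuple[int, int]:
    """Compare how many rows each strategy ships across the
    interconnect (the redistribution) for a one-column dedup."""
    # Spread the rows across AMPs, mimicking hash distribution.
    amps = [rows[i::n_amps] for i in range(n_amps)]

    # DISTINCT: every row is redistributed by hash code first;
    # deduplication happens only after the shuffle.
    distinct_shipped = sum(len(a) for a in amps)

    # GROUP BY: each AMP deduplicates locally first, so only the
    # locally distinct values travel to the global aggregation.
    group_by_shipped = sum(len(set(a)) for a in amps)
    return distinct_shipped, group_by_shipped

# Heavy duplication (3 distinct values in 100,000 rows): GROUP BY
# redistributes at most 3 values per AMP.
print(simulate([random.choice("ABC") for _ in range(100_000)]))

# Nearly unique data: local deduplication removes nothing, so GROUP BY
# only adds the cost of an extra aggregation pass.
print(simulate(list(range(100_000))))
```

In the high-duplication case the GROUP BY path ships at most `n_amps × distinct values` rows instead of the whole table; in the unique case it ships exactly as many rows as DISTINCT while paying for two passes, which matches the rule of thumb above.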
This can be seen in the EXPLAIN output, as follows:
SELECT DISTINCT Bil_User_Type_Cd FROM BSSDATA.NET_CDR_VS_O_B_TMP ;
Step  Est. Time  Actual Time  Est. Rows  Actual Rows  Step Text
1     0:00.00    0:00.00      0          1            First, lock BSSDATA."pseudo table" for read on a row hash.
2     0:00.00    0:00.00      0          142          Next, we lock BSSDATA.NET_CDR_VS_O_B_TMP for read.
3     0:35.17    79760690     We do an All-AMPs RETRIEVE step from BSSDATA.NET_CDR_VS_O_B_TMP by way of an all-rows scan into Spool 39689, which is redistributed by hash code to all AMPs.
4     0:00.00    0            We send out an END TRANSACTION step to all AMPs involved in processing the request.
----------------------------------------------------------
SEL Bil_User_Type_Cd FROM BSSDATA.NET_CDR_VS_O_B_TMP
GROUP BY 1
Step  Est. Time  Actual Time  Est. Rows  Actual Rows  Step Text
1     0:00.00    0:00.00      0          1            First, lock BSSDATA."pseudo table" for read on a row hash.
2     0:00.00    0:00.01      0          142          Next, we lock BSSDATA.NET_CDR_VS_O_B_TMP for read.
3     0:05.93    8930         We do a SUM step to aggregate from BSSDATA.NET_CDR_VS_O_B_TMP by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 39695.
4     0:00.01    8930         We do an All-AMPs RETRIEVE step from Spool 39695 (Last Use) by way of an all-rows scan into Spool 39693, which is built locally on the AMPs.
5     0:00.00    0            We send out an END TRANSACTION step to all AMPs involved in processing the request.
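On this table the advantage is dramatic. A quick back-of-the-envelope check of the row counts reported in the two plans above shows how much less data crosses the AMPs when aggregation happens locally first:

```python
# Row counts taken from the two EXPLAIN outputs above.
distinct_redistributed = 79_760_690  # DISTINCT plan, step 3: whole table shuffled
group_by_intermediate = 8_930        # GROUP BY plan, step 3: locally aggregated rows

ratio = distinct_redistributed / group_by_intermediate
print(f"local pre-aggregation cut the shuffled volume ~{ratio:,.0f}x")
```

That reduction of several thousand times in redistributed volume is what turns the 0:35.17 RETRIEVE step into a 0:05.93 SUM step on this data.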