固定SQL语句的执行计划,而不是完全依赖于优化器的智能对于每个DBA来说都是非常重要的。
据文档介绍,Teradata理论上最多可以有64个表进行连接,而该功能的前提是统计的精确性,而作为一个大型系统来说,dba是无法保证每个表的统计都是最新的。 正是因为这一点,DBA才得以存在。
事实上,由于优化器仅靠收集的统计对连接顺序,连接方法做出选择,所以每一次统计的不够精确都会导致之后执行计划选择错误的概率不断增加,以至于最后选择了完全不可接受的执行计划。 例如,下面语句最后就导致了大型表进行product join,完全执行不下去:
insert into app_a.zwl_wxeksl_orders
SELECT t1.Order_Id
,t1.Create_Dt
,t4.CEmployee_Id
,t11.Dept_Name
,t0.Sales_Emp_Id
,t0.Action_Name
,t5.Accs_Nbr
,t6.CDsc_Id
,t6.CDsc_Name
FROM bssview.EVT_ORDER_HIST_A t1
INNER JOIN bssview.EVT_ORDI_HIST_A t2
ON t1.Order_Row_Id = t2.Order_Row_Id
AND t1.Create_Dt >= date'2008-12-18'
AND t1.Create_Dt < date'2008-12-19'
AND t1.Stat_Name <> '取消'
AND t1.End_Dt = date'3000-12-31'
AND t2.Root_Order_Item_Id = t2.Order_Item_Row_Id
AND t2.CRM_Sys_Makeup_Id in ('1-85SW', '1-85SY')
AND t2.Stat_Name <> '取消'
AND t2.End_Dt = date'3000-12-31'
INNER JOIN bssview.EVT_ORDI_HIST_A t0
ON t2.Order_Item_Row_Id = t0.Parent_Order_Item_Id
AND t0.CRM_Sys_Makeup_Id = '1-FV6G83'
AND t0.Stat_Name <> '取消'
AND t0.Action_Name in ('添加','删除')
AND t0.End_Dt = date'3000-12-31'
LEFT JOIN bssview.EVT_ORDI_CDSC_HIST_A t3
ON t3.Order_Item_Row_Id = t2.Order_Item_Row_Id
AND t3.CDsc_Row_Id in('1-JE1K-4124','1-JE1K-4126','1-LYHZ-1','1-LYHZ-2','1-R535-4','1-3M8FIKF','1-16J27-1','1-16I7U-1','1-16P03-17',
'1-16P03-20','1-16P03-21','1-16P03-22','1-16P03-23','1-16P03-24','1-16P03-18','1-16P03-19','1-16P03-25',
'1-16P03-14','1-16P03-15','1-16P03-16','1-16P03-9','1-16P03-10','1-16P03-11','1-16P03-12','1-17T5X-1')
AND t3.Action_Name in ('添加','删除')
AND t3.Stat_Name <> '-'
AND t3.End_Dt = date'3000-12-31'
INNER JOIN bssview.PAR_CEMPLOYEE_HIST t4
ON t4.CEmployee_Row_Id = t1.CEmployee_Row_Id
AND t4.End_Dt = date'3000-12-31'
INNER JOIN bssview.PAR_POS_A t00
ON t00.Pos_Row_Id = t4.Pos_Row_Id
INNER JOIN bssview.PAR_DEPT t11
ON t11.Dept_Row_Id = t00.Dept_Row_Id
INNER JOIN bssview.EVT_ORDI_EQUIP_HIST_A t5
ON t5.Order_Item_Row_Id = t2.Order_Item_Row_Id
AND t5.End_Dt = date'3000-12-31'
LEFT JOIN bssview.OFR_CDSC_A t6
ON t6.CDsc_Row_Id = t3.CDsc_Row_Id;
执行计划:
Step Est. Time Actual Time Est. Rows Actual Rows Step Text
1 0:00.00 0:00.01 0 1 First, lock APP_A ."pseudo table" for write on a row hash.
2 0:00.00 0:00.01 0 142 Next, we lock BSSDATA .OFR_CDSC_A for access, we lock BSSDATA .EVT_ORDI_HIST_A for access, we lock BSSDATA .EVT_ORDI_CDSC_HIST_A for access, we lock BSSDATA .PAR_DEPT for access, we lock BSSDATA .EVT_ORDER_HIST_A for access, we lock BSSDATA .OFR_MAP_STD_CDSC_A for access, we lock BSSDATA .PAR_CEMPLOYEE_HIST for access, we lock BSSDATA .PAR_POS_A for access, we lock BSSDATA .EVT_ORDI_EQUIP_HIST_A for access and we lock APP_A .ZWL_WXEKSL_ORDERS for write.
3 0:00.02 0:00.87 1609 2233 We do an All-AMPs RETRIEVE step from BSSDATA .PAR_POS_A by way of an all-rows scan into Spool 11686, which is redistributed by hash code to all AMPs. This step begins a parallel block of steps.
3 0:02.57 0:11.11 1 27062 We do an All-AMPs RETRIEVE step from BSSDATA .EVT_ORDER_HIST_A by way of an all-rows scan into Spool 11687, which is redistributed by hash code to all AMPs. This step ends a parallel block of steps.
4 0:00.00 0:00.65 1609 2233 We do an All-AMPs JOIN step from BSSDATA .PAR_DEPT by way of an all-rows scan, which is joined to Spool 11686. table PAR_DEPT and Spool 11686 are joined using a merge join . The result goes into Spool 11688, which is built locally on the AMPs. This step begins a parallel block of steps.
4 0:00.01 0:04.24 142 3842804 We do an All-AMPs JOIN step from Spool 11687 (Last Use) by way of an all-rows scan, which is joined to table PAR_CEMPLOYEE_HIST . Spool 11687 and table PAR_CEMPLOYEE_HIST are joined using a merge join . The result goes into Spool 11689, which is duplicated on all AMPs. This step is performed in parallel.
4 0:10.04 0:47.05 2828 26870 We do an All-AMPs RETRIEVE step from BSSDATA .EVT_ORDI_CDSC_HIST_A by way of an all-rows scan into Spool 11690, which is redistributed by hash code to all AMPs. This step ends a parallel block of steps.
5 0:00.01 0:00.69 2828 26870 We do an All-AMPs JOIN step from BSSDATA .OFR_CDSC_A by way of an all-rows scan, which is joined to Spool 11690. table OFR_CDSC_A and Spool 11690 are right outer joined using a merge join . The result goes into Spool 11691, which is redistributed by hash code to all AMPs. This step begins a parallel block of steps.
5 0:00.01 0:01.27 142 3826758 We do an All-AMPs JOIN step from Spool 11688 (Last Use) by way of an all-rows scan, which is joined to Spool 11689. Spool 11688 and Spool 11689 are joined using a merge join . The result goes into Spool 11694, which is duplicated on all AMPs. This step ends a parallel block of steps.
6 0:00.88 4:45.64 4992887 6122056 We do an All-AMPs JOIN step from Spool 11691 (Last Use) by way of an all-rows scan, which is joined to table EVT_ORDI_HIST_A . Spool 11691 and table EVT_ORDI_HIST_A are right outer joined using a merge join . The result goes into Spool 11695, which is built locally on the AMPs.
7 0:00.30 1 We do an All-AMPs JOIN step from Spool 11694 (Last Use) by way of an all-rows scan, which is joined to Spool 11695. Spool 11694 and Spool 11695 are joined using a product join . The result goes into Spool 11698, which is built locally on the AMPs. This step begins a parallel block of steps.
7 0:28.63 1:39.62 16263 18107 We do an All-AMPs RETRIEVE step from BSSDATA .EVT_ORDI_HIST_A by way of an all-rows scan into Spool 11699, which is built locally on the AMPs. This step ends a parallel block of steps.
8 0:00.00 142 We do an All-AMPs JOIN step from Spool 11698 (Last Use) by way of an all-rows scan, which is joined to table EVT_ORDI_EQUIP_HIST_A . Spool 11698 and table EVT_ORDI_EQUIP_HIST_A are joined using a merge join . The result goes into Spool 11700, which is duplicated on all AMPs.
9 0:00.00 1 We do an All-AMPs JOIN step from Spool 11699 (Last Use) by way of an all-rows scan, which is joined to Spool 11700. Spool 11699 and Spool 11700 are joined using a merge join . The result goes into Spool 11685, which is redistributed by hash code to all AMPs.
10 0:00.00 0 We do a MERGE into table ZWL_WXEKSL_ORDERS from Spool 11685.
11 0:00.00 0 We send out an END TRANSACTION step to all AMPs involved in processing the request.
由于在teradata中不存在类似oracle中hint的提示,其原理和oracle的hint是一样的,只不过有些说法不同而已。所以正确拆分脚本并选择恰当的pi可以极大地减少IO和cpu的消耗,下面是拆分后的脚本:
create volatile multiset table zwl_wxeksl_orders_tmp as
(SELECT
t2.Order_Item_Row_Id
,t1.Order_Id
,t1.Create_Dt
,t4.CEmployee_Id
,t11.Dept_Name
,t0.Sales_Emp_Id
,t0.Action_Name
,t5.Accs_Nbr
--,t6.CDsc_Id
--,t6.CDsc_Name
FROM bssview.EVT_ORDER_HIST_A t1
INNER JOIN bssview.EVT_ORDI_HIST_A t2
ON t1.Order_Row_Id = t2.Order_Row_Id
AND t1.Create_Dt >= date'2008-12-18'
AND t1.Create_Dt < date'2008-12-19'
AND t1.Stat_Name <> '取消'
AND t1.End_Dt = date'3000-12-31'
AND t2.Root_Order_Item_Id = t2.Order_Item_Row_Id
AND t2.CRM_Sys_Makeup_Id in ('1-85SW', '1-85SY')
AND t2.Stat_Name <> '取消'
AND t2.End_Dt = date'3000-12-31'
INNER JOIN bssview.EVT_ORDI_HIST_A t0
ON t2.Order_Item_Row_Id = t0.Parent_Order_Item_Id
AND t0.CRM_Sys_Makeup_Id = '1-FV6G83'
AND t0.Stat_Name <> '取消'
AND t0.Action_Name in ('添加','删除')
AND t0.End_Dt = date'3000-12-31'
INNER JOIN bssview.PAR_CEMPLOYEE_HIST t4
ON t4.CEmployee_Row_Id = t1.CEmployee_Row_Id
AND t4.End_Dt = date'3000-12-31'
INNER JOIN bssview.PAR_POS_A t00
ON t00.Pos_Row_Id = t4.Pos_Row_Id
INNER JOIN bssview.PAR_DEPT t11
ON t11.Dept_Row_Id = t00.Dept_Row_Id
INNER JOIN bssview.EVT_ORDI_EQUIP_HIST_A t5
ON t5.Order_Item_Row_Id = t2.Order_Item_Row_Id
AND t5.End_Dt = date'3000-12-31')
with data
primary index(Order_Item_Row_Id)
on commit preserve rows;
insert into app_a.zwl_wxeksl_orders
select
t2.Order_Id
,t2.Create_Dt
,t2.CEmployee_Id
,t2.Dept_Name
,t2.Sales_Emp_Id
,t2.Action_Name
,t2.Accs_Nbr
,t6.CDsc_Id
,t6.CDsc_Name
from zwl_wxeksl_orders_tmp t2
LEFT JOIN bssview.EVT_ORDI_CDSC_HIST_A t3
ON t3.Order_Item_Row_Id = t2.Order_Item_Row_Id
AND t3.CDsc_Row_Id in('1-JE1K-4124','1-JE1K-4126','1-LYHZ-1','1-LYHZ-2','1-R535-4','1-3M8FIKF','1-16J27-1','1-16I7U-1','1-16P03-17',
'1-16P03-20','1-16P03-21','1-16P03-22','1-16P03-23','1-16P03-24','1-16P03-18','1-16P03-19','1-16P03-25',
'1-16P03-14','1-16P03-15','1-16P03-16','1-16P03-9','1-16P03-10','1-16P03-11','1-16P03-12','1-17T5X-1')
AND t3.Action_Name in ('添加','删除')
AND t3.Stat_Name <> '-'
AND t3.End_Dt = date'3000-12-31'
LEFT JOIN bssview.OFR_CDSC_A t6
ON t6.CDsc_Row_Id = t3.CDsc_Row_Id;
通过对很多类似脚本的测试发现,这种方法基本上都能够防止优化器做出太差的选择。