2008年(31)
分类: Mysql/postgreSQL
2008-05-08 17:19:33
第三篇:成长
章节列表:
第10章:增加字段
第11章:按需载入
第12章:维表子集
第13章:维度角色扮演
第14章:快照
第15章:唯度层次
第16章:多路径和不齐整的维度层次
第17章:维度退化
第18章:垃圾维
第19章:多星型模式
本篇概述
如果的数据仓库有个成功的开始,你的客户一定想要更多。他们可能要求你载入更多的数据到数据仓库中。到时候,你将为数据仓库的成长做好准备。
第3篇将讨论如何通过以下的方式可以使你扩充你的数据仓库:
n 增加字段和表
n 分离维度
n 根据特殊的目的用不同的方式应用维度
n 增加一个衍生的星型或者一个完整的新的星型
第10章: 增加字段
本章教你一个刚开始的数据仓库最常遇到的扩充情况:向一个现有的维和事实表增加字段。本章将以讨论模式在你需要增加一个或者两个字段的时候会发生的情况为开始。进而向你示范如何能往客户维和销售订单事实表增加一个新的字段。并且向这个新的字段实施SCD2。
扩展模式
本节我将向你展示如何修正数据仓库的模式。图10.1显示扩展后的模式,在customer_dim表和sales_order_fact表中有新增的字段。在customer_dim表中新增的字段是shipping_address, shipping_zip_code, shipping_city,和shipping_state。sales_order_fact表中有一个新增的字段:order_quantity.
列表10-1中的shipping_address.sql在customer_dim和customer_stg表中增加新的字段。在你运行这个脚本前,将你的mysql时间设置为:
图10-1: customer_dim 和 sales_order_fact表中的新字段
列表10-1:向客户维度添加新字段
/*****************************************************************/
/* */
/* shipping_address.sql */
/* */
/*****************************************************************/
USE dw;
ALTER TABLE customer_dim
ADD shipping_address CHAR (50) AFTER customer_state
, ADD shipping_zip_code INT (5) AFTER shipping_address
, ADD shipping_city CHAR (30) AFTER shipping_zip_code
, ADD shipping_state CHAR (2) AFTER shipping_city
;
ALTER TABLE customer_stg
ADD shipping_address CHAR (50) AFTER customer_state
, ADD shipping_zip_code INT (5) AFTER shipping_address
, ADD shipping_city CHAR (30) AFTER shipping_zip_code
, ADD shipping_state CHAR (2) AFTER shipping_city
;
/* end of script */
你可以用如下命令运行上述脚本:
mysql> \. c:\mysql\scripts\shipping_address.sql
你的msql控制台上将打印出如下信息:
Database changed
Query OK, 9 rows affected (0.73 sec)
Records: 9 Duplicates: 0 Warnings: 0
Query OK, 8 rows affected (0.49 sec)
Records: 8 Duplicates: 0 Warnings: 0
为了确保新的字段已经加入,查询customer_dim表:
mysql> select * from customer_dim \G
下面是查询结果:
*************************** 1. row ***************************
customer_sk: 1
customer_number: 1
customer_name: Really Large Customers
customer_street_address:
customer_zip_code: 17050
customer_city: Mechanicsburg
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2005–03–01
expiry_date: 9999–12–31
*************************** 2. row ***************************
customer_sk: 2
customer_number: 2
customer_name: Small Stores
customer_street_address:
customer_zip_code: 17055
customer_city:
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2005–03–01
expiry_date: 9999–12–31
*************************** 3. row ***************************
customer_sk: 3
customer_number: 3
customer_name: Medium Retailers
customer_street_address:
customer_zip_code: 17055
customer_city:
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2005–03–01
expiry_date: 9999–12–31
*************************** 4. row ***************************
customer_sk: 4
customer_number: 4
customer_name: Good Companies
customer_street_address:
customer_zip_code: 17050
customer_city: Mechanicsburg
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective date: 2005–03–01
expiry_date: 9999–12–31
*************************** 5. row ***************************
customer_sk: 5
customer_number: 5
customer_name: Wonderful Shops
customer_street_address:
customer_zip_code: 17050
customer_city: Mechanicsburg
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2005–03–01
expiry_date: 9999–12–31
*************************** 6. row ***************************
customer_sk: 6
customer_number: 6
customer_name: Extremely Loyal Clients
customer_street_address:
customer_zip_code: 17055
customer_city:
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective date: 2005–03–01
expiry_date: 2007–02–28
*************************** 7. row ***************************
customer_sk: 7
customer_number: 7
customer_name: Distinguished Agencies
customer_street_address:
customer_zip_code: 17050
customer_city: Mechanicsburg
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective date: 2005–03–01
expiry_date: 9999–12–31
*************************** 8. row ***************************
customer_sk: 8
customer_number: 6
customer_name: Extremely Loyal Clients
customer_street_address:
customer_zip_code: 17055
customer_city:
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2007–03–01
expiry_date: 9999–12–31
*************************** 9. row ***************************
customer_sk: 9
customer_number: 8
customer_name: Subsidiaries
customer_street_address:
customer_zip_code: 17055
customer_city:
customer_state: PA
shipping_address: NULL
shipping_zip_code: NULL
shipping_city: NULL
shipping_state: NULL
effective_date: 2007–03–01
expiry_date: 9999–12–31
9 rows in set (0.00 sec)
注:新的字段还没有装载数据,所有它们的值都是空(NULL)
列表10-2中的order_quantity.sql脚本向sales_prder_fact表增加order_quantity字段。
列表10-2:添加order_quantity字段
/*****************************************************************/
/* */
/* order_quantity.sql */
/* */
/*****************************************************************/
USE dw;
ALTER TABLE sales_order_fact
ADD order_quantity INT AFTER order_amount
;
/* end of script */
你可以用如下的命令运行上述脚本:
mysql> \. c:\mysql\scripts\order_quantity.sql
回车后你在控制台上可以看到:
Database changed
Query OK, 35 rows affected (0.56 sec)
Records: 35 Duplicates: 0 Warnings: 0
为了确保新的字段被加入,查询sales_order_fact表
mysql> select order_sk osk, customer_sk csk, product_sk psk,
order_date_sk odsk,
-> order_amount amt, order_quantity qty
-> from sales_order_fact;
结果如下:
+-----+-----+-----+------+---------+------+
| osk | csk | psk | odsk | amt | qty |
+-----+-----+-----+------+---------+------+
| 1 | 3 | 3 | 1 | 4000.00 | NULL |
| 2 | 4 | 1 | 46 | 4000.00 | NULL |
| 3 | 5 | 2 | 81 | 6000.00 | NULL |
| 4 | 6 | 3 | 152 | 6000.00 | NULL |
| 5 | 7 | 1 | 185 | 8000.00 | NULL |
| 6 | 1 | 2 | 255 | 8000.00 | NULL |
| 7 | 2 | 3 | 311 | 1000.00 | NULL |
| 8 | 3 | 1 | 347 | 1000.00 | NULL |
| 9 | 4 | 2 | 380 | 2000.00 | NULL |
| 10 | 5 | 3 | 416 | 2500.00 | NULL |
| 11 | 6 | 1 | 456 | 3000.00 | NULL |
| 12 | 7 | 2 | 458 | 3500.00 | NULL |
| 13 | 1 | 3 | 502 | 4000.00 | NULL |
| 14 | 2 | 1 | 548 | 4500.00 | NULL |
| 15 | 3 | 2 | 554 | 1000.00 | NULL |
| 16 | 4 | 3 | 584 | 1000.00 | NULL |
| 17 | 5 | 1 | 681 | 4000.00 | NULL |
| 18 | 6 | 2 | 722 | 4000.00 | NULL |
| 19 | 7 | 3 | 730 | 4000.00 | NULL |
| 20 | 1 | 1 | 731 | 1000.00 | NULL |
| 21 | 2 | 2 | 731 | 2000.00 | NULL |
| 22 | 3 | 4 | 731 | 3000.00 | NULL |
| 23 | 4 | 5 | 731 | 4000.00 | NULL |
| 24 | 5 | 2 | 731 | 1000.00 | NULL |
| 25 | 8 | 2 | 731 | 3000.00 | NULL |
| 26 | 7 | 4 | 731 | 5000.00 | NULL |
| 27 | 9 | 5 | 731 | 7000.00 | NULL |
| 28 | 1 | 1 | 731 | 1000.00 | NULL |
| 29 | 2 | 2 | 731 | 2000.00 | NULL |
| 30 | 3 | 4 | 731 | 4000.00 | NULL |
| 31 | 4 | 5 | 731 | 6000.00 | NULL |
| 32 | 5 | 1 | 731 | 2500.00 | NULL |
| 33 | 8 | 2 | 731 | 5000.00 | NULL |
| 34 | 7 | 4 | 731 | 7500.00 | NULL |
| 35 | 9 | 5 | 731 | 1000.00 | NULL |
+-----+-----+-----+------+---------+------+
35 rows in set (0.00 sec)
修正定期装载脚本
当你修改了数据库模式后,你需要修正你一直用于定期装载的脚本。列表10-3表示修正后的定期装载脚本。
让我们假设源数据仅仅提供活跃客户的海运地址;客户表中非活跃客户的海运地址将是空的。因此你需要更新数据仓库中现有的用户信息,当他们的海运地址在源数据中有效的时候。如果用户还是希望你维护海运地址的历史信息,你将对海运地址字段应用SCD2。
我们还假设订单数量在销售订单源数据中是有效的,而且客户允许那些已经在数据仓库中的销售订单记录将不用更新。
列表10-3:修正dw的每日定期装载
/*****************************************************************/
/* */
/* dw_regular_10.sql */
/* */
/*****************************************************************/
USE dw;
/* CUSTOMER_DIM POPULATION */
TRUNCATE customer_stg;
LOAD DATA INFILE 'customer.csv'
INTO TABLE customer_stg
FIELDS TERMINATED BY ' , '
OPTIONALLY ENCLOSED BY ' " '
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
( customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state )
;
/* SCD 2 ON ADDRESSES */
UPDATE
customer_dim a
, customer_stg b
SET
a.expiry_date = SUBDATE (CURRENT_DATE, 1)
WHERE
a.customer_number = b.customer_number
AND ( a.customer_street_address <> b.customer_street_address
OR a.customer_city <> b.customer_city
OR a.customer_zip_code <> b.customer_zip_code
OR a.customer_state <> b.customer_state
OR a.shipping_address <> b.shipping_address
OR a.shipping_city <> b.shipping_city
OR a.shipping_zip_code <> b.shipping_zip_code
OR a.shipping_state <> b.shipping_state
OR a.shipping_address IS NULL
OR a.shipping_city IS NULL
OR a.shipping_zip_code IS NULL
OR a.shipping_state IS NULL)
AND expiry_date = '
;
INSERT INTO customer_dim
SELECT
NULL
, b.customer_number
, b.customer_name
, b.customer_street_address
, b.customer_zip_code
, b.customer_city
, b.customer_state
, b.shipping_address
, b.shipping_zip_code
, b.shipping_city
, b.shipping_state
, CURRENT_DATE
, '
FROM
customer_dim a
, customer_stg b
WHERE
a.customer_number = b.customer_number
AND ( a.customer_street_address <> b.customer_street_address
OR a.customer_city <> b.customer_city
OR a.customer_zip_code <> b.customer_zip_code
OR a.customer_state <> b.customer_state
OR a.shipping_address <> b.shipping_address
OR a.shipping_city <> b.shipping_city
OR a.shipping_zip_code <> b.shipping_zip_code
OR a.shipping_state <> b.shipping_state
OR a.shipping_address IS NULL
OR a.shipping_city IS NULL
OR a.shipping_zip_code IS NULL
OR a.shipping_state IS NULL)
AND EXISTS (
SELECT *
FROM customer_dim x
WHERE b.customer_number = x.customer_number
AND a.expiry_date = SUBDATE (CURRENT_DATE, 1))
AND NOT EXISTS (
SELECT *
FROM customer_dim y
WHERE b.customer_number = y.customer_number
AND y.expiry_date = '9999–12–31')
;
/* END OF SCD 2 */
/* SCD 1 ON NAME */
UPDATE customer_dim a, customer_stg b
SET a.customer_name = b.customer_name
WHERE a.customer_number = b.customer_number
AND a.expiry_date = '9999–12–31'
AND a.customer_name <> b.customer_name
;
/* ADD NEW CUSTOMER */
INSERT INTO customer dim
SELECT
NULL
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, CURRENT_DATE
, '9999–12–31'
FROM customer_stg
WHERE customer_number NOT IN(
SELECT a.customer_number
FROM
customer_dim a
, customer_stg b
WHERE b.customer_number = a.customer_number )
;
/* END OF CUSTOMER_DIM POPULATION */
/* product dimension loading */
TRUNCATE product_stg
;
LOAD DATA INFILE 'product.txt'
INTO TABLE product_stg
FIELDS TERMINATED BY ' '
OPTIONALLY ENCLOSED BY ' '
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
( product_code
, product_name
, product_category )
;
/* PRODUCT_DIM POPULATION */
/* SCD2 ON PRODUCT NAME AND GROUP */
UPDATE
product_dim a
, product_stg b
SET
expiry_date = SUBDATE (CURRENT_DATE, 1)
WHERE
a.product_code = b.product_code
AND ( a.product_name <> b.product_name
OR a.product_category <> b.product_category
AND expiry_date = '9999–12–31'
;
INSERT INTO product_dim
SELECT
NULL
, b.product_code
, b.product_name
, b.product_category
, CURRENT_DATE
, '9999–12–31'
FROM
product_dim a
, product_stg b
WHERE
a.product_code = b.product_code
AND ( a.product_name <> b.product_name
OR a.product_category <> b.product_category )
AND EXISTS (
SELECT *
FROM product_dim x
WHERE b.product_code = x.product_code
AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)
AND NOT EXISTS (
SELECT *
FROM product_dim y
WHERE b.product_code = y.product_code
AND y.expiry_date = '9999–12–31')
;
/* END OF SCD 2 */
/* ADD NEW PRODUCT */
INSERT INTO product_dim
SELECT
NULL
, product_code
, product_name
, product_category
, CURRENT_DATE
, '9999–12–31'
FROM product_stg
WHERE product_code NOT IN(
SELECT y.product_code
FROM product_dim x, product_stg y
WHERE x.product_code = y.product_code )
;
/* END OF PRODUCT_DIM POPULATION */
/* ORDER_DIM POPULATION */
INSERT INTO order_dim (
order_sk
, order_number
, effective_date
, expiry_date
)
SELECT
NULL
, order_number
, order_date
, '9999–12–31'
FROM source.sales_order
WHERE entry_date = CURRENT_DATE
;
/* END OF ORDER_DIM POPULATION */
/* SALES_ORDER_FACT POPULATION */
INSERT INTO sales_order_fact
SELECT
order_sk
, customer_sk
, product_sk
, date_sk
, order_amount
, order_quantity
FROM
source.sales order a
, order_dim b
, customer_dim c
, product_dim d
, date_dim e
WHERE
a.order_number = b.order_number
AND a.customer_number = c.customer_number
AND a.order_date >= c.effective_date
AND a.order_date <= c.expiry_date
AND a.product_code = d.product_code
AND a.order_date >= d.effective_date
AND a.order_date <= d.expiry_date
AND a.order_date = e.date
AND a.entry_date = CURRENT_DATE
;
/* end of script */