2008年(31)
分类: Mysql/postgreSQL
2008-10-23 15:25:02
宾夕法尼亚州地区客户维
在本节我将用宾夕法尼亚州地区客户的子集维度来解释第二种维度子集的类型。我也将向你说明如何测试该子集维度。
相对的,一个向上钻取的维包含了它基础维的所有更高级别的数据。而一个特定子集维度则选择了它基础维的某个特定的数据集合。列表12-3所示的脚本产生并加载了宾夕法尼亚州(PA)地区客户子集维。
注意到,有两个事情是宾夕法尼亚州地区客户子集维区别于月份子集维的地方。
n pa_customer_dim表和customer_dim表的字段结构一样,而month_dim表没有 date_dim表中的日期字段。
n pa_customer_dim表的代理键是客户表的代理键。Month_dim表的代理键只属于month_dim表,并不是来自日期维表。
列表12-3 PA的客户:
/**********************************************************************/
/* */
/* pa_customer.sql */
/* */
/**********************************************************************/
USE dw;
CREATE TABLE pa_customer_dim
( customer_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, customer_number INT
, customer_name CHAR (50)
, customer_street_address CHAR (50)
, customer_zip_code INT (5)
, customer_city CHAR (30)
, customer_state CHAR (2)
, shipping_address CHAR (50)
, shipping_zip_code INT (5)
, shipping_city CHAR (30)
, shipping_state CHAR (2)
, effective_date DATE
, expiry_date DATE )
;
INSERT INTO pa_customer_dim
SELECT
customer_sk
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date
FROM customer_dim
WHERE customer_state = 'PA'
;
/* end of script */
为了测试PA子维脚本,你需要先用列表12-4的脚本来增加三个居住于俄亥俄州的客户。
列表12-4:非PA 客户:
/**********************************************************************/
/* */
/* non_pa_customer.sql */
/* */
/**********************************************************************/
/* default to dw */
USE dw;
INSERT INTO customer_dim
( customer_sk
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date )
VALUES
(NULL, 10, 'Bigger Customers',
'
'OH', CURRENT_DATE, '
, (NULL, 11, 'Smaller Stores', '8888
'Cleveland', 'OH', '8888
'Cleveland', 'OH', CURRENT_DATE, '
, (NULL, 12, 'Small-Medium Retailers',
'
'OH', CURRENT_DATE, '
;
/* end of script */
用以下命令运行列表12-4中的脚本。
mysql> \. c:\mysql\scripts\non_pa_customer.sql
在你的控制台上将得到下面的响应信息:
Database changed
Query OK, 3 rows affected (0.86 sec)
Records: 3 Duplicates: 0 Warnings: 0
现在你已经准备好运行列表12-3中的pa_customer.sql脚本,在你做这些之前,确定你的Msql数据库的日期仍然是
你可以用如下命令方式运行pa_customer.sql脚本。
mysql> \. c:\mysql\scripts\pa_customer.sql
你将在控制台上看到:
Database changed
Query OK, 0 rows affected (0.20 sec)
Query OK, 18 rows affected (0.08 sec)
Records: 18 Duplicates: 0 Warnings: 0
为了确保三个OH客户已经成功的载入,查询customer_dim 表:
mysql> select customer_name, customer_state, effective_date from customer_dim;
控制台上将看到:
+---------------------------------+----------------+---------------+
| customer_name | customer_state | effective_date|
+---------------------------------+----------------+---------------+
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Extremely Loyal Clients | PA |
| Subsidiaries | PA |
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Subsidiaries | PA |
| Online Distributors | PA |
| Bigger Customers | OH |
| Smaller Stores | OH |
| Small-Medium Retailers | OH |
+---------------------------------+----------------+---------------+
21 rows in set (0.00 sec)
现在查询pa_customer_dim表来确定只有PA的客户在PA客户维表中。
mysql> select customer_name, customer_state, effective_date from pa_customer_dim;
结果如下:
+---------------------------------+----------------+---------------+
| customer_name | customer_state | effective_date|
+---------------------------------+----------------+---------------+
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Extremely Loyal Clients | PA |
| Subsidiaries | PA |
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Subsidiaries | PA |
| Online Distributors | PA |
+---------------------------------+----------------+---------------+
18 rows in set (0.00 sec)
正如你所看到的,只有PA客户进入到该表中,之前加入的OH客户并没有在其中。
修改定期装载
当一个新的PA客户资料加入到客户维中时,为了能够同时加入PA客户维表,你需要把PA客户子集维的装载合并到数据仓库的定期装载过程中。 修改后的定期装载脚本将在列表12-5中列出。这个改变(新增)用粗体显示。注意到,当每次你运行每日的定期装载脚本时,该脚本重建(截断,然后加入所有的PA客户)了PA客户子集维。
列表12-5:修正后的每日DW定期装载
/******************************************************************/
/* */
/* dw_regular_12.sql */
/* */
/******************************************************************/
USE dw;
/* CUSTOMER_DIM POPULATION */
TRUNCATE customer_stg;
LOAD DATA INFILE 'customer.csv'
INTO TABLE customer_stg
FIELDS TERMINATED BY ' , '
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
( customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state )
;
/* SCD 2 ON ADDRESSES */
UPDATE
customer_dim a
, customer_stg b
SET
a.expiry_date = SUBDATE (CURRENT_DATE, 1)
WHERE
a.customer_number = b.customer_number
AND ( a.customer_street_address <> b.customer_street_address
OR a.customer_city <> b.customer_city
OR a.customer_zip_code <> b.customer_zip_code
OR a.customer_state <> b.customer_state
OR a.shipping_address <> b.shipping_address
OR a.shipping_city <> b.shipping_city
OR a.shipping_zip_code <> b.shipping_zip_code
OR a.shipping_state <> b.shipping_state
OR a.shipping_address IS NULL
OR a.shipping_city IS NULL
OR a.shipping_zip_code IS NULL
OR a.shipping_state IS NULL)
AND expiry_date = '
;
INSERT INTO customer_dim
SELECT
NULL
, b.customer_number
, b.customer_name
, b.customer_street_address
, b.customer_zip_code
, b.customer_city
, b.customer_state
, b.shipping_address
, b.shipping_zip_code
, b.shipping_city
, b.shipping_state
, CURRENT_DATE
, '
FROM
customer_dim a
, customer_stg b
WHERE
a.customer_number = b.customer_number
AND ( a.customer_street_address <> b.customer_street_address
OR a.customer_city <> b.customer_city
OR a.customer_zip_code <> b.customer_zip_code
OR a.customer_state <> b.customer_state
OR a.shipping_address <> b.shipping_address
OR a.shipping_city <> b.shipping_city
OR a.shipping_zip_code <> b.shipping_zip_code
OR a.shipping_state <> b.shipping_state
OR a.shipping_address IS NULL
OR a.shipping_city IS NULL
OR a.shipping_zip_code IS NULL
OR a.shipping_state IS NULL)
AND EXISTS (
SELECT *
FROM customer_dim x
WHERE b.customer_number = x.customer_number
AND a.expiry_date - SUBDATE (CURRENT_DATE, 1))
AND NOT EXISTS (
SELECT *
FROM customer_dim y
WHERE b.customer_number = y.customer_number
AND y.expiry_date - '
;
/* END OF SCD 2 */
/* SCD 1 ON NAME */
UPDATE customer_dim a, customer_stg b
SET a.customer_name = b.customer_name
WHERE a.customer_number = b.customer_number
AND a.expiry_date - '
AND a.customer_name <> b.customer_name
;
/* ADD NEW CUSTOMER */
INSERT INTO customer_dim
SELECT
NULL
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, CURRENT_DATE
, '
FROM customer_stg
WHERE customer_number NOT IN(
SELECT a.customer_number
FROM
customer_dim a
, customer_stg b
WHERE b.customer_number = a.customer_number )
/* RE-BUILD PA CUSTOMER DIMENSION */
TRUNCATE pa_customer_dim;
INSERT INTO pa_customer_dim
SELECT
customer_sk
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date
FROM customer_dim
WHERE customer_state = 'PA'
;
/* END OF CUSTOMER_DIM POPULATION */
/* product dimension loading */
TRUNCATE product_stg
;
LOAD DATA INFILE 'product.txt'
INTO TABLE product_stg
FIELDS TERMINATED BY ''
OPTIONALLY ENCLOSED BY ''
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
( product_code
, product_name
, product_category )
;
/* PRODUCT_DIM POPULATION */
/* SCD2 ON PRODUCT NAME AND GROUP */
UPDATE
product_dim a
, product_stg b
SET
expiry_date = SUBDATE (CURRENT_DATE, 1)
WHERE
a.product_code = b.product_code
AND ( a.product_name <> b.product_name
OR a.product_category <> b.product_category )
AND expiry_date = '
;
INSERT INTO product_dim
SELECT
NULL
, b.product_code
, b.product_name
, b.product_category
, CURRENT_DATE
, '
FROM
product_dim a
, product_stg b
WHERE
a.product_code = b.product_code
AND ( a.product_name <> b.product_name
OR a.product_category <> b.product_category )
AND EXISTS (
SELECT *
FROM product_dim x
WHERE b.product_code = x.product_code
AND a.expiry_date = SUBDATE (CURRENT_DATE, 1) )
AND NOT EXISTS (
SELECT *
FROM product_dim y
WHERE b.product_code = y.product_code
AND y.expiry_date = '
;
/* END OF SCD 2 */
/* ADD NEW PRODUCT */
INSERT INTO product_dim
SELECT
NULL
, product_code
, product_name
, product_category
, CURRENT_DATE
, '
FROM product_stg
WHERE product_code NOT IN(
SELECT y.product_code
FROM product_dim x, product_stg y
WHERE x.product_code = y.product_code
;
/* END OF PRODUCT_DIM POPULATION */
/* ORDER_DIM POPULATION */
INSERT INTO order_dim (
order_sk
, order_number
, effective_date
, expiry_date
)
SELECT
NULL
, order_number
, order_date
, '
FROM source.sales_order
WHERE entry_date = CURRENT_DATE
;
/* END OF ORDER_DIM POPULATION */
/* SALES_ORDER_FACT POPULATION */
INSERT INTO sales_order_fact
SELECT
order_sk
, customer_sk
, product_sk
, date_sk
, order_amount
, order_quantity
FROM
source.sales_order a
, order_dim b
, customer_dim c
, product_dim d
, date_dim e
WHERE
a.order_number = b.order_number
AND a.customer_number = c.customer_number
AND a.order_date >= c.effective_date
AND a.order_date <= c.expiry_date
AND a.product_code = d.product_code
AND a.order_date >= d.effective_date
AND a.order_date <= d.expiry_date
AND a.order_date = e.date
AND a.entry_date = CURRENT_DATE
;
/* end of script */
测试修正后的定期装载
现在你可以测试列表12-5的脚本。在你实施之前,先增加一些客户数据,通过运行列表12-6的脚本增加一个PA客户和一个OH客户到客户维中。
列表12-6:增加两个客户
/******************************************************************/
/* */
/* two_more_customers.sql */
/* */
/******************************************************************/
/* default to dw */
USE dw;
INSERT INTO customer_dim
( customer_sk
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, shipping_address
, shipping_zip_code
, shipping_city
, shipping_state
, effective_date
, expiry_date )
VALUES
(NULL, 13, 'PA Customer', '1111 Louise Dr.', '17050',
'Mechanicsburg', 'PA', '1111 Louise Dr.', '17050',
'Mechanicsburg', 'PA', CURRENT_DATE, '
, (NULL, 14, 'OH Customer', '6666 Ridge Rd.', '44102',
'
'Cleveland', 'OH', CURRENT_DATE, '
;
/* end of script */
现在运行列表12-6中的脚本:
mysql> \. c:\mysql\scripts\two_more_customers.sql
Mysql将显示有两个记录生效。
Database changed
Query OK, 2 rows affected (0.06 sec)
Records: 2 Duplicates: 0 Warnings: 0
现在更改你的Msql数据库的日期为
mysql> \. c:\mysql\scripts\dw_regular_12.sql
你将看到你的控制台有如下显示:
Database changed
Query OK, 9 rows affected (0.15 sec)
Query OK, 9 rows affected (0.14 sec)
Records: 9 Deleted: 0 Skipped: 0 Warnings: 0
Query OK, 0 rows affected (0.05 sec)
Rows matched: 0 Changed: 0 Warnings: 0
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0 Changed: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 18 rows affected (0.04 sec)
Query OK, 19 rows affected (0.06 sec)
Records: 19 Duplicates: 0 Warnings: 0
Query OK, 4 rows affected (0.09 sec)
Query OK, 4 rows affected (0.07 sec)
Records: 4 Deleted: 0 Skipped: 0 Warnings: 0
Query OK, 0 rows affected (0.06 sec)
Rows matched: 0 Changed: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.00 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
Query OK, 0 rows affected (0.17 sec)
Records: 0 Duplicates: 0 Warnings: 0
现在用该语句查询pa_customer_dim表。你将看到新的PA客户被插入到表中。
mysql> select customer_name, customer_state, effective_date from pa_customer_dim;
这是结果:
+-------------------------+----------------+---------------+
| customer_name | customer_state | effective_date|
+-------------------------+----------------+---------------+
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Extremely Loyal Clients | PA |
| Subsidiaries | PA |
| Really Large Customers | PA |
| Small Stores | PA |
| Medium Retailers | PA |
| Good Companies | PA |
| Wonderful Shops | PA |
| Extremely Loyal Clients | PA |
| Distinguished Agencies | PA |
| Subsidiaries | PA |
| Online Distributors | PA |
| PA Customer | PA |
+-------------------------+----------------+---------------+
19 rows in set (0.00 sec)
小结
在这章,你学习了两种类型的子集维。月份子集维是向上钻取维的一个例子,一个装载自比它更为详细的基础维的具有更高级的维。PA客户维是一个特定的子集维;只从它的基础维选择PA客户数据进行装载。
下一章,你将学习另外一个复用现有维的技术,叫角色扮演维度。