Chinaunix首页 | 论坛 | 博客
  • 博客访问: 226159
  • 博文数量: 31
  • 博客积分: 1427
  • 博客等级: 上尉
  • 技术积分: 409
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-04 15:21
文章分类

全部博文(31)

文章存档

2008年(31)

我的朋友

分类: Mysql/postgreSQL

2008-05-08 17:19:33

第三篇:成长

章节列表:

10章:增加字段

11章:按需载入

12章:维表子集

13章:维度角色扮演

14章:快照

15章:唯度层次

16章:多路径和不齐整的维度层次

17章:维度退化

18章:垃圾维

19章:多星型模式

 

本篇概述

       如果的数据仓库有个成功的开始,你的客户一定想要更多。他们可能要求你载入更多的数据到数据仓库中。到时候,你将为数据仓库的成长做好准备。

       3篇将讨论如何通过以下的方式可以使你扩充你的数据仓库:

n        增加字段和表

n        分离维度

n        根据特殊的目的用不同的方式应用维度

n        增加一个衍生的星型或者一个完整的新的星型

 

10: 增加字段

       本章教你一个刚开始的数据仓库最常遇到的扩充情况:向一个现有的维和事实表增加字段。本章将以讨论模式在你需要增加一个或者两个字段的时候会发生的情况为开始。进而向你示范如何能往客户维和销售订单事实表增加一个新的字段。并且向这个新的字段实施SCD2

扩展模式

       本节我将向你展示如何修正数据仓库的模式。图10.1显示扩展后的模式,在customer_dim表和sales_order_fact表中有新增的字段。在customer_dim表中新增的字段是shipping_address, shipping_zip_code, shipping_city,shipping_statesales_order_fact表中有一个新增的字段:order_quantity.

       列表10-1中的shipping_address.sqlcustomer_dimcustomer_stg表中增加新的字段。在你运行这个脚本前,将你的mysql时间设置为:2007-03-01

10-1: customer_dim sales_order_fact表中的新字段

列表10-1:向客户维度添加新字段

/*****************************************************************/

/*                                                               */

/* shipping_address.sql                                          */

/*                                                               */

/*****************************************************************/

 

USE dw;

 

ALTER TABLE customer_dim

  ADD shipping_address CHAR (50) AFTER customer_state

, ADD shipping_zip_code INT (5) AFTER shipping_address

, ADD shipping_city CHAR (30) AFTER shipping_zip_code

, ADD shipping_state CHAR (2) AFTER shipping_city

;

 

ALTER TABLE customer_stg

  ADD shipping_address CHAR (50) AFTER customer_state

, ADD shipping_zip_code INT (5) AFTER shipping_address

, ADD shipping_city CHAR (30) AFTER shipping_zip_code

, ADD shipping_state CHAR (2) AFTER shipping_city

;

/* end of script                                                 */

你可以用如下命令运行上述脚本:

mysql> \. c:\mysql\scripts\shipping_address.sql

你的msql控制台上将打印出如下信息:

Database changed

Query OK, 9 rows affected (0.73 sec)

Records: 9  Duplicates: 0  Warnings: 0

 

Query OK, 8 rows affected (0.49 sec)

Records: 8  Duplicates: 0  Warnings: 0

为了确保新的字段已经加入,查询customer_dim表:

mysql> select * from customer_dim \G

下面是查询结果:

*************************** 1. row ***************************

            customer_sk: 1

        customer_number: 1

          customer_name: Really Large Customers

customer_street_address: 7500 Louise Dr.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 2. row ***************************

            customer_sk: 2

        customer_number: 2

          customer_name: Small Stores

customer_street_address: 2500 Woodland St.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 3. row ***************************

            customer_sk: 3

        customer_number: 3

          customer_name: Medium Retailers

customer_street_address: 1111 Ritter Rd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 4. row ***************************

            customer_sk: 4

        customer_number: 4

          customer_name: Good Companies

customer_street_address: 9500 Scott St.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 5. row ***************************

            customer_sk: 5

        customer_number: 5

          customer_name: Wonderful Shops

customer_street_address: 3333 Rossmoyne Rd.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 6. row ***************************

            customer_sk: 6

        customer_number: 6

          customer_name: Extremely Loyal Clients

customer_street_address: 7070 Ritter Rd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective date: 2005–03–01

            expiry_date: 2007–02–28

*************************** 7. row ***************************

            customer_sk: 7

        customer_number: 7

          customer_name: Distinguished Agencies

customer_street_address: 9999 Scott St.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective date: 2005–03–01

            expiry_date: 9999–12–31

*************************** 8. row ***************************

            customer_sk: 8

        customer_number: 6

          customer_name: Extremely Loyal Clients

customer_street_address: 7777 Ritter Rd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2007–03–01

            expiry_date: 9999–12–31

*************************** 9. row ***************************

            customer_sk: 9

        customer_number: 8

          customer_name: Subsidiaries

customer_street_address: 10000 Wetline Blvd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

       shipping_address: NULL

      shipping_zip_code: NULL

          shipping_city: NULL

         shipping_state: NULL

         effective_date: 2007–03–01

            expiry_date: 9999–12–31

9 rows in set (0.00 sec)

       :新的字段还没有装载数据,所有它们的值都是空(NULL

列表10-2中的order_quantity.sql脚本向sales_prder_fact表增加order_quantity字段。

列表10-2:添加order_quantity字段

/*****************************************************************/

/*                                                               */

/* order_quantity.sql                                            */

/*                                                               */

/*****************************************************************/

 

USE dw;

 

ALTER TABLE sales_order_fact

ADD order_quantity INT AFTER order_amount

 

;

 

/* end of script                                                 */

你可以用如下的命令运行上述脚本:

mysql> \. c:\mysql\scripts\order_quantity.sql

回车后你在控制台上可以看到:

Database changed

Query OK, 35 rows affected (0.56 sec)

Records: 35  Duplicates: 0  Warnings: 0

为了确保新的字段被加入,查询sales_order_fact

mysql> select order_sk osk, customer_sk csk, product_sk psk,

       order_date_sk odsk,

    -> order_amount amt, order_quantity qty

    -> from sales_order_fact;

结果如下:

+-----+-----+-----+------+---------+------+

| osk | csk | psk | odsk | amt     | qty  |

+-----+-----+-----+------+---------+------+

|   1 |   3 |   3 |    1 | 4000.00 | NULL |

|   2 |   4 |   1 |   46 | 4000.00 | NULL |

|   3 |   5 |   2 |   81 | 6000.00 | NULL |

|   4 |   6 |   3 |  152 | 6000.00 | NULL |

|   5 |   7 |   1 |  185 | 8000.00 | NULL |

|   6 |   1 |   2 |  255 | 8000.00 | NULL |

|   7 |   2 |   3 |  311 | 1000.00 | NULL |

|   8 |   3 |   1 |  347 | 1000.00 | NULL |

|   9 |   4 |   2 |  380 | 2000.00 | NULL |

|  10 |   5 |   3 |  416 | 2500.00 | NULL |

|  11 |   6 |   1 |  456 | 3000.00 | NULL |

|  12 |   7 |   2 |  458 | 3500.00 | NULL |

|  13 |   1 |   3 |  502 | 4000.00 | NULL |

|  14 |   2 |   1 |  548 | 4500.00 | NULL |

|  15 |   3 |   2 |  554 | 1000.00 | NULL |

|  16 |   4 |   3 |  584 | 1000.00 | NULL |

|  17 |   5 |   1 |  681 | 4000.00 | NULL |

|  18 |   6 |   2 |  722 | 4000.00 | NULL |

|  19 |   7 |   3 |  730 | 4000.00 | NULL |

|  20 |   1 |   1 |  731 | 1000.00 | NULL |

|  21 |   2 |   2 |  731 | 2000.00 | NULL |

|  22 |   3 |   4 |  731 | 3000.00 | NULL |

|  23 |   4 |   5 |  731 | 4000.00 | NULL |

|  24 |   5 |   2 |  731 | 1000.00 | NULL |

|  25 |   8 |   2 |  731 | 3000.00 | NULL |

|  26 |   7 |   4 |  731 | 5000.00 | NULL |

|  27 |   9 |   5 |  731 | 7000.00 | NULL |

|  28 |   1 |   1 |  731 | 1000.00 | NULL |

|  29 |   2 |   2 |  731 | 2000.00 | NULL |

|  30 |   3 |   4 |  731 | 4000.00 | NULL |

|  31 |   4 |   5 |  731 | 6000.00 | NULL |

|  32 |   5 |   1 |  731 | 2500.00 | NULL |

|  33 |   8 |   2 |  731 | 5000.00 | NULL |

|  34 |   7 |   4 |  731 | 7500.00 | NULL |

|  35 |   9 |   5 |  731 | 1000.00 | NULL |

+-----+-----+-----+------+---------+------+

35 rows in set (0.00 sec)

 

修正定期装载脚本

       当你修改了数据库模式后,你需要修正你一直用于定期装载的脚本。列表10-3表示修正后的定期装载脚本。

       让我们假设源数据仅仅提供活跃客户的海运地址;客户表中非活跃客户的海运地址将是空的。因此你需要更新数据仓库中现有的用户信息,当他们的海运地址在源数据中有效的时候。如果用户还是希望你维护海运地址的历史信息,你将对海运地址字段应用SCD2

       我们还假设订单数量在销售订单源数据中是有效的,而且客户允许那些已经在数据仓库中的销售订单记录将不用更新。

列表10-3:修正dw的每日定期装载

/*****************************************************************/

/*                                                              */

/* dw_regular_10.sql                                            */

/*                                                              */

/*****************************************************************/

 

USE dw;

 

/* CUSTOMER_DIM POPULATION                                      */

 

TRUNCATE customer_stg;

 

LOAD DATA INFILE 'customer.csv'

INTO TABLE customer_stg

FIELDS TERMINATED BY ' , '

OPTIONALLY ENCLOSED BY ' " '

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state )

;

 

/* SCD 2 ON ADDRESSES                                            */

 

UPDATE

  customer_dim a

, customer_stg b

SET

  a.expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

  a.customer_number = b.customer_number

AND (   a.customer_street_address <> b.customer_street_address

     OR a.customer_city <> b.customer_city

     OR a.customer_zip_code <> b.customer_zip_code

     OR a.customer_state <> b.customer_state

     OR a.shipping_address <> b.shipping_address

     OR a.shipping_city <> b.shipping_city

     OR a.shipping_zip_code <> b.shipping_zip_code

     OR a.shipping_state <> b.shipping_state

     OR a.shipping_address IS NULL

     OR a.shipping_city IS NULL

     OR a.shipping_zip_code IS NULL

     OR a.shipping_state IS NULL)

AND expiry_date = '9999-12-31'

;

 

INSERT INTO customer_dim

SELECT

  NULL

, b.customer_number

, b.customer_name

, b.customer_street_address

, b.customer_zip_code

, b.customer_city

, b.customer_state

, b.shipping_address

, b.shipping_zip_code

, b.shipping_city

, b.shipping_state

, CURRENT_DATE

, '9999-12-31'

FROM

  customer_dim a

, customer_stg b

WHERE

    a.customer_number = b.customer_number

AND (   a.customer_street_address <> b.customer_street_address

     OR a.customer_city <> b.customer_city

     OR a.customer_zip_code <> b.customer_zip_code

     OR a.customer_state <> b.customer_state

     OR a.shipping_address <> b.shipping_address

     OR a.shipping_city <> b.shipping_city

     OR a.shipping_zip_code <> b.shipping_zip_code

     OR a.shipping_state <> b.shipping_state

     OR a.shipping_address IS NULL

     OR a.shipping_city IS NULL

     OR a.shipping_zip_code IS NULL

     OR a.shipping_state IS NULL)

AND EXISTS (

SELECT *

FROM customer_dim x

WHERE b.customer_number = x.customer_number

AND a.expiry_date = SUBDATE (CURRENT_DATE, 1))

AND NOT EXISTS (

SELECT *

FROM customer_dim y

WHERE     b.customer_number = y.customer_number

      AND y.expiry_date = '9999–12–31')

;

 

/* END OF SCD 2                                                  */

 

/* SCD 1 ON NAME                                                 */

 

UPDATE customer_dim a, customer_stg b

SET a.customer_name = b.customer_name

WHERE     a.customer_number = b.customer_number

      AND a.expiry_date = '9999–12–31'

      AND a.customer_name <> b.customer_name

;

 

/* ADD NEW CUSTOMER                                              */

 

INSERT INTO customer dim

SELECT

  NULL

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, CURRENT_DATE

, '9999–12–31'

FROM customer_stg

WHERE customer_number NOT IN(

SELECT a.customer_number

FROM

  customer_dim a

, customer_stg b

WHERE b.customer_number = a.customer_number )

;

 

/* END OF CUSTOMER_DIM POPULATION                                */

 

/* product dimension loading                                     */

 

TRUNCATE product_stg

;

 

LOAD DATA INFILE 'product.txt'

INTO TABLE product_stg

FIELDS TERMINATED BY ' '

OPTIONALLY ENCLOSED BY ' '

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( product_code

, product_name

, product_category )

;

 

/* PRODUCT_DIM POPULATION                                        */

 

/* SCD2 ON PRODUCT NAME AND GROUP                                */

UPDATE

  product_dim a

, product_stg b

SET

  expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

    a.product_code = b.product_code

AND ( a.product_name <> b.product_name

   OR a.product_category <> b.product_category

   AND expiry_date = '9999–12–31'

;

 

INSERT INTO product_dim

SELECT

  NULL

, b.product_code

, b.product_name

, b.product_category

, CURRENT_DATE

, '9999–12–31'

FROM

  product_dim a

, product_stg b

WHERE

    a.product_code = b.product_code

AND ( a.product_name <> b.product_name

   OR a.product_category <> b.product_category )

AND EXISTS (

SELECT *

FROM product_dim x

WHERE     b.product_code = x.product_code

             AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)

AND NOT EXISTS (

SELECT *

FROM product_dim y

WHERE    b.product_code = y.product_code

             AND y.expiry_date = '9999–12–31')

;

 

/* END OF SCD 2                                                  */

 

/* ADD NEW PRODUCT                                               */

 

INSERT INTO product_dim

SELECT

  NULL

, product_code

, product_name

, product_category

, CURRENT_DATE

, '9999–12–31'

FROM product_stg

WHERE product_code NOT IN(

SELECT y.product_code

FROM product_dim x, product_stg y

WHERE x.product_code = y.product_code )

;

 

/* END OF PRODUCT_DIM POPULATION                                 */

 

/* ORDER_DIM POPULATION                                          */

 

INSERT INTO order_dim (

  order_sk

, order_number

, effective_date

, expiry_date

)

SELECT

  NULL

, order_number

, order_date

, '9999–12–31'

FROM source.sales_order

WHERE entry_date = CURRENT_DATE

;

 

/* END OF ORDER_DIM POPULATION                                  */

 

/* SALES_ORDER_FACT POPULATION                                  */

 

INSERT INTO sales_order_fact

SELECT

  order_sk

, customer_sk

, product_sk

, date_sk

, order_amount

, order_quantity

FROM

  source.sales order a

, order_dim b

, customer_dim c

, product_dim d

, date_dim e

WHERE

    a.order_number = b.order_number

AND a.customer_number = c.customer_number

AND a.order_date >= c.effective_date

AND a.order_date <= c.expiry_date

AND a.product_code = d.product_code

AND a.order_date >= d.effective_date

AND a.order_date <= d.expiry_date

AND a.order_date = e.date

AND a.entry_date = CURRENT_DATE

;

 

/* end of script                                                */

 

       下一节将对如何准备必需的数据以使得列表10-3的脚本可以成功运行进行说明。
阅读(1748) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~