Chinaunix首页 | 论坛 | 博客
  • 博客访问: 226210
  • 博文数量: 31
  • 博客积分: 1427
  • 博客等级: 上尉
  • 技术积分: 409
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-04 15:21
文章分类

全部博文(31)

文章存档

2008年(31)

我的朋友

分类: Mysql/postgreSQL

2008-03-22 14:33:48

第二章 维的历史记录

 

       存储在维表中得数值叫维成员。比如第1章的product_dim维表就存储关于产品的维成员。

       很多维成员会随着时间而变化。客户改变地址,产品更名和重新规类,销售订单修正,等等。当一个维成员发生改变时,比如产品建立一个新的类别,你将必须维护维成员的历史记录。例如:一旦发生产品的重新归类,你必须维护该产品维的历史记录,在这个例子中,你必须在product_dim维表中存储该产品的以前的类别和现在新类别信息。另外,以往的订单记录中,产品类别将保持对应的旧类别。

       缓慢变化维(SCD)是在维度数据仓库中记录维历史的技术。本章将教你如何用SCD技术维护维度的历史记录。你将学习不同的SCD脚本,并且用实验进行验证这些脚本是如何正确地维护维历史记录的。

 

缓慢变化维技术

       有三种不同的处理缓慢变化维的技术:第一类缓慢变化维(SCD1),第二类缓慢变化维(SCD2),第三类缓慢变化维(SCD3)SCD1更新维成员记录而不存储维成员的历史记录。SCD1用于直接校正错误的维成员。

       SCD2维护维历史记录,当一个维成员改变时,将产生一个新版本的维成员。SCD2不删除或者改变现有的维成员记录。

       SCD3只是保存一个版本的维成员记录。它使用多个字段来保存不同维成员值的方法来维护维的历史记录。比如记录客户地址,customer_dim维表就必须有customer_address字段和previous_customer_address字段来分别存储前后两次的地址信息。相比较SCD2维护全部历史记录而言,SCD3只能维护有限的历史记录。SCD3很少被应用。它只能用在那种对数据库空间有限制的场合,并且数据仓库的使用者可以接受有限的维成员历史记录。

       注意 本书只涉及SCD1 SCD2

 

第一类缓慢变化维技术(SCD1)

       如果你不需要维护维度变化历史,你可以用SCD1。当源数据改变,你更新相应维表中现有的数据记录。

       可以举例子:在第1章中,我就应用SCD1customer_dim表进行建表和载入数据。回顾一下,该表有表2-1所示的6条记录。

 

表格2-1 未发生改变的customer_dim table

customer _number

customer _name

customer _street _address

customer _zip _code

customer _city

customer _state

1

Big Customers

7500 Louise Dr.

17050

Mechanicsburg

PA

2

Small Stores

2500 Woodland St.

17055

Pittsburgh

PA

3

Medium Retailers

1111 Ritter Rd.

17055

Pittsburgh

PA

4

Good Companies

9500 Scott St

17050

Mechanicsburg

PA

5

Wonderful Shops

3333 Rossmoyne Rd.

17050

Mechanicsburg

PA

6

Loyal Clients

7070 Ritter Rd.

17055

Pittsburgh

PA

 

假设客户详细信息发生改变,现在源数据中,客户表的内容如表2-2所示:

2-2 修正后的客户信息

customer _number

customer _name

customer _street _address

customer _zip _code

customer _city

customer _state

1

Really Large Customers

7500 Louise Dr.

17050

Mechanicsburg

PA

2

Small Stores

2500 Woodland St.

17055

Pittsburgh

PA

3

Medium Retailers

1111 Ritter Rd.

17055

Pittsburgh

PA

4

Good Companies

9500 Scott St.

17050

Mechanicsburg

PA

5

Wonderful Shops

3333 Rossmoyne Rd.

17050

Mechanicsburg

PA

6

Loyal Clients

7070 Ritter Rd.

17055

Pittsburgh

PA

7

Distinguished Partners

9999 Scott St.

17050

Mechanicsburg

PA

 

正如你所看到的,第一个客户的名称发生变化,并且多了一个编号是“7”的客户记录。

       你可以运行列表2-1中所示的脚本将SCD1应用于数据仓库中的customer_dim表中,这里假设新增的客户信息已经被载入到中间(临时)表customer_stg

       列表2-1:在表customer_dim中对客户名称应用SCD1

/ *****************************************************************/

/*                                                                */

/* scd1.sql                                                       */

/*                                                                */

/ *****************************************************************/

 
/* default database to dw                                         */
USE dw;
 
/* update existing customers                                      */
 
UPDATE customer_dim a, customer_stg b
SET a.customer_name = b.customer_name
WHERE a.customer_number = b.customer_number
  AND a.expiry_date = '9999-12-31'
  AND a.customer_name <> b.customer_name
;
 
/* add new customers                                               */
 
INSERT INTO customer_dim
SELECT
  NULL
, customer_number
, customer_name
, customer_street_address
, customer_zip_code
, customer_city
, customer_state
, CURRENT_DATE
, '9999-12-31'
FROM customer_stg
WHERE customer_number NOT IN (
  SELECT b.customer_number
  FROM customer_dim a, customer_stg b
  WHERE a.customer_number = b.customer_number )
;
/* end of script                                                   */
 

列表2-1中的脚本包含两个SQL语句,一个更新语句和一个插入语句。更新语句将临时表的customer_name字段复制到customer_dim表的customer_name字段。插入语句则将临时表中有而customer_dim表中没有的记录插入到customer_dim表中。运行该脚本将在customer_dim表中更新临时表中的第一个客户的名称并插入临时表中第七个客户的信息。

尽管在你运行这个脚本文件之前我们还没有临时表,但是在customer.csv文件中我们有当前客户资料表(source数据库中)的内容。因此,在customer_dim表中应用SCD1将需要两个步骤:

1.         建立临时表customer_stg,并将customer.csv文件中的记录载入该表。

2.         运行列表2-1中所示的的脚本。

这些步骤将在下面小节中说明。

 

建立、载入客户临时表

 

customer.csv文件包含当前的客户资料信息,可以在本书附带的Zip文件中找到它。其内容如下所示:

CUSTOMER NO, CUSTOMER NAME,STREET ADDRESS, ZIP CODE,CITY,STATE

1,Really Large Customers, 7500 Louise Dr., 17050, Mechanicsburg, PA

2,Small Stores, 2500 Woodland St., 17055, Pittsburgh, PA

3,Medium Retailers, 1111 Ritter Rd., 17055, Pittsburgh, PA

4,Good Companies, 9500 Scott St., 17050, Mechanicsburg, PA

5,Wonderful Shops, 3333 Rossmoyne Rd., 17050, Mechanicsburg, PA

6,Loyal Clients, 7070 Ritter Rd., 17055, Pittsburgh, PA

7,Distinguished Partners, 9999 Scott St., 17050, Mechanicsburg, PA

 

列表2-2所示的脚本在数据仓库(dw)中建立customer_stg ,并把customer.csv文件的信息载入表中。

 

列表 2-2: 建立并载入 customer_stg

/****************************************************************/

/*                                                              */

/* create_customer_stg.sql                                      */

/*                                                              */

/****************************************************************/

 

/* default database to dw                                       */

 

USE dw;

 

/* create customer_stg table                                    */

 

CREATE TABLE customer_stg

(customer_number INT

customer_name CHAR (30)

customer_street_address CHAR (30)

customer_zip_code INT (5)

customer_city CHAR (30)

customer_state CHAR (2) )

;

 

/* clean up customer_stg table and load customer.csv            */

 

TRUNCATE customer_stg;

 

LOAD DATA INFILE 'customer.csv'

INTO TABLE customer_stg

FIELDS TERMINATED BY ','

OPTIONALLY ENCLOSED BY ""

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state )

;

 

/* end of script                                                */

 

用以下命令运行列表2-2脚本:

mysql> \. c:\mysql\scripts\create_customer_stg.sql

 

应用SCD1

        现在你可以开始准备运行列表2-1中的SCD1脚本,在此之前,你需要将mysql数据库时间调整到2007-02-02(一个比你在第1章所设置的日期更后面的日期)。设置完日期后就可以运行该scd1.sql脚本了。

       mysql> \. c:\mysql\scripts\scd1.sql

       现在可以查询customer_dim表来确认列表2-1 是否以及成功运行了。

mysql> select * from customer_dim \G

 

你将会看到如下结果:

*************************** 1. row ***************************

            customer_sk: 1

        customer_number: 1

          customer_name: Really Large Customers

customer_street_address: 7500 Louise Dr.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 2. row ***************************

            customer_sk: 2

        customer_number: 2

          customer_name: Small Stores

customer_street_address: 2500 Woodland St.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 3. row ***************************

            customer_sk: 3

        customer_number: 3

          customer_name: Medium Retailers

customer_street_address: 1111 Ritter Rd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 4. row ***************************

            customer_sk: 4

        customer_number: 4

          customer_name: Good Companies

customer_street_address: 9500 Scott St.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 5. row ***************************

            customer_sk: 5

        customer_number: 5

          customer_name: Wonderful Shops

customer_street_address: 3333 Rossmoyne Rd.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 6. row ***************************

            customer_sk: 6

        customer_number: 6

          customer_name: Loyal Clients

customer_street_address: 7070 Ritter Rd.

      customer_zip_code: 17055

          customer_city: Pittsburgh

         customer_state: PA

         effective_date: 2007-02-01

            expiry_date: 9999-12-31

*************************** 7. row ***************************

            customer_sk: 7

        customer_number: 7

          customer_name: Distinguished Partners

customer_street_address: 9999 Scott St.

      customer_zip_code: 17050

          customer_city: Mechanicsburg

         customer_state: PA

         effective_date: 2007-02-02

            expiry_date: 9999-12-31

7 rows in set (0.00 sec)

 

结果分析

上述查询的显示表明:

l         第一个客户的名称已经改为‘Really Large Customers’,这里只有第一个客户的名称,也就是说没有为该客户保留历史数据。前面六个客户的生效、失效日期并没有改变,只有第一个客户的名称改变了。

l         7个客户的信息已经加入表中,它的生效日期是2007-02-02(你运行该脚本的日期)。

阅读(3546) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~