Chinaunix首页 | 论坛 | 博客
  • 博客访问: 223701
  • 博文数量: 31
  • 博客积分: 1427
  • 博客等级: 上尉
  • 技术积分: 409
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-04 15:21
文章分类

全部博文(31)

文章存档

2008年(31)

我的朋友

分类: Mysql/postgreSQL

2008-10-23 15:25:02

宾夕法尼亚州地区客户维

   在本节我将用宾夕法尼亚州地区客户的子集维度来解释第二种维度子集的类型。我也将向你说明如何测试该子集维度。

    相对的,一个向上钻取的维包含了它基础维的所有更高级别的数据。而一个特定子集维度则选择了它基础维的某个特定的数据集合。列表12-3所示的脚本产生并加载了宾夕法尼亚州(PA)地区客户子集维。

    注意到,有两个事情是宾夕法尼亚州地区客户子集维区别于月份子集维的地方。

n        pa_customer_dim表和customer_dim表的字段结构一样,而month_dim表没有 date_dim表中的日期字段。

n        pa_customer_dim表的代理键是客户表的代理键。Month_dim表的代理键只属于month_dim表,并不是来自日期维表。

 

列表12-3 PA的客户:

/**********************************************************************/

/*                                                                  */

/* pa_customer.sql                                                  */

/*                                                                  */

/**********************************************************************/

 

USE dw;

 

CREATE TABLE pa_customer_dim

( customer_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY

, customer_number INT

, customer_name CHAR (50)

, customer_street_address CHAR (50)

, customer_zip_code INT (5)

, customer_city CHAR (30)

, customer_state CHAR (2)

, shipping_address CHAR (50)

, shipping_zip_code INT (5)

, shipping_city CHAR (30)

, shipping_state CHAR (2)

, effective_date DATE

, expiry_date DATE )

;

 

INSERT INTO pa_customer_dim

SELECT

  customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date

FROM customer_dim

WHERE customer_state = 'PA'

;

 

/* end of script                                                    */

 

    为了测试PA子维脚本,你需要先用列表12-4的脚本来增加三个居住于俄亥俄州的客户。

 

列表12-4:非PA 客户:

/**********************************************************************/

/*                                                                  */

/* non_pa_customer.sql                                              */

/*                                                                  */

/**********************************************************************/

 

/* default to dw                                                    */

 

USE dw;

 

INSERT INTO customer_dim

( customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date )

VALUES

  (NULL, 10, 'Bigger Customers', '7777 Ridge Rd.', '44102',

       'Cleveland', 'OH', '7777 Ridge Rd.', '44102', 'Cleveland',

       'OH', CURRENT_DATE, '9999-12-31')

, (NULL, 11, 'Smaller Stores', '8888 Jennings Fwy.', '44102',

       'Cleveland', 'OH', '8888 Jennings Fwy.', '44102',

       'Cleveland', 'OH', CURRENT_DATE, '9999-12-31')

, (NULL, 12, 'Small-Medium Retailers', '9999 Memphis Ave.', '44102',

       'Cleveland', 'OH', '9999 Memphis Ave.', '44102', 'Cleveland',

       'OH', CURRENT_DATE, '9999-12-31')

;

/* end of script                                                    */

 

用以下命令运行列表12-4中的脚本。

mysql> \. c:\mysql\scripts\non_pa_customer.sql

在你的控制台上将得到下面的响应信息:

Database changed

Query OK, 3 rows affected (0.86 sec)

Records: 3  Duplicates: 0  Warnings: 0

 

    现在你已经准备好运行列表12-3中的pa_customer.sql脚本,在你做这些之前,确定你的Msql数据库的日期仍然是2007-03-02

    你可以用如下命令方式运行pa_customer.sql脚本。

mysql> \. c:\mysql\scripts\pa_customer.sql

    你将在控制台上看到:

Database changed

Query OK, 0 rows affected (0.20 sec)

 

Query OK, 18 rows affected (0.08 sec)

Records: 18 Duplicates: 0 Warnings: 0

 

为了确保三个OH客户已经成功的载入,查询customer_dim 表:

mysql> select customer_name, customer_state, effective_date from customer_dim;

 

控制台上将看到:

+---------------------------------+----------------+---------------+

| customer_name                   | customer_state | effective_date|

+---------------------------------+----------------+---------------+

| Really Large Customers          | PA             | 2005-03-01    |

| Small Stores                    | PA             | 2005-03-01    |

| Medium Retailers                | PA             | 2005-03-01    |

| Good Companies                  | PA             | 2005-03-01    |

| Wonderful Shops                 | PA             | 2005-03-01    |

| Extremely Loyal Clients         | PA             | 2005-03-01    |

| Distinguished Agencies          | PA             | 2005-03-01    |

| Extremely Loyal Clients         | PA             | 2007-03-01    |

| Subsidiaries                    | PA             | 2007-03-01    |

| Really Large Customers          | PA             | 2007-03-02    |

| Small Stores                    | PA             | 2007-03-02    |

| Medium Retailers                | PA             | 2007-03-02    |

| Good Companies                  | PA             | 2007-03-02    |

| Wonderful Shops                 | PA             | 2007-03-02    |

| Extremely Loyal Clients         | PA             | 2007-03-02    |

| Distinguished Agencies          | PA             | 2007-03-02    |

| Subsidiaries                    | PA             | 2007-03-02    |

| Online Distributors             | PA             | 2007-03-02    |

| Bigger Customers                | OH             | 2007-03-02    |

| Smaller Stores                  | OH             | 2007-03-02    |

| Small-Medium Retailers          | OH             | 2007-03-02    |

+---------------------------------+----------------+---------------+

21 rows in set (0.00 sec)

 

现在查询pa_customer_dim表来确定只有PA的客户在PA客户维表中。

mysql> select customer_name, customer_state, effective_date from pa_customer_dim;

 

结果如下:


+---------------------------------+----------------+---------------+

| customer_name                   | customer_state | effective_date|

+---------------------------------+----------------+---------------+

| Really Large Customers          | PA             | 2004-01-01    |

| Small Stores                    | PA             | 2004-01-01    |

| Medium Retailers                | PA             | 2004-01-01    |

| Good Companies                  | PA             | 2004-01-01    |

| Wonderful Shops                 | PA             | 2004-01-01    |

| Extremely Loyal Clients         | PA             | 2004-01-01    |

| Distinguished Agencies          | PA             | 2004-01-01    |

| Extremely Loyal Clients         | PA             | 2005-11-01    |

| Subsidiaries                    | PA             | 2005-11-01    |

| Really Large Customers          | PA             | 2005-11-03    |

| Small Stores                    | PA             | 2005-11-03    |

| Medium Retailers                | PA             | 2005-11-03    |

| Good Companies                  | PA             | 2005-11-03    |

| Wonderful Shops                 | PA             | 2005-11-03    |

| Extremely Loyal Clients         | PA             | 2005-11-03    |

| Distinguished Agencies          | PA             | 2005-11-03    |

| Subsidiaries                    | PA             | 2005-11-03    |

| Online Distributors             | PA             | 2005-11-03    |

+---------------------------------+----------------+---------------+

18 rows in set (0.00 sec)

    正如你所看到的,只有PA客户进入到该表中,之前加入的OH客户并没有在其中。

 

修改定期装载

   当一个新的PA客户资料加入到客户维中时,为了能够同时加入PA客户维表,你需要把PA客户子集维的装载合并到数据仓库的定期装载过程中。   修改后的定期装载脚本将在列表12-5中列出。这个改变(新增)用粗体显示。注意到,当每次你运行每日的定期装载脚本时,该脚本重建(截断,然后加入所有的PA客户)了PA客户子集维。

 

列表12-5:修正后的每日DW定期装载

/******************************************************************/

/*                                                              */

/* dw_regular_12.sql                                            */

/*                                                              */

/******************************************************************/

 

USE dw;

 

/* CUSTOMER_DIM POPULATION                                      */

 

TRUNCATE customer_stg;

 

LOAD DATA INFILE 'customer.csv'

INTO TABLE customer_stg

FIELDS TERMINATED BY ' , '

OPTIONALLY ENCLOSED BY '"'

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state )

;

 

/* SCD 2 ON ADDRESSES                                           */

 

UPDATE

  customer_dim a

, customer_stg b

SET

  a.expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

  a.customer_number = b.customer_number

AND (   a.customer_street_address <> b.customer_street_address

     OR a.customer_city <> b.customer_city

     OR a.customer_zip_code <> b.customer_zip_code

     OR a.customer_state <> b.customer_state

     OR a.shipping_address <> b.shipping_address

     OR a.shipping_city <> b.shipping_city

     OR a.shipping_zip_code <> b.shipping_zip_code

     OR a.shipping_state <> b.shipping_state

     OR a.shipping_address IS NULL

     OR a.shipping_city IS NULL

     OR a.shipping_zip_code IS NULL

     OR a.shipping_state IS NULL)

AND expiry_date = '9999-12-31'

;

 

INSERT INTO customer_dim

SELECT

  NULL

, b.customer_number

, b.customer_name

, b.customer_street_address

, b.customer_zip_code

, b.customer_city

, b.customer_state

, b.shipping_address

, b.shipping_zip_code

, b.shipping_city

, b.shipping_state

, CURRENT_DATE

, '9999-12-31'

FROM

  customer_dim a

, customer_stg b

WHERE

    a.customer_number = b.customer_number

AND (   a.customer_street_address <> b.customer_street_address

     OR a.customer_city <> b.customer_city

     OR a.customer_zip_code <> b.customer_zip_code

     OR a.customer_state <> b.customer_state

     OR a.shipping_address <> b.shipping_address

     OR a.shipping_city <> b.shipping_city

     OR a.shipping_zip_code <> b.shipping_zip_code

     OR a.shipping_state <> b.shipping_state

     OR a.shipping_address IS NULL

     OR a.shipping_city IS NULL

     OR a.shipping_zip_code IS NULL

     OR a.shipping_state IS NULL)

AND EXISTS (

SELECT *

FROM customer_dim x

WHERE b.customer_number = x.customer_number

AND a.expiry_date - SUBDATE (CURRENT_DATE, 1))

AND NOT EXISTS (

SELECT *

FROM customer_dim y

WHERE     b.customer_number = y.customer_number

      AND y.expiry_date - '9999-12-31')

;

 

/* END OF SCD 2                                                 */

 

/* SCD 1 ON NAME                                                */

 

UPDATE customer_dim a, customer_stg b

SET a.customer_name = b.customer_name

WHERE     a.customer_number = b.customer_number

      AND a.expiry_date - '9999-12-31'

      AND a.customer_name <> b.customer_name

;

 

/* ADD NEW CUSTOMER                                             */

 

INSERT INTO customer_dim

SELECT

  NULL

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, CURRENT_DATE

, '9999-12-31'

FROM customer_stg

WHERE customer_number NOT IN(

SELECT a.customer_number

FROM

  customer_dim a

, customer_stg b

WHERE b.customer_number = a.customer_number )

/* RE-BUILD PA CUSTOMER DIMENSION                               */

 

TRUNCATE pa_customer_dim;

 

INSERT INTO pa_customer_dim

SELECT

  customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date

FROM customer_dim

WHERE customer_state = 'PA'

;

/* END OF CUSTOMER_DIM POPULATION                               */

 

/* product dimension loading                                    */

 

TRUNCATE product_stg

;

 

LOAD DATA INFILE 'product.txt'

INTO TABLE product_stg

FIELDS TERMINATED BY ''

OPTIONALLY ENCLOSED BY ''

LINES TERMINATED BY '\r\n'

IGNORE 1 LINES

( product_code

, product_name

, product_category )

;

 

/* PRODUCT_DIM POPULATION                                       */

 

/* SCD2 ON PRODUCT NAME AND GROUP                               */

 

UPDATE

  product_dim a

, product_stg b

SET

  expiry_date = SUBDATE (CURRENT_DATE, 1)

WHERE

    a.product_code = b.product_code

AND (   a.product_name <> b.product_name

     OR a.product_category <> b.product_category )

     AND expiry_date = '9999-12-31'

;

 

INSERT INTO product_dim

SELECT

  NULL

, b.product_code

, b.product_name

, b.product_category

, CURRENT_DATE

, '9999-12-31'

FROM

  product_dim a

, product_stg b

WHERE

    a.product_code = b.product_code

AND (   a.product_name <> b.product_name

     OR a.product_category <> b.product_category )

AND EXISTS (

SELECT *

FROM product_dim x

WHERE b.product_code = x.product_code

  AND a.expiry_date = SUBDATE (CURRENT_DATE, 1) )

AND NOT EXISTS (

SELECT *

FROM product_dim y

WHERE b.product_code = y.product_code

  AND y.expiry_date = '9999-12-31')

;

 

/* END OF SCD 2                                                 */

 

/* ADD NEW PRODUCT                                              */

 

INSERT INTO product_dim

SELECT

  NULL

, product_code

, product_name

, product_category

, CURRENT_DATE

, '9999-12-31'

FROM product_stg

WHERE product_code NOT IN(

SELECT y.product_code

FROM product_dim x, product_stg y

WHERE x.product_code = y.product_code

;

 

/* END OF PRODUCT_DIM POPULATION                                */

 

/* ORDER_DIM POPULATION                                         */

 

INSERT INTO order_dim (

  order_sk

, order_number

, effective_date

, expiry_date

)

SELECT

  NULL

, order_number

, order_date

, '9999-12-31'

FROM source.sales_order

WHERE entry_date = CURRENT_DATE

;

 

/* END OF ORDER_DIM POPULATION                                  */

 

/* SALES_ORDER_FACT POPULATION                                  */

 

INSERT INTO sales_order_fact

SELECT

  order_sk

, customer_sk

, product_sk

, date_sk

, order_amount

, order_quantity

FROM

  source.sales_order a

, order_dim b

, customer_dim c

, product_dim d

, date_dim e

WHERE

    a.order_number = b.order_number

AND a.customer_number = c.customer_number

AND a.order_date >= c.effective_date

AND a.order_date <= c.expiry_date

AND a.product_code = d.product_code

AND a.order_date >= d.effective_date

AND a.order_date <= d.expiry_date

AND a.order_date = e.date

AND a.entry_date = CURRENT_DATE

;

 

/* end of script                                                */

 

 

测试修正后的定期装载

    现在你可以测试列表12-5的脚本。在你实施之前,先增加一些客户数据,通过运行列表12-6的脚本增加一个PA客户和一个OH客户到客户维中。

 

列表12-6:增加两个客户

/******************************************************************/

/*                                                              */

/* two_more_customers.sql                                       */

/*                                                              */

/******************************************************************/

 

/* default to dw                                                */

 

USE dw;

 

INSERT INTO customer_dim

( customer_sk

, customer_number

, customer_name

, customer_street_address

, customer_zip_code

, customer_city

, customer_state

, shipping_address

, shipping_zip_code

, shipping_city

, shipping_state

, effective_date

, expiry_date )

VALUES

  (NULL, 13, 'PA Customer', '1111 Louise Dr.', '17050',

       'Mechanicsburg', 'PA', '1111 Louise Dr.', '17050',

       'Mechanicsburg', 'PA', CURRENT_DATE, '9999-12-31')

, (NULL, 14, 'OH Customer', '6666 Ridge Rd.', '44102',

       'Cleveland', 'OH', '6666 Ridge Rd.', '44102',

       'Cleveland', 'OH', CURRENT_DATE, '9999-12-31')

;

 

/* end of script                                                */

 

现在运行列表12-6中的脚本:

mysql> \. c:\mysql\scripts\two_more_customers.sql

 

Mysql将显示有两个记录生效。

Database changed

Query OK, 2 rows affected (0.06 sec)

Records: 2  Duplicates: 0  Warnings: 0

现在更改你的Msql数据库的日期为2007-03-03以保证老的数据不会重新载入,然后运行dw_regular_12.sql脚本。

mysql> \. c:\mysql\scripts\dw_regular_12.sql

你将看到你的控制台有如下显示:

Database changed

Query OK, 9 rows affected (0.15 sec)

 

Query OK, 9 rows affected (0.14 sec)

Records: 9  Deleted: 0  Skipped: 0  Warnings: 0

 

Query OK, 0 rows affected (0.05 sec)

Rows matched: 0  Changed: 0  Warnings: 0

 

Query OK, 0 rows affected (0.01 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

Query OK, 0 rows affected (0.00 sec)

Rows matched: 0  Changed: 0  Warnings: 0

 

Query OK, 0 rows affected (0.00 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

Query OK, 18 rows affected (0.04 sec)

 

Query OK, 19 rows affected (0.06 sec)

Records: 19  Duplicates: 0  Warnings: 0

 

Query OK, 4 rows affected (0.09 sec)

 

Query OK, 4 rows affected (0.07 sec)

Records: 4  Deleted: 0  Skipped: 0  Warnings: 0

 

Query OK, 0 rows affected (0.06 sec)

Rows matched: 0  Changed: 0  Warnings: 0

 

Query OK, 0 rows affected (0.00 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

Query OK, 0 rows affected (0.00 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

Query OK, 0 rows affected (0.15 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

Query OK, 0 rows affected (0.17 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

现在用该语句查询pa_customer_dim表。你将看到新的PA客户被插入到表中。

mysql> select customer_name, customer_state, effective_date from pa_customer_dim;

这是结果:

+-------------------------+----------------+---------------+

| customer_name           | customer_state | effective_date|

+-------------------------+----------------+---------------+

| Really Large Customers  | PA             | 2005-03-01    |

| Small Stores            | PA             | 2005-03-01    |

| Medium Retailers        | PA             | 2005-03-01    |

| Good Companies          | PA             | 2005-03-01    |

| Wonderful Shops         | PA             | 2005-03-01    |

| Extremely Loyal Clients | PA             | 2005-03-01    |

| Distinguished Agencies  | PA             | 2005-03-01    |

| Extremely Loyal Clients | PA             | 2007-03-01    |

| Subsidiaries            | PA             | 2007-03-01    |

| Really Large Customers  | PA             | 2007-03-02    |

| Small Stores            | PA             | 2007-03-02    |

| Medium Retailers        | PA             | 2007-03-02    |

| Good Companies          | PA             | 2007-03-02    |

| Wonderful Shops         | PA             | 2007-03-02    |

| Extremely Loyal Clients | PA             | 2007-03-02    |

| Distinguished Agencies  | PA             | 2007-03-02    |

| Subsidiaries            | PA             | 2007-03-02    |

| Online Distributors     | PA             | 2007-03-02    |

| PA Customer             | PA             | 2007-03-03    |

+-------------------------+----------------+---------------+

19 rows in set (0.00 sec)

 

小结

   在这章,你学习了两种类型的子集维。月份子集维是向上钻取维的一个例子,一个装载自比它更为详细的基础维的具有更高级的维。PA客户维是一个特定的子集维;只从它的基础维选择PA客户数据进行装载。

    下一章,你将学习另外一个复用现有维的技术,叫角色扮演维度。

阅读(1758) | 评论(0) | 转发(0) |
0

上一篇:mysql多维数据仓库指南--第三篇第12章(1)

下一篇:没有了

给主人留下些什么吧!~~