SQL行列转换实战(二)-a1234567mdy-ChinaUnix博客

生命不息，奋斗不止。。a1234567mdy.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

a1234567mdy

博客访问： 1419373
博文数量： 416
博客积分： 13005
博客等级：上将
技术积分： 3297
用户组：普通用户
注册时间： 2006-04-05 16:26

文章分类

全部博文（416）

服务器管理（1）
Reversing（1）
电脑安装升级（8）

dell v260sr（1）

P1（0）

IBM R52（0）

TCL K42（3）

IBM X200（1）

IBM SL400（1）
娱乐（2）

影视下载（2）
botnet动态（76）
virus log（11）
流水账（12）
程序（11）

PHP（1）

Perl（1）

API（4）

C/C++（4）

JSP（1）
硬件（2）
OFFICE（7）
汇编（74）

工具调试（4）

加解密（8）

Windows编程（13）

自学（10）

简明x86汇编语言（6）
打印机（1）
电脑（54）

网络（4）

虚拟机（2）

病毒（2）
PB（4）

函数编写（1）
Oracle（59）

动态SQL（3）

分析函数（2）

树结构（3）

学习PL/SQL中....（3）
随笔（87）
未分配的博文（6）

文章存档

2014年（1）

2013年（4）

2012年（46）

2011年（64）

2010年（12）

2009年（4）

2008年（40）

2007年（187）

2006年（58）

我的朋友

相关博文

SQL行列转换实战(二)

分类：数据库开发技术

2007-03-27 09:42:47

这里有篇行列转换的文章，大家共享一下
[b]
Line-Column Transposition
-------------------------
Introduction
------------
This question was asked once in an Oracle forum on Usenet, and received a very
evasive answer. Previously, I had to solve a similar issue while working for
another company; while the solution worked, it didn't look very convincing, and
it was hard to maintain. Later, I was asked this same question at a job interview.
Finally, we received such an inquiry from a customer at Support too. The proposed
solution was a PL/SQL function very specific to that customer's tables. Recently,
another customer admitted she had to face such an issue too. All this to say that
anybody is likely to encounter this problem at least once in a SQL programmer's
life. That is why I decided to tackle this issue once and for all and to provide
a portable and generic solution for it. Actually, I propose 3 solutions amidst
many other alternatives. Busy people please jump directly to the paragraph Usage
at the end of this article.

When a relational database is queried, the returned data is naturally presented
as a suite of line feed-terminated rows composed of several attribute values
aligned in columns. This tabular form closely matches the underlying relational
model and is therefore easily understood. This kind of representation also looks
like a matrix because it is a two-dimensional, rectangular array of data too.
This naturally leads to think of certain relational operations as matrix ones.
For example, a relational projection is similar to a matrix post-product and a
relational data selection to a matrix transposition followed by a matrix post-product
followed by a matrix transposition or, shorter, a matrix pre-product.
But, conversely, what relational operation does correspond to a matrix transposition
in which lines are converted to rows and rows to lines ?
There is no native relational operator that does this, so this functionality must
be programmed ad hoc. But firstly and most importantly, what is the use for such
transposition, what would be its high-level conceptual meaning ?
To help figure this out, here is a little example. Suppose one has a table with
the following definition: SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,
NbShares), i.e. SecuritiesByPortfolios is a relation containing a summary of client's
portfolios counting the number of securities by type, either bonds, warrants or
shares. The table gives the composition of each portfolio, spread by security
type. The NbBonds column contains the number of bonds one given client has in his
portfolio, and so forth for the NbWarrants and NbShares columns. A symbolic
representation of such a table could look like:

Type of Security    NbBonds NbWarrants NbShares
Portfolio id
         p1          b1    w1             s1
         p2       b2    w2             s2
         ...          ... ...          ...
         pn          bn    wn             sn

It is worth noting that in the most general case there can be any number of
portfolios, whereas the type of securities is limited to three here. Such a
table generally has an unconstrained number of rows but a finite number of
columns.
Now, let's consider a transposition of such a table, noted as follows:
tr_SecuritiesByPortfolios(TypeOfSecurity, p1, p2, ... pn):

Portfolios    P1 p2 ... pn
Type of Security
NbBonds          b1 b2 ... bn
NbWarrants w1 w2 ... wn
NbShares s1 s2 ... sn

One can quickly notice that the above table is the same as before but rotated
900 counter-clock side (to be exact, a symmetry along a horizontal axis is also
successively performed on the rotated table). The rows have been turned into
columns, and the columns into rows. It contains exactly the same data as before
but what used to be lines are now columns and the columns have become lines.
As it is quite intuitive here, this new table is another way of looking at the
same data. Here, one wants to know for each type of security how many of them
are held in each existing portfolio. The very same data as before are presented
but from a different point of view which is no less meaningful than the original
one. Emphasis now is put on the type of securities whilst before it was put on
the portfolios. As a consequence of transposition, this table generally has now
an unconstrained number of columns but a finite number of lines.
Although this fact is not visible in the above figure, it is important to note
that after transposition the new columns are generally of mixed type and
therefore such a table would not be strictly relational any more. However, by
using type conversion to strings, it still can be manipulated relationally.
The object of this article is to propose several ways to turn the first table
into the second one, i.e. to transpose data. Here tables are to be understood
in their most general meaning: they can be the result of an SQL query spooled
into a text file or databases' relational tables or views. Since it is
straightforward to go from the latter to the former (SELECT... FROM...) and
only less trivial to go the other way around (must use the SQL*Loader with a
suitable control file), the distinction is not relevant in the present context.
So hereafter, the term table will be used interchangeably to designate either a
relational database table or view, or the spooled data extracted by a query.
Its context should be sufficient to distinguish between each other in case of
need.
Roughly, there are two ways to accomplish a transposition:
1. an internal to the database one which uses SQL language statements to
      reorganize the data and directly works upon the database's tables;
2. an external one which processes the data themselves once they've been
      extracted by some SQL statement and spooled into a file, typically a
      fixed-length flat text file;

The internal method
-------------------
This method can be implemented in two ways; one can either use declarative SQL
statements or imperative, procedural PL/SQL statements. Let's see how they
could look like.

The SQL approach
----------------
Since each selected row potentially contains columns of different types from
other lines, one query per row and a union operator in between are necessary.
Each intermediary query must select one value for each column, but since the
values are each in a different record they must be grouped together - somewhat
artificially - on the same line, and this requires a Cartesian product.
At first sight, the internal method is very appealing because it generates the
transposed table directly from the database server. Thus, the data don't need
to be first extracted and stored on some storage device in order to be processed
later, which is convenient for very large result sets. However, the SQL SELECT
statement's syntax demands that a finite list of column names be given, which is
generally not possible because the number of rows is not known in advance. One
has just to remember that the number of columns in the transposed table is equal
to the number of rows in the original table. Consequently, a first run - but a
simplified one - of the original table (i.e. the one before the transposition)
must be taken in order to have an exhaustive list of columns to feed to the
transposition query's SELECT statement.
In the general case it is impractical to manually type the transposition query
because the number of columns can be arbitrary large, so it must be programmatically
generated. After the query generation, the first run's extracted data is not used
anymore and can thus be deleted.
Let's now see what would the transposition query look like, first with the
preceding example and later with a more general case.
The table to transpose is SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,
NbShares). One wants it transposed into tr_SecuritiesByPortfolios(TypeOfSecurity,
p1, p2, ... pn), where pi is the number of securities of a given type for portfolio i.
To be even more generic, a view for the tansposed data will be created. This view
could be materialized later into a real table or simply SELECTed against for
displaying the data.
Here is one possible solution:

-- note: italicized IDi are portfolio ids quoted literals
-- and bold IDi are either quoted (if their datatype is VARCHAR) or unquoted
-- portfolio ids literals;
CREATE VIEW
tr_PortfolioSummary
AS SELECT
T1.NbBonds ID1, T2.NbBonds ID2, ..., Tn.NbBonds IDn
FROM
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   ID = ID1
) T1,
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   ID = ID2
) T2,
...
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   ID = IDn
) Tn
UNION
SELECT
   T1.NbWarrants, T2.NbWarrants, ..., Tn.NbWarrants
FROM
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      ID = ID1
   ) T1,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      ID = ID2
   ) T2,
   ...
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      ID = IDn
   ) Tn
UNION
SELECT
   T1.NbShares, T2.NbShares, ..., Tn.NbShares
FROM
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      ID = ID1
   ) T1,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      ID = ID2
   ) T2,
   ...
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      ID = IDn
   ) Tn
To show concretely what it looks like, let's apply the preceding method to the
following data:

-- sample table creation;
drop table
PortfolioSummary
/
create table
PortfolioSummary
(
PortfolioID varchar2(10) PRIMARY KEY,
NbBonds number(5),
NbWarrants number(5),
NbShares number(5)
)
/

insert into PortfolioSummary values ('P1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 40, 50, 60);
insert into PortfolioSummary values ('P3',  40, 50, 60);
select * from PortfolioSummary;

-- create the transposed view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
'Nb Bonds' Info, 1 SortOrder, T1.NbBonds "P1", T2.NbBonds "P2",
T3.NbBonds "P3"
FROM
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P1'
) T1,
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P2'
) T2,
(SELECT
   NbBonds
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P3'
) T3
UNION
SELECT
   'Nb Warrants' Info, 2 SortOrder, T1.NbWarrants, T2.NbWarrants,
   T3.NbWarrants
FROM
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3
UNION
SELECT
   'Nb Shares' Info, 3 SortOrder, T1.NbShares, T2.NbShares, T3.NbShares
FROM
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3
/

-- display the resulting transposed view;
column "info" format A11
column "P1" format 99999
column "P2" format 99999
column "P3" format 99999

-- display the view;
SELECT
Info " ", P1, P2, P3
FROM
tr_PortfolioSummary
ORDER BY
SortOrder
/
-- original data;
PORTFOLIOI NBBONDS NBWARRANTS NBSHARES
---------- ---------- ---------- ----------
P1                10       20       30
P2                40       50       60
P3                40       50       60

-- transposed data;
            P1    P2    P3
----------- ------ ------ ------
Nb Bonds       10    40    40
Nb Warrants    20    50    50
Nb Shares    30    60    60

Clearly, the query is quite complex and can take some time to execute on large
tables. It contains UNIONs of as many queries as columns in SecuritiesByPortfolios,
and each such query has as many in-line views as rows in that table. Moreover,
the query is extensively based on Cartesian products of single-row in-line
views. To guarantee that those views contain one and only one row, all the
predicates must refer a table's primary key, which is ID here.
The UNION blocks can be reordered if needed, but this can lead to some confusion
in interpreting the transposed data, so it is better to arrange the first row in
the transposed table to be a line of unique ids.
In addition, UNION does not preserve the row order so a sort must be done later,
while selecting for the view. For this reason, a collating column to be sorted
on has been added, through which it is possible to impose the row ordering.
This only slightly complicates the SQL statement.
It is worth reminding that the Oracle server imposes a limit on the number of
columns in a SELECT statement: 1000 for Oracle v8x. Moreover, the maximum size
of an SQL statement is also limited: 64 Kb though this limit is subject to
several factors and is therefore not absolute. These limits constraint the
complexity of the queries that can be transposed, and hence make this approach
unsuitable for large data sets (more than one thousand rows). Still, it is
interesting because the space cost is affordable.
For not so large tables to transpose, the number of columns limit can be reached
easily. Ditto for the 64 Kb SQL text limit. Additionally, manually writing such
a query can be tedious and error-prone, so it is almost mandatory - except in
trivially simple cases - to use a query generator, either an ad-hoc one tailored
for a particular table or a generic one which can process any table. For this example,
such a generator could take as input the spooled data extracted from
SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants, NbShares) and output
an SQL query into a text file ready for execution, as shown by the following
invocation (only on UNIX):

awk -f gen_transpose.awk SecuritiesByPortfolios.txt > tr_SecuritiesByPortfolios.sql

Here is an example of such a generator script, a generic one, later.

A more representative example
-----------------------------
Let's consider an example with columns of heterogeneous types. Let PortfolioSummary
have the following structure:
PortfolioSummary(PortfolioID, ManagerID, ClientID, NbBonds, NbWarrants, NbShares).
The ID columns are alphanumeric, the last 3 ones are purely numeric. One could
have thrown in one date column as well, but this is sufficient for this
demonstration purpose.
One wants the transpose of PortfolioSummary, noted tr_PortfolioSummary, with the
following structure:
tr_PortfolioSummary(PortfolioID1, PortfolioID2, ... PortfolioIDn)
whose rows are:
(ManagerID1,  ManagerID2, ManagerID3, ... ManagerIDn)
(ClientID11,  ClientID2, ClientID3, ..., Clientn)
(NbBonds1, NbBonds2, NbBonds3, ..., NbBondsn)
(NbWarrants1, NbWarrants2, NbWarrants3, ..., NbWarrantsn)
(NbShares1, NbShares2, NbShares3, ..., NbSharesn)
where n is the number of unique portfolios in PortfolioSummary.
Given the simple case example and the preceding considerations, here is a possible
SQL statement:

-- create the master table;
create table
PortfolioSummary
(
PortfolioID varchar2(10) PRIMARY KEY,
ManagerID varchar2(10),
ClientID varchar2(10),
NbBonds number(5),
NbWarrants number(5),
NbShares number(5)
);

-- populate the table;
insert into PortfolioSummary values ('P1', 'm1', 'c1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 'm1', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P3', 'm2', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P4', 'm2', 'c2', 80, 90, 100);
insert into PortfolioSummary values ('P5', 'm2', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P6', 'm3', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P7', 'm3', 'c4', 140, 150, 160);
insert into PortfolioSummary values ('P8', 'm4', 'c5', 170, 180, 190);
insert into PortfolioSummary values ('P9', 'm5', 'c5', 200, 210, 220);

-- display the original table;
select * from PortfolioSummary;

-- create the transposition view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
'ManagerID' Info, 1 SortOrder,
T1.ManagerID P1, T2.ManagerID P2, T3.ManagerID P3, T4.ManagerID P4,
T5.ManagerID P5, T6.ManagerID P6, T7.ManagerID P7, T8.ManagerID P8,
T9.ManagerID P9
FROM
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P1'
) T1,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P2'
) T2,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P3'
) T3,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P4'
) T4,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P5'
) T5,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P6'
) T6,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P7'
) T7,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P8'
) T8,
(SELECT
   ManagerID
FROM
   PortfolioSummary
WHERE
   PortfolioID = 'P9'
) T9
UNION
SELECT
   'ClientId' Info, 2 SortOrder,
   T1.ClientID, T2.ClientID, T3.ClientID, T4.ClientID, T5.ClientID,
   T6.ClientID, T7.ClientID, T8.ClientID, T9.ClientID
FROM
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P4'
   ) T4,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P5'
   ) T5,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P6'
   ) T6,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P7'
   ) T7,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P8'
   ) T8,
   (SELECT
      ClientID
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P9'
   ) T9
UNION
SELECT
   'Nb Bonds' Info, 3 SortOrder,
   to_char(T1.NbBonds), to_char(T2.NbBonds), to_char(T3.NbBonds),
   to_char(T4.NbBonds), to_char(T5.NbBonds), to_char(T6.NbBonds),
   to_char(T7.NbBonds), to_char(T8.NbBonds), to_char(T9.NbBonds)
FROM
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P4'
   ) T4,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P5'
   ) T5,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P6'
   ) T6,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P7'
   ) T7,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P8'
   ) T8,
   (SELECT
      NbBonds
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P9'
   ) T9
UNION
SELECT
   'NbWarrants' Info, 4 SortOrder,
   to_char(T1.NbWarrants), to_char(T2.NbWarrants),
   to_char(T3.NbWarrants), to_char(T4.NbWarrants),
   to_char(T5.NbWarrants), to_char(T6.NbWarrants),
   to_char(T7.NbWarrants), to_char(T8.NbWarrants),
   to_char(T9.NbWarrants)
FROM
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P4'
   ) T4,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P5'
   ) T5,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P6'
   ) T6,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P7'
   ) T7,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P8'
   ) T8,
   (SELECT
      NbWarrants
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P9'
   ) T9
UNION
SELECT
   'NbShares' Info, 5 SortOrder,
   to_char(T1.NbShares), to_char(T2.NbShares), to_char(T3.NbShares),
   to_char(T4.NbShares),
   to_char(T5.NbShares), to_char(T6.NbShares), to_char(T7.NbShares),
   to_char(T8.NbShares), to_char(T9.NbShares)
FROM
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P1'
   ) T1,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P2'
   ) T2,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P3'
   ) T3,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P4'
   ) T4,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P5'
   ) T5,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P6'
   ) T6,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P7'
   ) T7,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P8'
   ) T8,
   (SELECT
      NbShares
   FROM
      PortfolioSummary
   WHERE
      PortfolioID = 'P9'
   ) T9
/

-- display the resulting transposed view;
column "info" format A11
column "P1" format A5
column "P2" format A5
column "P3" format A5
column "P4" format A5
column "P5" format A5
column "P6" format A5
column "P7" format A5
column "P8" format A5
column "P9" format A5
SELECT
Info " ", "P1", P2, P3, P4, P5, P6, P7, P8, P9
FROM
tr_PortfolioSummary
ORDER BY
SortOrder
/

Here is the result of the above script:

PORTFOLIOI MANAGERID  CLIENTID    NBBONDS NBWARRANTS  NBSHARES
---------- ---------- ---------- --------- ---------- ---------
P1       m1       c1             10       20       30
P2       m1       c1             40       50       60
P3       m2       c1             40       50       60
P4       m2       c2             80       90    100
P5       m2       c3             110       120    130
P6       m3       c3             110       120    130
P7       m3       c4             140       150    160
P8       m4       c5             170       180    190
P9       m5       c5             200       210    220

9 rows selected.

View created.

         P1 P2 P3 P4 P5 P6 P7 P8 P9
---------- ----- ----- ----- ----- ----- ----- ----- ----- -----
ManagerID  m1 m1 m2 m2 m2 m3 m3 m4 m5
ClientId c1 c1 c1 c2 c3 c3 c4 c5 c5
Nb Bonds 10 40 40 80 110 110 140 170 200
NbWarrants 20 50 50 90 120 120 150 180 210
NbShares 30 60 60 100 130 130 160 190 220

Note the use of type conversion of columns to VARCHAR2 to make sure they are
type-compatible across rows so the UNION operator can work.
The above query is about 350 lines long, for an original 9-row and 6-column
table ! If one uses the same query layout (but with the SELECT's list of columns
on one line only), transposing a general n lines by m columns table into a view
will need a (7 * n + 4) * (m - 1) line query; applied to a 100 x 10 table, the
transposition query will be more than 6'300 lines long ! Moreover, if the client
software used to submit such a query to the server is SQL*PLUS, this tool limits
the command line length to 2'500 characters; another tool would maybe have
different limits; this was also a factor that had to be considered when generating
the query. This illustrates the complexity of SQL-made table transposition, and
the necessity to resort to a query generator that also takes care of the client's
limits.

A transposing query generator
-----------------------------
As it was showed in the preceding simple case, it is highly advisable to use an
automatic tool to generate such SQL queries. Fortunately, due to the highly
regular structure of the transposing queries, it is fairly easy to write such a
tool.
Here is a general awk script for building a transposition query. For the scripting
languages freaks out there, this script does not use any awk idiom (are there
any except the automatic line input loop ?), so it could be easily converted
into your favorite scripting language, mostly perl, tcl or Python.
It takes as input a spooled file containing the result of a SELECT primary_key_column
FROM view_to_transpose (let's name it spooledPKFile) and a spooled DESC
view_to_transpose file (let's name it spooledDescFile), and insures that the
generated script's lines stay under the UserLineLength limit. If a QueryViewFile
is specified then an additional SQL*PLUS script file is produced for querying
the transposed view. The usage is:

awk -f gen_transpose.awk
-v PrimaryKeyFile=spooledPKFile
-v ViewDescFile=spooledDescFile
-v UserLineLength=some_number
-v QueryViewFile=some_file    > transpose.sql

The ordering of the rows in the transposed view is that of the column description
in spooledDescFile. If a different order is wished, this file should be edited,
the only restriction is that the first line listed should be the primary key's
description.
The proposed implementation is basic yet effective, and several improvements
could be easily added such as formatting options, row naming, collating order,
etc. Currently, the generated SQL script must be manually edited if the format
is not suitable.

# Name: gen_transpose.awk;
# Date of creation: January 2000;
# Purpose: awk script to generate a transposing SQL statement;
# usage:
#       awk -f gen_transpose.awk -v PrimaryKeyFile=....
#                               -v ViewDescFile=....
#                               -v UserLineLength=0|[0-9]+
#                               -v QueryViewFile=...
# where PrimaryKeyFile is the file name containing a spool of the primary keys in the view to
# transpose,
# and ViewDescFile is the file name of a text file containing the result of a DESC of the view to
# transpose;
# PrimaryKeyFile must begin with a line containing the column name, followed by a line of ----,
# followed
# by the primary key values each on its own line, and terminated by a blank line;
# it can be produced by the following Sql*Plus instructions:
#  spool PrimaryKeyFile
#  select primary_key from view_to_transpose;
#  spool off
# ViewDescFile must begin with 'desc view-to_transpose' on first line, followed by the
# 'Name Null? Type' line, followed by a line of '----', then each column name on a line by itself
# and a blank line;
# it can be produced by the following Sql*Plus instructions:
#  spool ViewDesc
#  DESC view_to_transpose
#  spool off
# first line is assumed to be the primary key description;
# the order of the lines in the DESC file will be used to output the transposed columns;
# if a different order is wanted, the DESC file must be edited;
# emitted SQL code can be executed right away if redirected into a file/piped into Sql*Plus;
# if UserLineLength is non null then the large column list in the SELECT stmt will be wrapped so as
# not to overflow the line width;
# if QueryViewFile is given, a SQL*PLUS file is produced to query the transposed view;
BEGIN {
# program constants, edit them to suit your needs;
TransposeViewPrefix = "tr_"
# in generated query against the produced view;
ColumnSepBlank = 1

# name of this current script;
AwkProgram = "gen_transpose.awk"
Error = 0

# read in the view description from ViewDescFile;
if (1 != (getline < ViewDescFile)) {
   Error = 1;
   print "Error while reading file " ViewDescFile
   exit
}
MasterView = $3
TransposedViewName = TransposeViewPrefix  MasterView

# skip irrelevant lines;
while (1 == (getline < ViewDescFile) && !match($0, /^ (-+ *)+$/));

# find columns' name, type and max. width;
NbColumns = 0
MustCastToChar = 0
MaxColWidth = 0
PrimaryKeyWidth = 0
while (1 == (getline < ViewDescFile) && "" != $0) {
   # allow for spaces in column names (they were defined between double quotes);
   Temp = substr($0, 2, 31)
   gsub(/ +$/, "", Temp)
   ColumnName[NbColumns] = Temp;
   if (length($0) > PrimaryKeyWidth)
      PrimaryKeyWidth = length(Temp)

   IsNumeric = (match($0, /NUMBER($[0-9]+(,[0-9]+)?$)? *$/) > 0) ? 1 : 0
   ColumnIsNumeric[NbColumns] = IsNumeric
   MustCastToChar += IsNumeric    # |= would be enough but unlike C, awk does not have it;

   ColWidth = substr($0, match($0, /$[0-9]+($|,)/) + 1, RLENGTH - 1) + 0
   if (ColWidth > MaxColWidth)
      MaxColWidth = ColWidth

   NbColumns++
}
close(ViewDescFile)

# read in the primary keys from PrimaryKeyFile;
NbPrimaryKeys = 0
while (1 == (getline < PrimaryKeyFile) && !match($0, /^-+ *$/)); # skip irrelevant lines;
while (1 == (getline < PrimaryKeyFile) && "" != $0)
   PrimaryKeyTab[NbPrimaryKeys++] = $1
close(PrimaryKeyFile)
# any I/O error ?
if (!NbPrimaryKeys) {
   Error = 1
   print "Error while reading file " PrimaryKeyFile
   print "at line " NbPrimaryKeys
   exit
}

# starts emitting SQL;
print "-- SCRIPT GENERATED BY " AwkProgram ";"
print "-- do not edit;"
printf "\n"
print "-- create the transposition view;"
print "DROP VIEW " TransposedViewName ";"
print "CREATE VIEW " TransposedViewName " AS"

# following 2 variables control the line wrapping if user has set UserLineLength different than 0;
CurrentLineLength = 0
Buffer = ""

# builds UNIONed blocks, one per column; those are the transposed view's lines;
for (ColumnIndex = 1; ColumnIndex < NbColumns; ColumnIndex++) {
   print "SELECT"
   WrappedPrintf(" '" ColumnName[ColumnIndex] "' Info, " ColumnIndex " SortOrder")
   # outputs the columns values, one per PK value;
   for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
      WrappedPrintf(sprintf(", %sT%d.%s%s%s",
                           ColumnIsNumeric[ColumnIndex] && MustCastToChar ? "to_char(" : "",
                           KeyIndex + 1,
                           ColumnName[ColumnIndex],
                           ColumnIsNumeric[ColumnIndex] && MustCastToChar ? ")" : "",
                           1 == ColumnIndex ? " \"" PrimaryKeyTab[KeyIndex] "\"" : ""))
   WrappedPrintf("\n")
   print "FROM"
   # in-line views here, one per PK value; they'll be used in the Cartesian product;
   for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++) {
      print " (SELECT"
      print "    " ColumnName[ColumnIndex]
      print " FROM"
      print "    " MasterView
      print " WHERE"
      print "    " ColumnName[0] " = " (ColumnIsNumeric[0] ? "" : "'") PrimaryKeyTab[KeyIndex]
            (ColumnIsNumeric[0] ? "" : "'")
      printf(" ) T%d%s\n", KeyIndex + 1, KeyIndex < NbPrimaryKeys - 1 ? "," : "")
   }
   print (ColumnIndex < NbColumns - 1) ? "UNION" : "/"
}

# create the query against the new view statement to display it;
print "-- display the resulting transposed view;"
print "-- SCRIPT GENERATED BY " AwkProgram ";"    > QueryViewFile
print "-- do not edit;"                            > QueryViewFile
# avoid multiple page header lines in Sql*Plus;
print "set pagesize 10000"                         > QueryViewFile
# force column separator's length;
printf("set colsep \"%*s\"\n", ColumnSepBlank, " ") > QueryViewFile
# attempt to avoid linewrapping in Sql*Plus;
# maximum line length is 32767;
printf("set linesize %d\n",
      ((Temp = (MaxColWidth + ColumnSepBlank) * NbPrimaryKeys + PrimaryKeyWidth) <= 32767) ?
      Temp : 32767)                               > QueryViewFile
print "column info format A" PrimaryKeyWidth       > QueryViewFile
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
   print "column " PrimaryKeyTab[KeyIndex] " format A" MaxColWidth > QueryViewFile
print "SELECT"                                     > QueryViewFile
WrappedPrintf(" Info \" \"", 1)
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
   WrappedPrintf(", \"" PrimaryKeyTab[KeyIndex] "\"", 1)
WrappedPrintf("\n", 1)
print "FROM"                                     > QueryViewFile
print " " TransposedViewName                   > QueryViewFile
print "ORDER BY"                                  > QueryViewFile
print " SortOrder"                               > QueryViewFile
print "/"                                        > QueryViewFile
}

# controls that a line is not overflown when UserLineLength is different from 0;
# modifies globals Buffer and CurrentLineLength;
# those are side-effects but the program is short and this is the only function in it, so ...;
function WrappedPrintf(StringToPrint, OutputFile) {
   if (1 == ToQueryViewFile)
      OutputFile = QueryViewFile
   else
      OutputFile = "/dev/stdout"
   if (0 == UserLineLength)                               # no restrictions, print right away;
      printf StringToPrint > OutputFile
   else if ("\n" != StringToPrint) {                      # flush buffer ?
      Temp = length(StringToPrint)
      if (CurrentLineLength + Temp > UserLineLength) {    # potential overflow ?
         print Buffer > OutputFile                      # yes, flush it before writing into it;
         Buffer = StringToPrint
         CurrentLineLength = Temp
      }
      else {                                              # no overflow, write into the buffer;
         Buffer = Buffer StringToPrint
         CurrentLineLength += Temp
      }
   }
   else {                                                 # flush buffer;
      print Buffer > OutputFile
      Buffer = ""
      CurrentLineLength = 0
   }
}

As it is visible, the script is surprisingly short, with the bulk of the work being performed by two small nested loops. This is because of the orthogonal structure of the transpose SQL query, as shown in the preceding SQL examples.
Converting to a uniform VARCHAR2 type does work for the basic column datatypes (such as NUMBER and DATE) but is likely to fail with
more exotic ones such as LONGs and LOBs. In these cases, the generator could be modified to only extract a short excerpt from the
character LOB types, and skip the BLOBs. Or such columns could be ignored altogether, or an error/warning message issued. Here,
those datatypes are not processed especially so the transposing statement will probably fail. The DATE datatype will be implicitly converted using whatever NLS_DATE_FORMAT is defined in the client's environment.

阅读(2236) | 评论(0) | 转发(0) |

上一篇：SQL行列转换实战（一）

下一篇：SQL行列转换实战（三）

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6