Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1411717
  • 博文数量: 416
  • 博客积分: 13005
  • 博客等级: 上将
  • 技术积分: 3297
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-05 16:26
文章分类

全部博文(416)

文章存档

2014年(1)

2013年(4)

2012年(46)

2011年(64)

2010年(12)

2009年(4)

2008年(40)

2007年(187)

2006年(58)

分类: 数据库开发技术

2007-03-27 09:42:47

这里有篇行列转换的文章,大家共享一下
[b]
Line-Column Transposition
-------------------------
Introduction
------------
This question was asked once in an Oracle forum on Usenet, and received a very  
evasive answer. Previously, I had to solve a similar issue while working for  
another company; while the solution worked, it didn't look very convincing, and  
it was hard to maintain. Later, I was asked this same question at a job interview.  
Finally, we received such an inquiry from a customer at Support too. The proposed  
solution was a PL/SQL function very specific to that customer's tables. Recently,  
another customer admitted she had to face such an issue too. All this to say that  
anybody is likely to encounter this problem at least once in a SQL programmer's  
life. That is why I decided to tackle this issue once and for all and to provide  
a portable and generic solution for it. Actually, I propose 3 solutions amidst  
many other alternatives. Busy people please jump directly to the paragraph Usage  
at the end of this article.

When a relational database is queried, the returned data is naturally presented  
as a suite of line feed-terminated rows composed of several attribute values  
aligned in columns. This tabular form closely matches the underlying relational  
model and is therefore easily understood. This kind of representation also looks  
like a matrix because it is a two-dimensional, rectangular array of data too.  
This naturally leads to think of certain relational operations as matrix ones.  
For example, a relational projection is similar to a matrix post-product and a  
relational data selection to a matrix transposition followed by a matrix post-product  
followed by a matrix transposition or, shorter, a matrix pre-product.
But, conversely, what relational operation does correspond to a matrix transposition  
in which lines are converted to rows and rows to lines ?
There is no native relational operator that does this, so this functionality must  
be programmed ad hoc. But firstly and most importantly, what is the use for such  
transposition, what would be its high-level conceptual meaning ?
To help figure this out, here is a little example. Suppose one has a table with  
the following definition: SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,  
NbShares), i.e. SecuritiesByPortfolios is a relation containing a summary of client's  
portfolios counting the number of securities by type, either bonds, warrants or  
shares. The table gives the composition of each portfolio, spread by security  
type. The NbBonds column contains the number of bonds one given client has in his  
portfolio, and so forth for the NbWarrants and NbShares columns. A symbolic  
representation of such a table could look like:

Type of Security           NbBonds        NbWarrants        NbShares         
Portfolio id
           p1           b1      w1              s1         
           p2              b2      w2              s2         
           ...                ...        ...                ...         
           pn           bn      wn              sn         

It is worth noting that in the most general case there can be any number of  
portfolios, whereas the type of securities is limited to three here. Such a  
table generally has an unconstrained number of rows but a finite number of  
columns.
Now, let's consider a transposition of such a table, noted as follows:
tr_SecuritiesByPortfolios(TypeOfSecurity, p1, p2, ... pn):

Portfolios           P1        p2        ...        pn         
Type of Security
NbBonds                b1        b2        ...        bn         
NbWarrants        w1        w2        ...        wn         
NbShares        s1        s2        ...        sn         

One can quickly notice that the above table is the same as before but rotated  
900 counter-clock side (to be exact, a symmetry along a horizontal axis is also  
successively performed on the rotated table). The rows have been turned into  
columns, and the columns into rows. It contains exactly the same data as before  
but what used to be lines are now columns and the columns have become lines.  
As it is quite intuitive here, this new table is another way of looking at the  
same data. Here, one wants to know for each type of security how many of them  
are held in each existing portfolio. The very same data as before are presented  
but from a different point of view which is no less meaningful than the original  
one. Emphasis now is put on the type of securities whilst before it was put on  
the portfolios. As a consequence of transposition, this table generally has now  
an unconstrained number of columns but a finite number of lines.
Although this fact is not visible in the above figure, it is important to note  
that after transposition the new columns are generally of mixed type and  
therefore such a table would not be strictly relational any more. However, by  
using type conversion to strings, it still can be manipulated relationally.
The object of this article is to propose several ways to turn the first table  
into the second one, i.e. to transpose data. Here tables are to be understood  
in their most general meaning: they can be the result of an SQL query spooled  
into a text file or databases' relational tables or views. Since it is  
straightforward to go from the latter to the former (SELECT... FROM...) and  
only less trivial to go the other way around (must use the SQL*Loader with a  
suitable control file), the distinction is not relevant in the present context.  
So hereafter, the term table will be used interchangeably to designate either a  
relational database table or view, or the spooled data extracted by a query.  
Its context should be sufficient to distinguish between each other in case of  
need.
Roughly, there are two ways to accomplish a transposition:
1.        an internal to the database one which uses SQL language statements to  
        reorganize the data and directly works upon the database's tables;
2.        an external one which processes the data themselves once they've been  
        extracted by some SQL statement and spooled into a file, typically a  
        fixed-length flat text file;

The internal method
-------------------
This method can be implemented in two ways; one can either use declarative SQL  
statements or imperative, procedural PL/SQL statements. Let's see how they  
could look like.

The SQL approach
----------------
Since each selected row potentially contains columns of different types from  
other lines, one query per row and a union operator in between are necessary.  
Each intermediary query must select one value for each column, but since the  
values are each in a different record they must be grouped together - somewhat  
artificially - on the same line, and this requires a Cartesian product.
At first sight, the internal method is very appealing because it generates the  
transposed table directly from the database server. Thus, the data don't need  
to be first extracted and stored on some storage device in order to be processed  
later, which is convenient for very large result sets. However, the SQL SELECT  
statement's syntax demands that a finite list of column names be given, which is  
generally not possible because the number of rows is not known in advance. One  
has just to remember that the number of columns in the transposed table is equal  
to the number of rows in the original table. Consequently, a first run - but a  
simplified one - of the original table (i.e. the one before the transposition)  
must be taken in order to have an exhaustive list of columns to feed to the  
transposition query's SELECT statement.
In the general case it is impractical to manually type the transposition query  
because the number of columns can be arbitrary large, so it must be programmatically  
generated. After the query generation, the first run's extracted data is not used  
anymore and can thus be deleted.
Let's now see what would the transposition query look like, first with the  
preceding example and later with a more general case.
The table to transpose is SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,  
NbShares). One wants it transposed into tr_SecuritiesByPortfolios(TypeOfSecurity,  
p1, p2, ... pn), where pi is the number of securities of a given type for portfolio i.
To be even more generic, a view for the tansposed data will be created. This view  
could be materialized later into a real table or simply SELECTed against for  
displaying the data.
Here is one possible solution:

-- note: italicized IDi are portfolio ids quoted literals
-- and bold IDi are either quoted (if their datatype is VARCHAR) or unquoted
-- portfolio ids literals;
CREATE VIEW
   tr_PortfolioSummary
AS SELECT
   T1.NbBonds ID1, T2.NbBonds ID2, ..., Tn.NbBonds IDn
FROM
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       ID = ID1
   ) T1,
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       ID = ID2
   ) T2,
   ...
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       ID = IDn
   ) Tn
UNION
   SELECT
      T1.NbWarrants, T2.NbWarrants, ..., Tn.NbWarrants
   FROM
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          ID = ID1
      ) T1,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          ID = ID2
      ) T2,
      ...
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          ID = IDn
      ) Tn
UNION
   SELECT
      T1.NbShares, T2.NbShares, ..., Tn.NbShares
   FROM
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          ID = ID1
      ) T1,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          ID = ID2
      ) T2,
      ...
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          ID = IDn
      ) Tn
To show concretely what it looks like, let's apply the preceding method to the  
following data:

-- sample table creation;
drop table
PortfolioSummary
/
create table
   PortfolioSummary
(
    PortfolioID varchar2(10) PRIMARY KEY,
    NbBonds number(5),
    NbWarrants number(5),
    NbShares number(5)
)
/

insert into PortfolioSummary values ('P1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 40, 50, 60);
insert into PortfolioSummary values ('P3',  40, 50, 60);
select * from PortfolioSummary;

-- create the transposed view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
   'Nb Bonds' Info, 1 SortOrder, T1.NbBonds "P1", T2.NbBonds "P2",  
    T3.NbBonds "P3"
FROM
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P1'
   ) T1,
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P2'
   ) T2,
   (SELECT
       NbBonds
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P3'
   ) T3
UNION
   SELECT
      'Nb Warrants' Info, 2 SortOrder, T1.NbWarrants, T2.NbWarrants,  
       T3.NbWarrants
   FROM
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3
UNION
   SELECT
      'Nb Shares' Info, 3 SortOrder, T1.NbShares, T2.NbShares, T3.NbShares
   FROM
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3
/

-- display the resulting transposed view;
column "info" format A11
column "P1" format 99999
column "P2" format 99999
column "P3" format 99999

-- display the view;
SELECT
   Info " ", P1, P2, P3
FROM
   tr_PortfolioSummary
ORDER BY
   SortOrder
/
-- original data;
PORTFOLIOI    NBBONDS NBWARRANTS   NBSHARES
---------- ---------- ---------- ----------
P1                 10         20         30
P2                 40         50         60
P3                 40         50         60

-- transposed data;
                P1     P2     P3
----------- ------ ------ ------
Nb Bonds        10     40     40
Nb Warrants     20     50     50
Nb Shares       30     60     60

Clearly, the query is quite complex and can take some time to execute on large  
tables. It contains UNIONs of as many queries as columns in SecuritiesByPortfolios,  
and each such query has as many in-line views as rows in that table. Moreover,  
the query is extensively based on Cartesian products of single-row in-line  
views. To guarantee that those views contain one and only one row, all the  
predicates must refer a table's primary key, which is ID here.
The UNION blocks can be reordered if needed, but this can lead to some confusion  
in interpreting the transposed data, so it is better to arrange the first row in  
the transposed table to be a line of unique ids.
In addition, UNION does not preserve the row order so a sort must be done later,  
while selecting for the view. For this reason, a collating column to be sorted  
on has been added, through which it is possible to impose the row ordering.  
This only slightly complicates the SQL statement.
It is worth reminding that the Oracle server imposes a limit on the number of  
columns in a SELECT statement: 1000 for Oracle v8x. Moreover, the maximum size  
of an SQL statement is also limited: 64 Kb though this limit is subject to  
several factors and is therefore not absolute. These limits constraint the  
complexity of the queries that can be transposed, and hence make this approach  
unsuitable for large data sets (more than one thousand rows). Still, it is  
interesting because the space cost is affordable.
For not so large tables to transpose, the number of columns limit can be reached  
easily. Ditto for the 64 Kb SQL text limit. Additionally, manually writing such  
a query can be tedious and error-prone, so it is almost mandatory - except in  
trivially simple cases - to use a query generator, either an ad-hoc one tailored  
for a particular table or a generic one which can process any table. For this example,  
such a generator could take as input the spooled data extracted from  
SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants, NbShares) and output  
an SQL query into a text file ready for execution, as shown by the following  
invocation (only on UNIX):

awk -f gen_transpose.awk SecuritiesByPortfolios.txt > tr_SecuritiesByPortfolios.sql

Here is an example of such a generator script, a generic one, later.

A more representative example
-----------------------------
Let's consider an example with columns of heterogeneous types. Let PortfolioSummary  
have the following structure:
PortfolioSummary(PortfolioID, ManagerID, ClientID, NbBonds, NbWarrants, NbShares).
The ID columns are alphanumeric, the last 3 ones are purely numeric. One could  
have thrown in one date column as well, but this is sufficient for this  
demonstration purpose.
One wants the transpose of PortfolioSummary, noted tr_PortfolioSummary, with the  
following structure:
tr_PortfolioSummary(PortfolioID1, PortfolioID2, ... PortfolioIDn)
whose rows are:
(ManagerID1,  ManagerID2,   ManagerID3,   ... ManagerIDn)
(ClientID11,  ClientID2,   ClientID3,   ..., Clientn)
(NbBonds1,    NbBonds2,    NbBonds3,    ..., NbBondsn)
(NbWarrants1, NbWarrants2, NbWarrants3, ..., NbWarrantsn)
(NbShares1,   NbShares2,   NbShares3,   ..., NbSharesn)
where n is the number of unique portfolios in PortfolioSummary.
Given the simple case example and the preceding considerations, here is a possible  
SQL statement:

-- create the master table;
create table
   PortfolioSummary
(
    PortfolioID varchar2(10) PRIMARY KEY,
    ManagerID varchar2(10),
    ClientID varchar2(10),
    NbBonds number(5),
    NbWarrants number(5),
    NbShares number(5)
);

-- populate the table;
insert into PortfolioSummary values ('P1', 'm1', 'c1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 'm1', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P3', 'm2', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P4', 'm2', 'c2', 80, 90, 100);
insert into PortfolioSummary values ('P5', 'm2', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P6', 'm3', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P7', 'm3', 'c4', 140, 150, 160);
insert into PortfolioSummary values ('P8', 'm4', 'c5', 170, 180, 190);
insert into PortfolioSummary values ('P9', 'm5', 'c5', 200, 210, 220);

-- display the original table;
select * from PortfolioSummary;

-- create the transposition view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
   'ManagerID' Info, 1 SortOrder,
   T1.ManagerID P1, T2.ManagerID P2, T3.ManagerID P3, T4.ManagerID P4,
   T5.ManagerID P5, T6.ManagerID P6, T7.ManagerID P7, T8.ManagerID P8,
   T9.ManagerID P9
FROM
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P1'
   ) T1,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P2'
   ) T2,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P3'
   ) T3,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P4'
   ) T4,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P5'
   ) T5,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P6'
   ) T6,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P7'
   ) T7,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P8'
   ) T8,
   (SELECT
       ManagerID
    FROM
       PortfolioSummary
    WHERE
       PortfolioID = 'P9'
   ) T9
UNION
   SELECT
      'ClientId' Info, 2 SortOrder,
      T1.ClientID, T2.ClientID, T3.ClientID, T4.ClientID, T5.ClientID,
      T6.ClientID, T7.ClientID, T8.ClientID, T9.ClientID
   FROM
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P4'
      ) T4,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P5'
      ) T5,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P6'
      ) T6,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P7'
      ) T7,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P8'
      ) T8,
      (SELECT
          ClientID
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P9'
      ) T9
UNION
   SELECT
      'Nb Bonds' Info, 3 SortOrder,
      to_char(T1.NbBonds), to_char(T2.NbBonds), to_char(T3.NbBonds),
      to_char(T4.NbBonds), to_char(T5.NbBonds), to_char(T6.NbBonds),
      to_char(T7.NbBonds), to_char(T8.NbBonds), to_char(T9.NbBonds)
   FROM
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P4'
      ) T4,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P5'
      ) T5,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P6'
      ) T6,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P7'
      ) T7,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P8'
      ) T8,
      (SELECT
          NbBonds
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P9'
      ) T9
UNION
   SELECT
      'NbWarrants' Info, 4 SortOrder,
      to_char(T1.NbWarrants), to_char(T2.NbWarrants),
      to_char(T3.NbWarrants), to_char(T4.NbWarrants),
      to_char(T5.NbWarrants), to_char(T6.NbWarrants),
      to_char(T7.NbWarrants), to_char(T8.NbWarrants),
      to_char(T9.NbWarrants)
   FROM
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P4'
      ) T4,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P5'
      ) T5,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P6'
      ) T6,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P7'
      ) T7,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P8'
      ) T8,
      (SELECT
          NbWarrants
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P9'
      ) T9
UNION
   SELECT
      'NbShares' Info, 5 SortOrder,
      to_char(T1.NbShares), to_char(T2.NbShares), to_char(T3.NbShares),
      to_char(T4.NbShares),
      to_char(T5.NbShares), to_char(T6.NbShares), to_char(T7.NbShares),
      to_char(T8.NbShares), to_char(T9.NbShares)
   FROM
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P1'
      ) T1,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P2'
      ) T2,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P3'
      ) T3,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P4'
      ) T4,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P5'
      ) T5,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P6'
      ) T6,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P7'
      ) T7,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P8'
      ) T8,
      (SELECT
          NbShares
       FROM
          PortfolioSummary
       WHERE
          PortfolioID = 'P9'
      ) T9
/

-- display the resulting transposed view;
column "info" format A11
column "P1" format A5
column "P2" format A5
column "P3" format A5
column "P4" format A5
column "P5" format A5
column "P6" format A5
column "P7" format A5
column "P8" format A5
column "P9" format A5
SELECT
   Info " ", "P1", P2, P3, P4, P5, P6, P7, P8, P9
FROM
   tr_PortfolioSummary
ORDER BY
   SortOrder
/

Here is the result of the above script:

PORTFOLIOI MANAGERID  CLIENTID     NBBONDS NBWARRANTS  NBSHARES
---------- ---------- ---------- --------- ---------- ---------
P1         m1         c1                10         20        30
P2         m1         c1                40         50        60
P3         m2         c1                40         50        60
P4         m2         c2                80         90       100
P5         m2         c3               110        120       130
P6         m3         c3               110        120       130
P7         m3         c4               140        150       160
P8         m4         c5               170        180       190
P9         m5         c5               200        210       220

9 rows selected.

View created.

           P1    P2    P3    P4    P5    P6    P7    P8    P9
---------- ----- ----- ----- ----- ----- ----- ----- ----- -----
ManagerID  m1    m1    m2    m2    m2    m3    m3    m4    m5
ClientId   c1    c1    c1    c2    c3    c3    c4    c5    c5
Nb Bonds   10    40    40    80    110   110   140   170   200
NbWarrants 20    50    50    90    120   120   150   180   210
NbShares   30    60    60    100   130   130   160   190   220

Note the use of type conversion of columns to VARCHAR2 to make sure they are  
type-compatible across rows so the UNION operator can work.
The above query is about 350 lines long, for an original 9-row and 6-column  
table ! If one uses the same query layout (but with the SELECT's list of columns  
on one line only), transposing a general n lines by m columns table into a view  
will need a (7 * n + 4) * (m - 1) line query; applied to a 100 x 10 table, the  
transposition query will be more than 6'300 lines long ! Moreover, if the client  
software used to submit such a query to the server is SQL*PLUS, this tool limits  
the command line length to 2'500 characters; another tool would maybe have  
different limits; this was also a factor that had to be considered when generating  
the query. This illustrates the complexity of SQL-made table transposition, and  
the necessity to resort to a query generator that also takes care of the client's  
limits.

A transposing query generator
-----------------------------
As it was showed in the preceding simple case, it is highly advisable to use an  
automatic tool to generate such SQL queries. Fortunately, due to the highly  
regular structure of the transposing queries, it is fairly easy to write such a  
tool.
Here is a general awk script for building a transposition query. For the scripting  
languages freaks out there, this script does not use any awk idiom (are there  
any except the automatic line input loop ?), so it could be easily converted  
into your favorite scripting language, mostly perl, tcl or Python.
It takes as input a spooled file containing the result of a SELECT primary_key_column  
FROM view_to_transpose (let's name it spooledPKFile) and a spooled DESC  
view_to_transpose file (let's name it spooledDescFile), and insures that the  
generated script's lines stay under the UserLineLength limit. If a QueryViewFile  
is specified then an additional SQL*PLUS script file is produced for querying  
the transposed view. The usage is:

awk -f gen_transpose.awk
    -v PrimaryKeyFile=spooledPKFile
    -v ViewDescFile=spooledDescFile
    -v UserLineLength=some_number  
    -v QueryViewFile=some_file       > transpose.sql

The ordering of the rows in the transposed view is that of the column description  
in spooledDescFile. If a different order is wished, this file should be edited,  
the only restriction is that the first line listed should be the primary key's  
description.
The proposed implementation is basic yet effective, and several improvements  
could be easily added such as formatting options, row naming, collating order,  
etc. Currently, the generated SQL script must be manually edited if the format  
is not suitable.

# Name: gen_transpose.awk;
# Date of creation: January 2000;
# Purpose: awk script to generate a transposing SQL statement;
# usage:
#         awk -f gen_transpose.awk -v PrimaryKeyFile=....
#                                  -v ViewDescFile=....
#                                  -v UserLineLength=0|[0-9]+
#                                  -v QueryViewFile=...
# where PrimaryKeyFile is the file name containing a spool of the primary keys in the view to
# transpose,
# and ViewDescFile is the file name of a text file containing the result of a DESC of the view to
# transpose;  
# PrimaryKeyFile must begin with a line containing the column name, followed by a line of ----,
# followed
# by the primary key values each on its own line, and terminated by a blank line;
# it can be produced by the following Sql*Plus instructions:
#  spool PrimaryKeyFile
#  select primary_key from view_to_transpose;
#  spool off  
# ViewDescFile must begin with 'desc view-to_transpose' on first line, followed by the
# 'Name   Null?   Type' line, followed by a line of '----', then each column name on a line by itself
# and a blank line;
# it can be produced by the following Sql*Plus instructions:
#  spool ViewDesc
#  DESC view_to_transpose
#  spool off  
# first line is assumed to be the primary key description;
# the order of the lines in the DESC file will be used to output the transposed columns;
# if a different order is wanted, the DESC file must be edited;
# emitted SQL code can be executed right away if redirected into a file/piped into Sql*Plus;
# if UserLineLength is non null then the large column list in the SELECT stmt will be wrapped so as
# not to overflow the line width;
# if QueryViewFile is given, a SQL*PLUS file is produced to query the transposed view;
BEGIN {
   # program constants, edit them to suit your needs;
   TransposeViewPrefix = "tr_"
   # in generated query against the produced view;
   ColumnSepBlank = 1

   # name of this current script;
   AwkProgram = "gen_transpose.awk"
   Error = 0

   # read in the view description from ViewDescFile;
   if (1 != (getline < ViewDescFile)) {
      Error = 1;
      print "Error while reading file " ViewDescFile
      exit
   }
   MasterView = $3
   TransposedViewName = TransposeViewPrefix  MasterView

   # skip irrelevant lines;
   while (1 == (getline < ViewDescFile) && !match($0, /^ (-+ *)+$/));  

   # find columns' name, type and max. width;
   NbColumns = 0
   MustCastToChar = 0
   MaxColWidth = 0
   PrimaryKeyWidth = 0
   while (1 == (getline < ViewDescFile) && "" != $0) {
      # allow for spaces in column names (they were defined between double quotes);
      Temp = substr($0, 2, 31)
      gsub(/ +$/, "", Temp)
      ColumnName[NbColumns] = Temp;
      if (length($0) > PrimaryKeyWidth)
         PrimaryKeyWidth = length(Temp)

      IsNumeric = (match($0, /NUMBER(\([0-9]+(,[0-9]+)?\))? *$/) > 0) ? 1 : 0
      ColumnIsNumeric[NbColumns] = IsNumeric
      MustCastToChar += IsNumeric      # |= would be enough but unlike C, awk does not have it;
      
      ColWidth = substr($0, match($0, /\([0-9]+(\)|,)/) + 1, RLENGTH - 1) + 0
      if (ColWidth > MaxColWidth)
         MaxColWidth = ColWidth

      NbColumns++
   }
   close(ViewDescFile)

   # read in the primary keys from PrimaryKeyFile;
   NbPrimaryKeys = 0
   while (1 == (getline < PrimaryKeyFile) && !match($0, /^-+ *$/));    # skip irrelevant lines;
   while (1 == (getline < PrimaryKeyFile) && "" != $0)
      PrimaryKeyTab[NbPrimaryKeys++] = $1
   close(PrimaryKeyFile)
   # any I/O error ?
   if (!NbPrimaryKeys) {
      Error = 1
      print "Error while reading file " PrimaryKeyFile
      print "at line " NbPrimaryKeys
      exit
   }

   # starts emitting SQL;
   print "-- SCRIPT GENERATED BY " AwkProgram ";"
   print "-- do not edit;"
   printf "\n"
   print "-- create the transposition view;"
   print "DROP VIEW " TransposedViewName ";"
   print "CREATE VIEW " TransposedViewName " AS"  

   # following 2 variables control the line wrapping if user has set UserLineLength different than 0;
   CurrentLineLength = 0
   Buffer = ""

   # builds UNIONed blocks, one per column; those are the transposed view's lines;
   for (ColumnIndex = 1; ColumnIndex < NbColumns; ColumnIndex++) {
      print "SELECT"
      WrappedPrintf("   '" ColumnName[ColumnIndex] "' Info, " ColumnIndex " SortOrder")
      # outputs the columns values, one per PK value;
      for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
         WrappedPrintf(sprintf(", %sT%d.%s%s%s",
                               ColumnIsNumeric[ColumnIndex] && MustCastToChar ? "to_char(" : "",
                               KeyIndex + 1,
                               ColumnName[ColumnIndex],
                               ColumnIsNumeric[ColumnIndex] && MustCastToChar ? ")" : "",
                               1 == ColumnIndex ? " \"" PrimaryKeyTab[KeyIndex] "\"" : ""))
      WrappedPrintf("\n")
      print "FROM"
      # in-line views here, one per PK value; they'll be used in the Cartesian product;
      for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++) {
         print "   (SELECT"
         print "       " ColumnName[ColumnIndex]
         print "    FROM"
         print "       " MasterView
         print "    WHERE"
         print "       " ColumnName[0] " = " (ColumnIsNumeric[0] ? "" : "'") PrimaryKeyTab[KeyIndex]
               (ColumnIsNumeric[0] ? "" : "'")
         printf("   ) T%d%s\n", KeyIndex + 1, KeyIndex < NbPrimaryKeys - 1 ? "," : "")
      }
      print (ColumnIndex < NbColumns - 1) ? "UNION" : "/"  
   }

   # create the query against the new view statement to display it;
   print "-- display the resulting transposed view;"
   print "-- SCRIPT GENERATED BY " AwkProgram ";"      > QueryViewFile
   print "-- do not edit;"                             > QueryViewFile
   # avoid multiple page header lines in Sql*Plus;
   print "set pagesize 10000"                          > QueryViewFile
   # force column separator's length;
   printf("set colsep \"%*s\"\n", ColumnSepBlank, " ") > QueryViewFile
   # attempt to avoid linewrapping in Sql*Plus;
   # maximum line length is 32767;
   printf("set linesize %d\n",
          ((Temp = (MaxColWidth + ColumnSepBlank) * NbPrimaryKeys + PrimaryKeyWidth) <= 32767) ?
          Temp : 32767)                                > QueryViewFile
   print "column info format A" PrimaryKeyWidth        > QueryViewFile
   for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
      print "column " PrimaryKeyTab[KeyIndex] " format A" MaxColWidth > QueryViewFile
   print "SELECT"                                      > QueryViewFile
   WrappedPrintf("   Info \" \"", 1)
   for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
      WrappedPrintf(", \"" PrimaryKeyTab[KeyIndex] "\"", 1)
   WrappedPrintf("\n", 1)
   print "FROM"                                        > QueryViewFile
   print "   " TransposedViewName                      > QueryViewFile
   print "ORDER BY"                                    > QueryViewFile
   print "   SortOrder"                                > QueryViewFile
   print "/"                                           > QueryViewFile
}

   # controls that a line is not overflown when UserLineLength is different from 0;
   # modifies globals Buffer and CurrentLineLength;
   # those are side-effects but the program is short and this is the only function in it, so ...;
   function WrappedPrintf(StringToPrint, OutputFile) {
      if (1 == ToQueryViewFile)
         OutputFile = QueryViewFile
      else
         OutputFile = "/dev/stdout"
      if (0 == UserLineLength)                                # no restrictions, print right away;
         printf StringToPrint   > OutputFile
      else if ("\n" != StringToPrint) {                       # flush buffer ?
         Temp = length(StringToPrint)
         if (CurrentLineLength + Temp > UserLineLength) {     # potential overflow ?
            print Buffer > OutputFile                         # yes, flush it before writing into it;
            Buffer = StringToPrint
            CurrentLineLength = Temp
         }
         else {                                               # no overflow, write into the buffer;
            Buffer = Buffer StringToPrint
            CurrentLineLength += Temp
         }
      }
      else {                                                  # flush buffer;
         print Buffer > OutputFile
         Buffer = ""
         CurrentLineLength = 0
      }
   }

As it is visible, the script is surprisingly short, with the bulk of the work being performed by two small nested loops. This is because of the orthogonal structure of the transpose SQL query, as shown in the preceding SQL examples.
Converting to a uniform VARCHAR2 type does work for the basic column datatypes (such as NUMBER and DATE) but is likely to fail with
more exotic ones such as LONGs and LOBs. In these cases, the generator could be modified to only extract a short excerpt from the
character LOB types, and skip the BLOBs. Or such columns could be ignored altogether, or an error/warning message issued. Here,
those datatypes are not processed especially so the transposing statement will probably fail. The DATE datatype will be implicitly converted using whatever NLS_DATE_FORMAT is defined in the client's environment.

阅读(2231) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~