这里有篇行列转换的文章,大家共享一下
[b]
Line-Column Transposition
-------------------------
Introduction
------------
This question was asked once in an Oracle forum on Usenet, and received a very
evasive answer. Previously, I had to solve a similar issue while working for
another company; while the solution worked, it didn't look very convincing, and
it was hard to maintain. Later, I was asked this same question at a job interview.
Finally, we received such an inquiry from a customer at Support too. The proposed
solution was a PL/SQL function very specific to that customer's tables. Recently,
another customer admitted she had to face such an issue too. All this to say that
anybody is likely to encounter this problem at least once in a SQL programmer's
life. That is why I decided to tackle this issue once and for all and to provide
a portable and generic solution for it. Actually, I propose 3 solutions amidst
many other alternatives. Busy people please jump directly to the paragraph Usage
at the end of this article.
When a relational database is queried, the returned data is naturally presented
as a suite of line feed-terminated rows composed of several attribute values
aligned in columns. This tabular form closely matches the underlying relational
model and is therefore easily understood. This kind of representation also looks
like a matrix because it is a two-dimensional, rectangular array of data too.
This naturally leads to think of certain relational operations as matrix ones.
For example, a relational projection is similar to a matrix post-product and a
relational data selection to a matrix transposition followed by a matrix post-product
followed by a matrix transposition or, shorter, a matrix pre-product.
But, conversely, what relational operation does correspond to a matrix transposition
in which lines are converted to rows and rows to lines ?
There is no native relational operator that does this, so this functionality must
be programmed ad hoc. But firstly and most importantly, what is the use for such
transposition, what would be its high-level conceptual meaning ?
To help figure this out, here is a little example. Suppose one has a table with
the following definition: SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,
NbShares), i.e. SecuritiesByPortfolios is a relation containing a summary of client's
portfolios counting the number of securities by type, either bonds, warrants or
shares. The table gives the composition of each portfolio, spread by security
type. The NbBonds column contains the number of bonds one given client has in his
portfolio, and so forth for the NbWarrants and NbShares columns. A symbolic
representation of such a table could look like:
Type of Security NbBonds NbWarrants NbShares
Portfolio id
p1 b1 w1 s1
p2 b2 w2 s2
... ... ... ...
pn bn wn sn
It is worth noting that in the most general case there can be any number of
portfolios, whereas the type of securities is limited to three here. Such a
table generally has an unconstrained number of rows but a finite number of
columns.
Now, let's consider a transposition of such a table, noted as follows:
tr_SecuritiesByPortfolios(TypeOfSecurity, p1, p2, ... pn):
Portfolios P1 p2 ... pn
Type of Security
NbBonds b1 b2 ... bn
NbWarrants w1 w2 ... wn
NbShares s1 s2 ... sn
One can quickly notice that the above table is the same as before but rotated
900 counter-clock side (to be exact, a symmetry along a horizontal axis is also
successively performed on the rotated table). The rows have been turned into
columns, and the columns into rows. It contains exactly the same data as before
but what used to be lines are now columns and the columns have become lines.
As it is quite intuitive here, this new table is another way of looking at the
same data. Here, one wants to know for each type of security how many of them
are held in each existing portfolio. The very same data as before are presented
but from a different point of view which is no less meaningful than the original
one. Emphasis now is put on the type of securities whilst before it was put on
the portfolios. As a consequence of transposition, this table generally has now
an unconstrained number of columns but a finite number of lines.
Although this fact is not visible in the above figure, it is important to note
that after transposition the new columns are generally of mixed type and
therefore such a table would not be strictly relational any more. However, by
using type conversion to strings, it still can be manipulated relationally.
The object of this article is to propose several ways to turn the first table
into the second one, i.e. to transpose data. Here tables are to be understood
in their most general meaning: they can be the result of an SQL query spooled
into a text file or databases' relational tables or views. Since it is
straightforward to go from the latter to the former (SELECT... FROM...) and
only less trivial to go the other way around (must use the SQL*Loader with a
suitable control file), the distinction is not relevant in the present context.
So hereafter, the term table will be used interchangeably to designate either a
relational database table or view, or the spooled data extracted by a query.
Its context should be sufficient to distinguish between each other in case of
need.
Roughly, there are two ways to accomplish a transposition:
1. an internal to the database one which uses SQL language statements to
reorganize the data and directly works upon the database's tables;
2. an external one which processes the data themselves once they've been
extracted by some SQL statement and spooled into a file, typically a
fixed-length flat text file;
The internal method
-------------------
This method can be implemented in two ways; one can either use declarative SQL
statements or imperative, procedural PL/SQL statements. Let's see how they
could look like.
The SQL approach
----------------
Since each selected row potentially contains columns of different types from
other lines, one query per row and a union operator in between are necessary.
Each intermediary query must select one value for each column, but since the
values are each in a different record they must be grouped together - somewhat
artificially - on the same line, and this requires a Cartesian product.
At first sight, the internal method is very appealing because it generates the
transposed table directly from the database server. Thus, the data don't need
to be first extracted and stored on some storage device in order to be processed
later, which is convenient for very large result sets. However, the SQL SELECT
statement's syntax demands that a finite list of column names be given, which is
generally not possible because the number of rows is not known in advance. One
has just to remember that the number of columns in the transposed table is equal
to the number of rows in the original table. Consequently, a first run - but a
simplified one - of the original table (i.e. the one before the transposition)
must be taken in order to have an exhaustive list of columns to feed to the
transposition query's SELECT statement.
In the general case it is impractical to manually type the transposition query
because the number of columns can be arbitrary large, so it must be programmatically
generated. After the query generation, the first run's extracted data is not used
anymore and can thus be deleted.
Let's now see what would the transposition query look like, first with the
preceding example and later with a more general case.
The table to transpose is SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants,
NbShares). One wants it transposed into tr_SecuritiesByPortfolios(TypeOfSecurity,
p1, p2, ... pn), where pi is the number of securities of a given type for portfolio i.
To be even more generic, a view for the tansposed data will be created. This view
could be materialized later into a real table or simply SELECTed against for
displaying the data.
Here is one possible solution:
-- note: italicized IDi are portfolio ids quoted literals
-- and bold IDi are either quoted (if their datatype is VARCHAR) or unquoted
-- portfolio ids literals;
CREATE VIEW
tr_PortfolioSummary
AS SELECT
T1.NbBonds ID1, T2.NbBonds ID2, ..., Tn.NbBonds IDn
FROM
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
ID = ID1
) T1,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
ID = ID2
) T2,
...
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
ID = IDn
) Tn
UNION
SELECT
T1.NbWarrants, T2.NbWarrants, ..., Tn.NbWarrants
FROM
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
ID = ID1
) T1,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
ID = ID2
) T2,
...
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
ID = IDn
) Tn
UNION
SELECT
T1.NbShares, T2.NbShares, ..., Tn.NbShares
FROM
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
ID = ID1
) T1,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
ID = ID2
) T2,
...
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
ID = IDn
) Tn
To show concretely what it looks like, let's apply the preceding method to the
following data:
-- sample table creation;
drop table
PortfolioSummary
/
create table
PortfolioSummary
(
PortfolioID varchar2(10) PRIMARY KEY,
NbBonds number(5),
NbWarrants number(5),
NbShares number(5)
)
/
insert into PortfolioSummary values ('P1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 40, 50, 60);
insert into PortfolioSummary values ('P3', 40, 50, 60);
select * from PortfolioSummary;
-- create the transposed view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
'Nb Bonds' Info, 1 SortOrder, T1.NbBonds "P1", T2.NbBonds "P2",
T3.NbBonds "P3"
FROM
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3
UNION
SELECT
'Nb Warrants' Info, 2 SortOrder, T1.NbWarrants, T2.NbWarrants,
T3.NbWarrants
FROM
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3
UNION
SELECT
'Nb Shares' Info, 3 SortOrder, T1.NbShares, T2.NbShares, T3.NbShares
FROM
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3
/
-- display the resulting transposed view;
column "info" format A11
column "P1" format 99999
column "P2" format 99999
column "P3" format 99999
-- display the view;
SELECT
Info " ", P1, P2, P3
FROM
tr_PortfolioSummary
ORDER BY
SortOrder
/
-- original data;
PORTFOLIOI NBBONDS NBWARRANTS NBSHARES
---------- ---------- ---------- ----------
P1 10 20 30
P2 40 50 60
P3 40 50 60
-- transposed data;
P1 P2 P3
----------- ------ ------ ------
Nb Bonds 10 40 40
Nb Warrants 20 50 50
Nb Shares 30 60 60
Clearly, the query is quite complex and can take some time to execute on large
tables. It contains UNIONs of as many queries as columns in SecuritiesByPortfolios,
and each such query has as many in-line views as rows in that table. Moreover,
the query is extensively based on Cartesian products of single-row in-line
views. To guarantee that those views contain one and only one row, all the
predicates must refer a table's primary key, which is ID here.
The UNION blocks can be reordered if needed, but this can lead to some confusion
in interpreting the transposed data, so it is better to arrange the first row in
the transposed table to be a line of unique ids.
In addition, UNION does not preserve the row order so a sort must be done later,
while selecting for the view. For this reason, a collating column to be sorted
on has been added, through which it is possible to impose the row ordering.
This only slightly complicates the SQL statement.
It is worth reminding that the Oracle server imposes a limit on the number of
columns in a SELECT statement: 1000 for Oracle v8x. Moreover, the maximum size
of an SQL statement is also limited: 64 Kb though this limit is subject to
several factors and is therefore not absolute. These limits constraint the
complexity of the queries that can be transposed, and hence make this approach
unsuitable for large data sets (more than one thousand rows). Still, it is
interesting because the space cost is affordable.
For not so large tables to transpose, the number of columns limit can be reached
easily. Ditto for the 64 Kb SQL text limit. Additionally, manually writing such
a query can be tedious and error-prone, so it is almost mandatory - except in
trivially simple cases - to use a query generator, either an ad-hoc one tailored
for a particular table or a generic one which can process any table. For this example,
such a generator could take as input the spooled data extracted from
SecuritiesByPortfolios(PortfolioId, NbBonds, NbWarrants, NbShares) and output
an SQL query into a text file ready for execution, as shown by the following
invocation (only on UNIX):
awk -f gen_transpose.awk SecuritiesByPortfolios.txt > tr_SecuritiesByPortfolios.sql
Here is an example of such a generator script, a generic one, later.
A more representative example
-----------------------------
Let's consider an example with columns of heterogeneous types. Let PortfolioSummary
have the following structure:
PortfolioSummary(PortfolioID, ManagerID, ClientID, NbBonds, NbWarrants, NbShares).
The ID columns are alphanumeric, the last 3 ones are purely numeric. One could
have thrown in one date column as well, but this is sufficient for this
demonstration purpose.
One wants the transpose of PortfolioSummary, noted tr_PortfolioSummary, with the
following structure:
tr_PortfolioSummary(PortfolioID1, PortfolioID2, ... PortfolioIDn)
whose rows are:
(ManagerID1, ManagerID2, ManagerID3, ... ManagerIDn)
(ClientID11, ClientID2, ClientID3, ..., Clientn)
(NbBonds1, NbBonds2, NbBonds3, ..., NbBondsn)
(NbWarrants1, NbWarrants2, NbWarrants3, ..., NbWarrantsn)
(NbShares1, NbShares2, NbShares3, ..., NbSharesn)
where n is the number of unique portfolios in PortfolioSummary.
Given the simple case example and the preceding considerations, here is a possible
SQL statement:
-- create the master table;
create table
PortfolioSummary
(
PortfolioID varchar2(10) PRIMARY KEY,
ManagerID varchar2(10),
ClientID varchar2(10),
NbBonds number(5),
NbWarrants number(5),
NbShares number(5)
);
-- populate the table;
insert into PortfolioSummary values ('P1', 'm1', 'c1', 10, 20, 30);
insert into PortfolioSummary values ('P2', 'm1', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P3', 'm2', 'c1', 40, 50, 60);
insert into PortfolioSummary values ('P4', 'm2', 'c2', 80, 90, 100);
insert into PortfolioSummary values ('P5', 'm2', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P6', 'm3', 'c3', 110, 120, 130);
insert into PortfolioSummary values ('P7', 'm3', 'c4', 140, 150, 160);
insert into PortfolioSummary values ('P8', 'm4', 'c5', 170, 180, 190);
insert into PortfolioSummary values ('P9', 'm5', 'c5', 200, 210, 220);
-- display the original table;
select * from PortfolioSummary;
-- create the transposition view;
DROP VIEW tr_PortfolioSummary;
CREATE VIEW tr_PortfolioSummary AS
SELECT
'ManagerID' Info, 1 SortOrder,
T1.ManagerID P1, T2.ManagerID P2, T3.ManagerID P3, T4.ManagerID P4,
T5.ManagerID P5, T6.ManagerID P6, T7.ManagerID P7, T8.ManagerID P8,
T9.ManagerID P9
FROM
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P4'
) T4,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P5'
) T5,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P6'
) T6,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P7'
) T7,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P8'
) T8,
(SELECT
ManagerID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P9'
) T9
UNION
SELECT
'ClientId' Info, 2 SortOrder,
T1.ClientID, T2.ClientID, T3.ClientID, T4.ClientID, T5.ClientID,
T6.ClientID, T7.ClientID, T8.ClientID, T9.ClientID
FROM
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P4'
) T4,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P5'
) T5,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P6'
) T6,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P7'
) T7,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P8'
) T8,
(SELECT
ClientID
FROM
PortfolioSummary
WHERE
PortfolioID = 'P9'
) T9
UNION
SELECT
'Nb Bonds' Info, 3 SortOrder,
to_char(T1.NbBonds), to_char(T2.NbBonds), to_char(T3.NbBonds),
to_char(T4.NbBonds), to_char(T5.NbBonds), to_char(T6.NbBonds),
to_char(T7.NbBonds), to_char(T8.NbBonds), to_char(T9.NbBonds)
FROM
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P4'
) T4,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P5'
) T5,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P6'
) T6,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P7'
) T7,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P8'
) T8,
(SELECT
NbBonds
FROM
PortfolioSummary
WHERE
PortfolioID = 'P9'
) T9
UNION
SELECT
'NbWarrants' Info, 4 SortOrder,
to_char(T1.NbWarrants), to_char(T2.NbWarrants),
to_char(T3.NbWarrants), to_char(T4.NbWarrants),
to_char(T5.NbWarrants), to_char(T6.NbWarrants),
to_char(T7.NbWarrants), to_char(T8.NbWarrants),
to_char(T9.NbWarrants)
FROM
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P4'
) T4,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P5'
) T5,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P6'
) T6,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P7'
) T7,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P8'
) T8,
(SELECT
NbWarrants
FROM
PortfolioSummary
WHERE
PortfolioID = 'P9'
) T9
UNION
SELECT
'NbShares' Info, 5 SortOrder,
to_char(T1.NbShares), to_char(T2.NbShares), to_char(T3.NbShares),
to_char(T4.NbShares),
to_char(T5.NbShares), to_char(T6.NbShares), to_char(T7.NbShares),
to_char(T8.NbShares), to_char(T9.NbShares)
FROM
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P1'
) T1,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P2'
) T2,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P3'
) T3,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P4'
) T4,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P5'
) T5,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P6'
) T6,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P7'
) T7,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P8'
) T8,
(SELECT
NbShares
FROM
PortfolioSummary
WHERE
PortfolioID = 'P9'
) T9
/
-- display the resulting transposed view;
column "info" format A11
column "P1" format A5
column "P2" format A5
column "P3" format A5
column "P4" format A5
column "P5" format A5
column "P6" format A5
column "P7" format A5
column "P8" format A5
column "P9" format A5
SELECT
Info " ", "P1", P2, P3, P4, P5, P6, P7, P8, P9
FROM
tr_PortfolioSummary
ORDER BY
SortOrder
/
Here is the result of the above script:
PORTFOLIOI MANAGERID CLIENTID NBBONDS NBWARRANTS NBSHARES
---------- ---------- ---------- --------- ---------- ---------
P1 m1 c1 10 20 30
P2 m1 c1 40 50 60
P3 m2 c1 40 50 60
P4 m2 c2 80 90 100
P5 m2 c3 110 120 130
P6 m3 c3 110 120 130
P7 m3 c4 140 150 160
P8 m4 c5 170 180 190
P9 m5 c5 200 210 220
9 rows selected.
View created.
P1 P2 P3 P4 P5 P6 P7 P8 P9
---------- ----- ----- ----- ----- ----- ----- ----- ----- -----
ManagerID m1 m1 m2 m2 m2 m3 m3 m4 m5
ClientId c1 c1 c1 c2 c3 c3 c4 c5 c5
Nb Bonds 10 40 40 80 110 110 140 170 200
NbWarrants 20 50 50 90 120 120 150 180 210
NbShares 30 60 60 100 130 130 160 190 220
Note the use of type conversion of columns to VARCHAR2 to make sure they are
type-compatible across rows so the UNION operator can work.
The above query is about 350 lines long, for an original 9-row and 6-column
table ! If one uses the same query layout (but with the SELECT's list of columns
on one line only), transposing a general n lines by m columns table into a view
will need a (7 * n + 4) * (m - 1) line query; applied to a 100 x 10 table, the
transposition query will be more than 6'300 lines long ! Moreover, if the client
software used to submit such a query to the server is SQL*PLUS, this tool limits
the command line length to 2'500 characters; another tool would maybe have
different limits; this was also a factor that had to be considered when generating
the query. This illustrates the complexity of SQL-made table transposition, and
the necessity to resort to a query generator that also takes care of the client's
limits.
A transposing query generator
-----------------------------
As it was showed in the preceding simple case, it is highly advisable to use an
automatic tool to generate such SQL queries. Fortunately, due to the highly
regular structure of the transposing queries, it is fairly easy to write such a
tool.
Here is a general awk script for building a transposition query. For the scripting
languages freaks out there, this script does not use any awk idiom (are there
any except the automatic line input loop ?), so it could be easily converted
into your favorite scripting language, mostly perl, tcl or Python.
It takes as input a spooled file containing the result of a SELECT primary_key_column
FROM view_to_transpose (let's name it spooledPKFile) and a spooled DESC
view_to_transpose file (let's name it spooledDescFile), and insures that the
generated script's lines stay under the UserLineLength limit. If a QueryViewFile
is specified then an additional SQL*PLUS script file is produced for querying
the transposed view. The usage is:
awk -f gen_transpose.awk
-v PrimaryKeyFile=spooledPKFile
-v ViewDescFile=spooledDescFile
-v UserLineLength=some_number
-v QueryViewFile=some_file > transpose.sql
The ordering of the rows in the transposed view is that of the column description
in spooledDescFile. If a different order is wished, this file should be edited,
the only restriction is that the first line listed should be the primary key's
description.
The proposed implementation is basic yet effective, and several improvements
could be easily added such as formatting options, row naming, collating order,
etc. Currently, the generated SQL script must be manually edited if the format
is not suitable.
# Name: gen_transpose.awk;
# Date of creation: January 2000;
# Purpose: awk script to generate a transposing SQL statement;
# usage:
# awk -f gen_transpose.awk -v PrimaryKeyFile=....
# -v ViewDescFile=....
# -v UserLineLength=0|[0-9]+
# -v QueryViewFile=...
# where PrimaryKeyFile is the file name containing a spool of the primary keys in the view to
# transpose,
# and ViewDescFile is the file name of a text file containing the result of a DESC of the view to
# transpose;
# PrimaryKeyFile must begin with a line containing the column name, followed by a line of ----,
# followed
# by the primary key values each on its own line, and terminated by a blank line;
# it can be produced by the following Sql*Plus instructions:
# spool PrimaryKeyFile
# select primary_key from view_to_transpose;
# spool off
# ViewDescFile must begin with 'desc view-to_transpose' on first line, followed by the
# 'Name Null? Type' line, followed by a line of '----', then each column name on a line by itself
# and a blank line;
# it can be produced by the following Sql*Plus instructions:
# spool ViewDesc
# DESC view_to_transpose
# spool off
# first line is assumed to be the primary key description;
# the order of the lines in the DESC file will be used to output the transposed columns;
# if a different order is wanted, the DESC file must be edited;
# emitted SQL code can be executed right away if redirected into a file/piped into Sql*Plus;
# if UserLineLength is non null then the large column list in the SELECT stmt will be wrapped so as
# not to overflow the line width;
# if QueryViewFile is given, a SQL*PLUS file is produced to query the transposed view;
BEGIN {
# program constants, edit them to suit your needs;
TransposeViewPrefix = "tr_"
# in generated query against the produced view;
ColumnSepBlank = 1
# name of this current script;
AwkProgram = "gen_transpose.awk"
Error = 0
# read in the view description from ViewDescFile;
if (1 != (getline < ViewDescFile)) {
Error = 1;
print "Error while reading file " ViewDescFile
exit
}
MasterView = $3
TransposedViewName = TransposeViewPrefix MasterView
# skip irrelevant lines;
while (1 == (getline < ViewDescFile) && !match($0, /^ (-+ *)+$/));
# find columns' name, type and max. width;
NbColumns = 0
MustCastToChar = 0
MaxColWidth = 0
PrimaryKeyWidth = 0
while (1 == (getline < ViewDescFile) && "" != $0) {
# allow for spaces in column names (they were defined between double quotes);
Temp = substr($0, 2, 31)
gsub(/ +$/, "", Temp)
ColumnName[NbColumns] = Temp;
if (length($0) > PrimaryKeyWidth)
PrimaryKeyWidth = length(Temp)
IsNumeric = (match($0, /NUMBER(\([0-9]+(,[0-9]+)?\))? *$/) > 0) ? 1 : 0
ColumnIsNumeric[NbColumns] = IsNumeric
MustCastToChar += IsNumeric # |= would be enough but unlike C, awk does not have it;
ColWidth = substr($0, match($0, /\([0-9]+(\)|,)/) + 1, RLENGTH - 1) + 0
if (ColWidth > MaxColWidth)
MaxColWidth = ColWidth
NbColumns++
}
close(ViewDescFile)
# read in the primary keys from PrimaryKeyFile;
NbPrimaryKeys = 0
while (1 == (getline < PrimaryKeyFile) && !match($0, /^-+ *$/)); # skip irrelevant lines;
while (1 == (getline < PrimaryKeyFile) && "" != $0)
PrimaryKeyTab[NbPrimaryKeys++] = $1
close(PrimaryKeyFile)
# any I/O error ?
if (!NbPrimaryKeys) {
Error = 1
print "Error while reading file " PrimaryKeyFile
print "at line " NbPrimaryKeys
exit
}
# starts emitting SQL;
print "-- SCRIPT GENERATED BY " AwkProgram ";"
print "-- do not edit;"
printf "\n"
print "-- create the transposition view;"
print "DROP VIEW " TransposedViewName ";"
print "CREATE VIEW " TransposedViewName " AS"
# following 2 variables control the line wrapping if user has set UserLineLength different than 0;
CurrentLineLength = 0
Buffer = ""
# builds UNIONed blocks, one per column; those are the transposed view's lines;
for (ColumnIndex = 1; ColumnIndex < NbColumns; ColumnIndex++) {
print "SELECT"
WrappedPrintf(" '" ColumnName[ColumnIndex] "' Info, " ColumnIndex " SortOrder")
# outputs the columns values, one per PK value;
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
WrappedPrintf(sprintf(", %sT%d.%s%s%s",
ColumnIsNumeric[ColumnIndex] && MustCastToChar ? "to_char(" : "",
KeyIndex + 1,
ColumnName[ColumnIndex],
ColumnIsNumeric[ColumnIndex] && MustCastToChar ? ")" : "",
1 == ColumnIndex ? " \"" PrimaryKeyTab[KeyIndex] "\"" : ""))
WrappedPrintf("\n")
print "FROM"
# in-line views here, one per PK value; they'll be used in the Cartesian product;
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++) {
print " (SELECT"
print " " ColumnName[ColumnIndex]
print " FROM"
print " " MasterView
print " WHERE"
print " " ColumnName[0] " = " (ColumnIsNumeric[0] ? "" : "'") PrimaryKeyTab[KeyIndex]
(ColumnIsNumeric[0] ? "" : "'")
printf(" ) T%d%s\n", KeyIndex + 1, KeyIndex < NbPrimaryKeys - 1 ? "," : "")
}
print (ColumnIndex < NbColumns - 1) ? "UNION" : "/"
}
# create the query against the new view statement to display it;
print "-- display the resulting transposed view;"
print "-- SCRIPT GENERATED BY " AwkProgram ";" > QueryViewFile
print "-- do not edit;" > QueryViewFile
# avoid multiple page header lines in Sql*Plus;
print "set pagesize 10000" > QueryViewFile
# force column separator's length;
printf("set colsep \"%*s\"\n", ColumnSepBlank, " ") > QueryViewFile
# attempt to avoid linewrapping in Sql*Plus;
# maximum line length is 32767;
printf("set linesize %d\n",
((Temp = (MaxColWidth + ColumnSepBlank) * NbPrimaryKeys + PrimaryKeyWidth) <= 32767) ?
Temp : 32767) > QueryViewFile
print "column info format A" PrimaryKeyWidth > QueryViewFile
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
print "column " PrimaryKeyTab[KeyIndex] " format A" MaxColWidth > QueryViewFile
print "SELECT" > QueryViewFile
WrappedPrintf(" Info \" \"", 1)
for (KeyIndex = 0; KeyIndex < NbPrimaryKeys; KeyIndex++)
WrappedPrintf(", \"" PrimaryKeyTab[KeyIndex] "\"", 1)
WrappedPrintf("\n", 1)
print "FROM" > QueryViewFile
print " " TransposedViewName > QueryViewFile
print "ORDER BY" > QueryViewFile
print " SortOrder" > QueryViewFile
print "/" > QueryViewFile
}
# controls that a line is not overflown when UserLineLength is different from 0;
# modifies globals Buffer and CurrentLineLength;
# those are side-effects but the program is short and this is the only function in it, so ...;
function WrappedPrintf(StringToPrint, OutputFile) {
if (1 == ToQueryViewFile)
OutputFile = QueryViewFile
else
OutputFile = "/dev/stdout"
if (0 == UserLineLength) # no restrictions, print right away;
printf StringToPrint > OutputFile
else if ("\n" != StringToPrint) { # flush buffer ?
Temp = length(StringToPrint)
if (CurrentLineLength + Temp > UserLineLength) { # potential overflow ?
print Buffer > OutputFile # yes, flush it before writing into it;
Buffer = StringToPrint
CurrentLineLength = Temp
}
else { # no overflow, write into the buffer;
Buffer = Buffer StringToPrint
CurrentLineLength += Temp
}
}
else { # flush buffer;
print Buffer > OutputFile
Buffer = ""
CurrentLineLength = 0
}
}
As it is visible, the script is surprisingly short, with the bulk of the work being performed by two small nested loops. This is because of the orthogonal structure of the transpose SQL query, as shown in the preceding SQL examples.
Converting to a uniform VARCHAR2 type does work for the basic column datatypes (such as NUMBER and DATE) but is likely to fail with
more exotic ones such as LONGs and LOBs. In these cases, the generator could be modified to only extract a short excerpt from the
character LOB types, and skip the BLOBs. Or such columns could be ignored altogether, or an error/warning message issued. Here,
those datatypes are not processed especially so the transposing statement will probably fail. The DATE datatype will be implicitly converted using whatever NLS_DATE_FORMAT is defined in the client's environment.