Chinaunix首页 | 论坛 | 博客
  • 博客访问: 1419362
  • 博文数量: 416
  • 博客积分: 13005
  • 博客等级: 上将
  • 技术积分: 3297
  • 用 户 组: 普通用户
  • 注册时间: 2006-04-05 16:26
文章分类

全部博文(416)

文章存档

2014年(1)

2013年(4)

2012年(46)

2011年(64)

2010年(12)

2009年(4)

2008年(40)

2007年(187)

2006年(58)

分类: 数据库开发技术

2007-03-27 09:43:26

The PL/SQL approach
-------------------
The procedural approach is based on PL/SQL but could use Java since this language is available internally as well.
It consists in iterative statements successively fetching and returning the values for every column and every row in the table to transpose, instead of a unique, huge and monolithic SQL statement.
The proposed implementation claims to be generic enough to be able to transpose any table whose name is passed as parameter. As
such, it must use dynamic SQL because database objects such as view and column names are resolved at compile-time in PL/SQL, not
at run-time. Usually, dynamic SQL is usable through the dbms_sql package but Oracle 8i extends it to native dynamic SQL through
the EXECUTE IMMEDIATE SQL statement, which greatly simplifies dynamic SQL. Note that this new SQL instruction takes as a parameter a string containing SQL or PL/SQL statements. This string is currently generated by the enclosing procedure.
As the listing below shows it, the code is not very readable because most of it is devoted to building up the dynamic statement
string by piecewise concatenation. Furthermore, it constantly switches between both scopes, the procedure's one and the dynamic
SQL one. These scopes are distinct but parameter passing between them is possible albeit limited to the simple Oracle datatypes,
which makes things quite laborious. Another alternative is to create a script that would generate the whole procedural code targeted
at the concerned view (similarly to gen_transpose.awk and the generated SQL query), and this would probably be a better solution.
The procedure outputs the transposed data into a text file through the utl_file package, but could be modified to create a destination
table to be populated. Outputting to stdout is not a viable alternative because the dbms_output package - in its current implementation based on a buffer flushed once at program conclusion - is not suitable to print large quantity of text.
Currently, the procedure is encapsulated into a SQL*PLUS script that creates it, calls it and corrects one deficiency of utl_file.
In effect, this package also uses a buffer but this time it can contain one line of text up to 32 Kb long. In our case this is far
too limitative. To work around this restriction, the file gets closed and reopened in append mode. Strangely enough, the fflush()
function does not reset the buffer but limits itself to committing the buffer's content into the associated file, so it does not
cure the problem. However, the package shows another peculiarity: when the file is closed, a line-feed is appended to the current
line (I did not test it but maybe this is done only to non-terminated text lines), which again is not good in our case because it
splits logical lines. The work-around is to output some conventional and unlikely character sequence (such as '|||') before the
file gets closed because of quirk #1. This helps make the distinction between physical line ends and logical line ends. But what
if a physical line's end is also a logical line's end ? Won't the output consist in only one huge line ? To avoid this, one procedure's
parameter sets the number of output column values before the file is closed because of quirk #2. Just make sure this number is not
a divisor of the number of values in a logical line. Once the transposition has completed, those sequences are filtered out and
the physical lines belonging to the same logical lines are joined back. The filtering is done through an awk one-liner but sed could be used as well.
The procedure also compacts, but does not justify, the data. The code which computes the optimal column width is quite interesting
because it uses a dynamic SQL statement embedded in the surrounding dynamic SQL statement. This explains why it contains 8 consecutive apostrophes. Needless to say, that piece of code is not precisely readable but I could found no other clearer solution.
Another goodie to implement is an optional creation of the transposed table - a transposed view cannot be created in procedural SQL because the view's text must be pure SQL.
Here is a listing of the procedural solution:

[/b]
[b]
-- Name: transpose_proc.sql;
-- Date of Creation: March 2000;
-- Author: Cesare Cervini, Oracle OSS;
-- Mailto: [email]ccervini@ch..com[/email];
-- Purpose: Transpose a view, procedural version;
-- Usage:
-- sqlplus user/password @transpose_proc
-- and give values to the prompted ViewName PKName OutputDir and OutputFile
--
-- uses utl_file package for file I/O;
-- note that due to a stupid limitation in this package, the output file is closed and re-opened after
-- a certain number of calls to utl_file.put() (see usage of MAX_PUT_BEFORE_CLOSE in dynamic statement string);
-- this is to avoid line size overflow (utl_file.fflush() flushes the buffer but does not reset it);
-- furthermore, due to another pecularity of this package consisting in demanding that lines be terminated a
-- newline, logical records end up being split in several physical records, so that they must be pasted
-- together after the file has been closed for real; this is done by the one-line awk script at the end of the
-- present script;
-- to distinguish between logical and physical records, a special separator string is used;
-- the DEFINE SPLIT_CHAR is the separator between physical records which belong to the same logical record;
-- it is removed once the physical records are merged;

SET SERVEROUTPUT ON

-- Transpose the view ViewName with primary key PKName and outputs the result into file OutputFile in OutputDir;
-- OutputDir must have access rights;
-- the procedure is self-contained in that it does not depend on SQL*PLUS environment;
-- MAX_PUT_BEFORE_CLOSE is the number of fields output between 2 file closings
-- the higher the better, but decrease it if the following error is visible:
/*
ERROR at line 1:
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.UTL_FILE", line 103
ORA-06512: at "SYS.UTL_FILE", line 221
ORA-06512: at line 1
ORA-06512: at ".TRANSPOSE", line 131
ORA-06512: at line 1
*/

CREATE OR REPLACE PROCEDURE Transpose(ViewName IN STRING,
                                      PKName IN STRING,
                                      OutputDir IN STRING,
                                      OutputFile IN STRING,
                                      Compact IN BOOLEAN := FALSE,
                                      MAX_PUT_BEFORE_CLOSE NATURAL := 100,
                                      SPLIT_CHAR IN STRING := '|||') IS
   TYPE VIEW_CURSOR IS REF CURSOR;
   ColumnDescCurs VIEW_CURSOR;  
   ColumnDesc user_tab_columns%ROWTYPE;

   -- formating parameters;
   UnsupportedDataType EXCEPTION;
   MaxColNameWidth     NATURAL := 0;
   ColWidth            NATURAL;
   MaxColWidth         NATURAL := 0;
   MustCompact         NATURAL;

   OutputFileH utl_file.file_type;
   Stmt VARCHAR2(5000);
BEGIN
   dbms_output.enable(1000000);
   -- determine the formating parameters;
   OPEN
      ColumnDescCurs
   FOR
      'SELECT column_name, data_type, data_length, data_precision FROM user_tab_columns WHERE table_name = ''' ||
       upper(ViewName) || '''';
   LOOP
      FETCH
         ColumnDescCurs
      INTO
         ColumnDesc.column_name, ColumnDesc.data_type, ColumnDesc.data_length, ColumnDesc.Data_precision;
      EXIT WHEN ColumnDescCurs%NOTFOUND;
      IF MaxColNameWidth < length(ColumnDesc.column_name) THEN
         MaxColNameWidth := length(ColumnDesc.column_name);
      END IF;

      -- do this only when no compaction is demanded;
      IF NOT Compact THEN
         IF ColumnDesc.data_type = 'VARCHAR2' OR
            ColumnDesc.data_type = 'VARCHAR' OR
            ColumnDesc.data_type = 'CHAR' THEN
            ColWidth := ColumnDesc.data_length;
         ELSIF ColumnDesc.data_type = 'NUMBER' THEN
            ColWidth := ColumnDesc.data_precision;
         ELSIF ColumnDesc.data_type = 'DATE' THEN
            -- determine length of displayed date according to current NLS_DATE_FORMAT in session;
            DECLARE  
               LengthDate NATURAL := length(sysdate);
            BEGIN
               ColWidth := LengthDate;
            END;
         ELSE
            raise UnsupportedDataType;
         END IF;
         IF ColWidth > MaxColWidth THEN
            MaxColWidth := ColWidth;
         END IF;
         MustCompact := 0;
      ELSE
         MustCompact := 1;
      END IF;
   END LOOP;
   CLOSE ColumnDescCurs;

   -- open output file;
   OutputFileH := utl_file.fopen(OutputDir, OutputFile, 'W', 32767);

   -- output the transposed view;
   OPEN
      ColumnDescCurs
   FOR
      'SELECT column_name, data_type FROM user_tab_columns WHERE table_name = ''' || upper(ViewName) || '''';
   LOOP
      FETCH ColumnDescCurs INTO ColumnDesc.column_name, ColumnDesc.data_type;
      EXIT WHEN ColumnDescCurs%NOTFOUND;
      -- dynamic statement generation;
      Stmt := 'DECLARE' ||
              '   ViewName STRING(50) := ''' || upper(ViewName) || ''';' ||  
              '   PKName STRING(50) := ''' || PKName || ''';'            ||  
              '   Value ' || upper(ViewName) || '%ROWTYPE;'              ||
              '   NbIOs NATURAL := 0;'                                   ||
              '   OutputFileH utl_file.file_type;'                       ||
              '   OptColWidth NATURAL := :MaxColWidth;'                  ||
              '   CheckLength BOOLEAN := (:MustCompact = 1); '           ||
              'BEGIN '                                                   ||
              '   OutputFileH.id := :FileHandle;'                        ||
              '   utl_file.put(OutputFileH, '''                          ||  
                               substr(rpad(ColumnDesc.column_name, MaxColNameWidth, ' '), 1, MaxColNameWidth) || ''');' ||
              '   FOR R in (SELECT ' || PKName || ' FROM ' || ViewName || ' ORDER BY ' || PKName || ') LOOP'            ||
              '      IF CheckLength THEN'                  ||
              '         OptColWidth := 0;'                 ||
              '         <>'                  ||
              '         DECLARE'                           ||
              '            UnsupportedDataType EXCEPTION;' ||
              '            ColWidth NATURAL;'              ||
              '            PKValue ' || ViewName || '.' || PKName || '%TYPE := R.' || PKName || ';' ||
              '         BEGIN'                                                                      ||
              '            FOR K in (SELECT'                                                        ||
              '                         column_name, data_type, data_length, data_precision'        ||
              '                      FROM'                                                          ||
              '                         user_tab_columns'                                           ||
              '                      WHERE'                                                         ||
              '                         table_name = ''' || upper(ViewName) || ''') LOOP'           ||
              '               IF K.data_type = ''VARCHAR2'' OR'                                     ||
              '                  K.data_type = ''VARCHAR'' OR'                                      ||
              '                  K.data_type = ''CHAR'' THEN'                                       ||
' EXECUTE IMMEDIATE ''SELECT length('' || K.column_name || '') FROM '' || ViewName || '' WHERE '' || PKName || '' = '''''' || PKValue || '''''''' INTO ColWidth;'                                                                 ||
              '               ELSIF K.data_type = ''NUMBER'' THEN'                                  ||
              '                  ColWidth := K.data_precision;'                                     ||
              '               ELSIF K.data_type = ''DATE'' THEN'                                    ||
              '                  DECLARE'                                                           ||
              '                     LengthDate NATURAL := length(sysdate);'                         ||
              '                  BEGIN'                                                             ||
              '                     ColWidth := LengthDate;'                                        ||
              '                  END;'                                                              ||
              '               ELSE'                                                                 ||
              '                  raise UnsupportedDataType;'                                        ||
              '               END IF;'                                                              ||
              '               IF OptColWidth < ColWidth THEN'                                       ||
              '                  OptColWidth := ColWidth;'                                          ||
              '               END IF;'                                                              ||
              '            END LOOP;'                                                               ||
              '         END FindOptWidth;'                                                          ||
              '      END IF;'                                                                       ||
              '      SELECT '                                                                       ||
                        ColumnDesc.column_name                                                      ||
              '      INTO Value.' || ColumnDesc.column_name                                         ||
              '      FROM ' || ViewName || ' WHERE ' || PKName || ' = R.' || PKName || ';'          ||
              '      IF CheckLength AND instr(''' || ColumnDesc.data_type || ''', ''CHAR'') > 0 THEN'             ||
              '         Value.' || ColumnDesc.column_name || ' := rtrim(Value.' || ColumnDesc.column_name || ');' ||
                        'IF Value.' || ColumnDesc.column_name || ' IS NULL THEN Value.'                           ||
                            ColumnDesc.column_name || ' := '' '';'                                                ||
                        'ELSIF Value.' || ColumnDesc.column_name || ' = '''' THEN Value.'                         ||
                            ColumnDesc.column_name || ' := '' ''; END IF;'                                        ||
              '      END IF; '                                                                                    ||
              '      utl_file.put(OutputFileH, '' '' || rpad(Value.'                                              ||
                                                             ColumnDesc.column_name                               ||
                                                             ', OptColWidth, '' ''));'                            ||
              '      NbIOs := NbIOs + 1;'                                                                         ||
              '      IF NbIOs = :MAX_PUT_BEFORE_CLOSE THEN'                                                       ||
              '         NbIOs := 0;'                                                                              ||
              '         utl_file.put(OutputFileH, ''' || SPLIT_CHAR || ''');'                                     ||
              '         utl_file.fclose(OutputFileH);'                                                            ||
              '         OutputFileH := utl_file.fopen(''' || OutputDir                                            ||
                                                      ''', ''' || OutputFile || ''', ''A'', 32767);'              ||
              '      END IF;'                                                                                     ||
              '   END LOOP;'                                                                                      ||
              '   utl_file.new_line(OutputFileH); '                                                               ||
              'END;';
      -- note the hack (accessing OutputFileH.id and CASTing); this is possible because file_type is not an opaque type;
      -- unlike Ada, PL/SQL does not support them, which allows us to by-pass the language limitation with bind variables'
      -- type;
      -- note also that bind variables are positional, not named, so the list order matters;
      EXECUTE IMMEDIATE Stmt USING MaxColWidth, MustCompact, CAST(OutputFileH.id AS NUMBER), MAX_PUT_BEFORE_CLOSE;
   END LOOP;
   utl_file.fclose(OutputFileH);
   CLOSE ColumnDescCurs;
END;
/
show error

-- testing:
SET TIMING ON
exec Transpose('&&ViewName', '&&PKName', '&&OutputDir', '&&OutputFile..tmp', true);
SET TIMING OFF

-- remove SPLIT_CHAR at end of line and concatenate together the physical  
-- lines which belong to the same logical record;
!gawk '{if (gsub(/\|\|\|$/, "", $0)) printf $0; else print}' '&&OutputDir/&&OutputFile..tmp' > &&OutputDir/&&OutputFile
!rm '&&OutputDir/&&OutputFile..tmp'

Using PL/SQL to generate such a large dynamic statement string is a painful  
experience. In particular, testing was quite cumbersome. Moreover, because of  
the intrinsic quirks of outputting text from within PL/SQL, one has better to  
use Pro*C or C + OCI, or even better some higher-level script language with  
Oracle connectivity (such as Perl + oraperl, or Python) for implementing the  
procedural solution. Yet another solution would be to generate the stored  
procedure's code the same way as gen_transpose.awk.

The external method
-------------------
This approach is the most straightforward but is more costly in terms of disk  
real estate. It requires that the table to transpose have been first queried  
into a spool file. Obviously, as this file must be stored on a disk there must  
be enough disk space for it. Also, since the transposed table ends up coexisting  
with the input file, albeit transiently, as a first approximation twice the size  
of the table to transpose must be available on disk. Admittedly, the transposed  
table output can be compressed on the fly to reduce the space requirement.
The simple algorithm implemented in the following example awk script simply  
reads the whole file, line by line, and outputs the data column by column to  
the standard output device. Currently, it reads the file as many times as there  
are columns, which could be substantially improved had one cached several  
columns at once. But this would not be generic enough since caching one single  
column - and thus dividing by 2 the number of passes through the file - may not  
even be possible if the input file is very large.
Here is a possible implementation of a transposing script. Its usage is:

awk -f transpose.awk -v NaturalJustification=0|1 -v CompactColumns=0|1 spooled_file   

See the script's header for an explanation of the command-line parameters.

# Name: transpose.awk;
# Date of creation: January 2000;
# Purpose: takes as input a file containing a table as spooled by Sql*Plus and
# transposes it;
#    typically the spooled file is produced by the following Sql*Plus statements:
#       spool spooled_file
#       SELECT statement
#       spool off
# Usage:
#  awk -f transpose.awk -v NaturalJustification=0|1 -v CompactColumns=0|1 spooled_file   
# if NaturalJustification is 1 then numeric fields are right-justified and char fields are
# left-justified,
# otherwise (default) everything is left-justified;
# NaturalJustification = 0 generally yields better-looking tables;
# CompactColumns is the degree of compactness (i.e. leading blank suppression) when
# outputting the data;
# if CompactColumns is 0: no leading blank suppression, column width is fixed and set to
# the same value as in the input file;
# if CompactColumns is 1: the column width is that of the largest value currently found in
# the row;
BEGIN {
   # skip irrelevant lines in spooled file;
   while (1 == (getline) && "" != $0);

   # save column header line, to be processed later;
   getline
   HeaderLine = $0

   # get line of dashes;
   getline

   # determine the fields' length for justification;
   # extract the column headers, they will turn into line header after transposition;
   NbFields = NF
   match($0, / +/)
   SepWidth = RLENGTH
   MaxColumnWidth = 0
   MaxHeaderWidth = 0
   NextFieldStartPos = 1;
   for (i = 1; i <= NF; i++) {
      ColWidth[i]  = length($i)
      ColHeader[i] = substr(HeaderLine, NextFieldStartPos, ColWidth[i])
      gsub(/^ +/, "", ColHeader[i])
      gsub(/ +$/, "", ColHeader[i])
      NextFieldStartPos += ColWidth[i] + SepWidth
      if (ColWidth[i] > MaxColumnWidth)
         MaxColumnWidth = ColWidth[i]

      Temp = length(ColHeader[i])
      if (Temp > MaxHeaderWidth)
         MaxHeaderWidth = Temp
   }  

   # first line of data;
   getline
   NbLines = 1

   # find out the justification mode for each column;
   NextFieldStartPos = 1;
   for (i = 1; i <= NbFields; i++) {
      LRJustified[i] = (" " == substr($0, NextFieldStartPos, 1)) ? 0 : 1
      NextFieldStartPos += ColWidth[i] + SepWidth
   }

   # no header for column of line headers;
   printf("%*s", MaxHeaderWidth, " ")

   # output first line's first column of data;
   OptWidth = FindOptColWidthInRow()
   PrintJustified(substr($0, 1, ColWidth[1]), LRJustified[1], OptWidth, SepWidth)
}

{
   # first pass through the file: print first column;
   # process following data lines until empty separation line;
   if ("" != $0) {
      OptWidth = FindOptColWidthInRow()
      PrintJustified(substr($0, 1, ColWidth[1]), LRJustified[1], OptWidth, SepWidth)
      #DashedSepLine = DashedSepLine ReplicateChar(" ", SepWidth) ReplicateChar("-",
                                                                                OptWidth)
      NbLines++
   }
   else
      exit
}

END {
   printf "\n"

   SpooledFile =  FILENAME

   # next passes for 2nd, 3rd, etc. columns;
   NextFieldStartPos = 1
   for (i = 1; i <= NbFields; i++) {
      while (1 == (getline < SpooledFile) && 1 != match($0, /^(-+ +)+$/));
      printf("%-*s", MaxHeaderWidth, (i > 1) ? ColHeader[i] : " ")
      while (1 == (getline < SpooledFile) && "" != $0) {
         OptWidth = FindOptColWidthInRow()
         if (i > 1)
            PrintJustified(substr($0, NextFieldStartPos, ColWidth[i]), LRJustified[i],
                           OptWidth, SepWidth)
         else
            # print dashed separation line;
            printf("%*s%s", SepWidth, " ", ReplicateChar("-", OptWidth))
      }
      printf "\n"
      close(SpooledFile)
      NextFieldStartPos += ColWidth[i] + SepWidth
   }
}

   function PrintJustified(Value, LRJustified, FieldWidth, SepWidth) {
      gsub(/^ +/, "", Value)
      gsub(/ +$/, "", Value)
      if (1 == LRJustified || 0 == NaturalJustification)
         printf("%*s%-*s", SepWidth, " ", FieldWidth, Value)
      else
         printf("%*s%*s", SepWidth, " ", FieldWidth, Value)
   }

   function ReplicateChar(CharToRepeat, NbChars,     i, S) {
      for (i = 1; i <= NbChars; i++)
         S = S CharToRepeat
      return S
   }

   # find the largest column in the original table;
   # it will be the column width in the transposed table;
   # this function is called for each row; not very time efficient but allows to be
   # independent from the number of rows in the
   # input data size;
   function FindOptColWidthInRow(     NextFieldStartPos, i, TempS, TempI,
                                      CompactedColumnWidth) {
      if (0 == CompactColumns)  
         return MaxColumnWidth
      else {
         NextFieldStartPos = 1
         CompactedColumnWidth = 0
         for (i = 1; i <= NbFields; i++) {
            TempS = substr($0, NextFieldStartPos, ColWidth[i])
            gsub(/^ +/, "", TempS)
            gsub(/ +$/, "", TempS)
            TempI = length(TempS)
            if (TempI > CompactedColumnWidth)
               CompactedColumnWidth = TempI
            NextFieldStartPos += ColWidth[i] + SepWidth
         }
         return CompactedColumnWidth
      }
   }

Here is an example of its output with the CompactColumns set to 1:

                 1                2                      3               4
                 ---------------- ---------------------- --------------- ---------------
COMPANY_NAME     company_name     1-2-3 COMPUTER SUPPLER 1-800 POSTCARDS 1 800 US SEARCH
ADDRESS          address
CITY             city             OVERLAND PARK          NEW YORK        BEVERLY HILLS
STATE            state            KS                     NY              CA
ZIP              zip              66219                  10010           90210
PHONE            phone
COMPANY_ACTIVITY company_activity ELECTRONIC             MISCELLANEOUS   INFORMATION

As is obvious, this setting produces a much more readable result.

Tests
-----
Internal solution, SQL approach
-------------------------------
Test confirmed that the internal solution is very hungry on memory resources.  
The SGA has better to be very large otherwise the query fails with an  
"ora-4031: unable to allocate %d bytes of shared memory". If such error happens,  
then just give a boost to the shared_pool_size, the larger the better. On  
memory-constrained systems, the swap size needs sometimes to be increased too.
However, even if the SGA is large enough, another error can pop up but while  
displaying the generated view: "ora-1467: sort key too long". I couldn't get  
rid of it in this case by suppressing the obvious sort clause in the view's  
SELECT statement, probably because of the many UNION clauses. The recommended  
solution - admittedly not a realistic one in a production database because it  
needs to be recreated - is to increase the db_block_size. I got this error with  
a 2Kb db_block_size - a conservative value - on a test database. With an 8-Kb or  
even a 16-Kb db_block_size, things should be different.
On a Oracle EE server v8.0.5 2-Kb db_block_size 96 Mb Sparc-20 Solaris system  
I could transpose a 250 rows by 7 columns table using the internal solution's  
generated SQL statement. In a v8.0.5 8-Kb db_block_size database, a 500-row table  
could be transposed successfully, but a 990 one failed. Execution times were  
quite long on that toy workstation but could be improved if some memory was  
added. On a Linux Suze 6.3 with Oracle EE server v8.1.5.0.2, 300 MHz Pentium  
II-MMX, 128 Mb of RAM, 2-Kb db_block_size database, the 998-rows table could be  
transposed successfully but it took a lot of time (more than 3 hours, with the  
CPU hogged at close to 100% all the time) and memory (150 Mb of shared_pool_size,  
which forced to add a large swapfile). Note that the practical maximum row number  
of a table to transpose using the internal method is not 1000 but 998 rows,  
because 2 columns are added by the script to the transposed table - one column  
for collating purpose and one column for identifying the rows. This can easily  
be fixed though (add a dummy row to the table with the lowest key and containing  
the column names so that the identifying column can be suppressed or give up  
the ordering feature but the query generator or the generated query would need  
to be modified).

Internal solution, procedural approach
--------------------------------------
This solution is extremely fast. On the same Linux configuration, the whole  
28'886 rows x 7 columns table could be transposed without compaction in slightly  
more than 5 minutes. In fact, this is grossly equivalent to directly fetching  
about 210'000 values from that table, without the overhead of sub-querying,  
joining or sorting. Thus, this solution is appealing for transposing large  
tables. When compaction is turned on, however, execution takes forever.  
This is because of the dynamic SQL statement embedded inside the innermost  
loop (the 3rd one): it must be re-parsed at each iteration, which is not  
efficient. A better implementation would consist in computing those optimum  
widths once at procedure start - requiring a full table scan. The problem then  
is to pass the array of results to the dynamic statement, probably needing a  
kludge because only simple datatypes or PL/SQL records can be passed to native  
dynamic SQL.

External solution
-----------------
Because of the way the n?awk utility parses the input file, it reads in one  
whole line of text into an internal, fixed-length buffer which can easily be  
overfilled if the table to transpose has many large columns. In these cases,  
an error message such as the following one is displayed:

nawk: input record `CUSTOMER_ID COMPANY_...' too long
input record number 2, file xx.lst
source line number 21

Indeed, the tests showed that a 32-Kb long line is not well accepted by nawk  
on Solaris. Such a limitation does not appear in GNU's awk implementation -  
gawk - because it does not have any built-in limits and the input buffer is  
dynamically extended up to a system limit. Thus, in such cases, gawk is the awk  
interpreter of choice.
The tests showed that the awk program didn't encounter any problem in  
transposing a 28'886 x 7 table, which is impossible to do using the first  
method. It took only a few minutes on a small test Unix box, so this might be  
the preferred way to go provided enough disk space is available for storing the  
original table to transpose and the result. Note also that generally the result  
file is larger because each column must be large enough to accommodate the  
largest potential value from the original row's set of mixed datatypes. In this
test case, it was more than twice as large, with mostly blanks in it. Because  
of this high data redundancy, compressing the output on the fly can drastically  
reduce the disk space requirement, as in the following command (assuming  
select.lst is the spooled result of a SELECT against the table to transpose):

nawk -f transpose.awk select.lst | compress > transposed.Z

The compressed resulting file is more than an order of magnitude smaller than  
its expanded version.
Unfortunately, assuming select.sql is an SQL script for extracting data from  
the table to transpose, the following command won't work here:

sqlplus scott/tiger @select | nawk -f transpose.awk select.lst | compress > transposed.Z

because the awk script rewinds the input data once for each table's column,  
and this is not possible when the input comes from a pipe. It must imperatively  
come from a disk file.
To fix this in an efficient and smooth manner, transpose.awk has been added a  
command line option, CompactColumns, which triggers column compaction. This  
simply consists in making the columns as narrow as possible by removing useless  
leading and trailing blanks. In such cases when those blanks are in fact  
significant, this option should not be used. In the tests, this option actually  
more than cut in half the size of the original table, with a size more than 5  
times smaller than non compacted transposed table. If on-the-fly compression is  
applied too, storage should really not be an issue any longer. See the script's  
header for usage details.
In order to make transpose.awk really robust, a great deal of effort has been  
put into it. Memory usage has been kept to a minimum by avoiding column input  
buffering. As a consequence, the script executes indeed slower but is only  
limited by the input line length, which is either a n?awk, system or both,  
limit. Thus, the external solution is definitively the better way to go if the  
table to transpose is large, or if speed is important, or if CPU competition is  
high.

Usage
-----
Here is a step by step, no frills procedure to use the awk scripts in both the  
internal and external solutions.
Internal, SQL solution
First, the primary key values for the table to transpose should be output into  
a text file; this can be done by running the following SQL*PLUS statements:

spool pk_file
select pk_index from the_table;
/
spool off

where the_table is the table to transpose and pk_index is the name of the chosen  
primary key index. A primary key file pk_file.lst is produced.
A description of the columns of the table to transpose is needed too. From  
within SQL*PLUS, use the following statement:

spool desc_file
desc the_table;
spool off

A description file desc_file.lst is produced, which can be edited if one wants  
to change the order of the rows in the transposed view. Just make sure the  
primary key column is listed first.
Now, the transposing query can be generated using gen_transpose.awk script.  
Use the following command:

nawk -f gen_transpose.awk          \
     -v PrimaryKeyFile=pk_file.lst \
     -v ViewDescFile=desc_file.lst \
     -v UserLineLength=100
     -v QueryViewFile=display_transposed.sql  > transpose.sql

QueryViewFile, if specified, will contain a SQL*PLUS query for displaying the  
transposed view.
One most systems, awk is a symbolic link (or sometimes an alias) to nawk, so  
awk can be specified instead.
Finally, the generated query, transpose.sql, can be executed from within  
SQL*PLUS.
sqlplus user/passwd @transpose.sql
The transposed view is created and can be queried as many times as needed by  
running the script in QueryViewFile:
sqlplus user/passwd @display_transposed.sql > transposed.lst
Of course, all this can be automated into a single SQL*PLUS script if it is to  
be executed many times, or simply to group all the commands into one place.

Internal, PL/SQL solution
-------------------------
The SQL*PLUS script proc_transpose.sql creates the Transpose procedure. Next,  
it asks for the following mandatory parameters: the view name, the primary key  
name, the output directory and the output file. The view name can be upper- or  
lowercase and must exist in current logged-in user's schema. The primary key  
must be the name of a column uniquely identifying the rows. The output directory  
must be one the Oracle server is granted write access on. To this effect, do not  
forget to set the UTL_FILE_DIR parameter in the instance's init.ora or the file  
creation will fail. The output file can be any valid name but be aware that a  
temporary output file named after it but with a '---' extension is created too  
and removed after it has been processed to solve quirk #2 discussed above. Thus,  
any pre-existing such file will be replaced.
External solution
A spool file of the table to transpose is needed. This can be produced by any  
SELECT statement from within SQL*PLUS, the simplest being the following one:
spool spooled_file

SELECT
   list_of_interesting_columns
FROM
   the_table
/
spool off

where the_table is the table to transpose and spooled_file a listing of it.
Now, just run the transpose.awk script against the spooled file:

nawk -f transpose.awk            \
     -v NaturalJustification=0|1 \
     -v CompactColumns=0|1 spooled_file  > transposed.lst

Set NaturalJustification to 1 if you wish to have numbers right-justified and  
strings left-justified, leave it to 0 otherwise.
Set CompactColumns to 1 if columns' leading and ending blanks should be removed.  
This is recommended because the transposed table will result denser and more  
readable. Leave it to 0 if blanks are significant.
As before, a single script could be created here that does everything from  
within one point.

Source code
-----------
Source code for the above examples, an awk implementation of the transposing  
query generator, the awk script to transpose spooled tables, and test data can  
be obtained by calling the Support hot-line and asking for the fictitious  
TAR# 137749.260. Any analyst will be able to access the files and forward  
them to the interested customer. In case any problem is found with the scripts,  
or some valuable addition is made to them, please notify me so that I can  
update or correct them. If anybody has some better solution to propose,  
I'll be glad to hear it so I can pass it on.

Conclusion
----------
There are certainly many other possible solutions to this classical problem but  
I think I have presented the most representative approaches here: an internal,  
declarative one using SQL, an internal, imperative one use using PL/SQL and an  
external, non-SQL one. A view transposition could also be dynamically performed  
by an OCI external procedure, or a standalone OCI/(O|J)DBC/Pro*whatever/SQLJ  
program with the added benefits of performance and the availability of a good  
set of I/O functions in those languages. Unluckily, the short time at my  
disposal did not permit me to play with this alternative.
Although many data manipulation operations can be expressed in declarative SQL,  
it is sometimes more efficient to resort to a procedural or even an external  
solution. The table transposition here is a blatant example. While doable in  
SQL if enough computing resources are available and the problem lays within the  
database software's limits, this puts such a large stress on the server that  
often it is not practical. The procedural approach is interesting because of  
its speed but once the data have been gathered it is faster and cheaper to use  
text-processing tools for doing what is essentially a textual transformation.



  

*************************************************************
This article is being delivered in Draft form and may contain
errors.  Please use the MetaLink "Feedback" button to advise
Oracle of any issues related to this article.
*************************************************************

PURPOSE
-------

This article's aim is to provide several solutions for transposing views, i.e.
switching the line/column axis so that lines get presented vertically and  
columns horizontally.
  
SCOPE & APPLICATION
-------------------

The proposed solutions contain ready to use SQL and PL/SQL code, along with awk
scripts. Any database programmer needing such a feature and having access to an
awk interpreter can use the solutions. One solution is based on native dynamic
SQL only, and therefore requires an 8i database server.
Techniques used are detailed enough so that they can be implemented in any
scripting language available on any particular platform.

Line-Column Transposition
-------------------------

[/b]



感觉太繁琐了,不如写个存储过程来得方便。。。
另外,也不懂
SELECT ROW_NUMBER() OVER (partition BY B ORDER BY A) RN ,
              TEST.* FROM TEST
是什么意思。不知版主可不可以详细地说说,谢谢!!!
以B列进行分组,并在组内以A列作为排序列进行排序得到的序列值
经典
太厉害了
这帖有意思
顶一下
果然是牛人,谢谢了,学到了很多东西。


好是好,但是不具有代表性,如果我的记录有很多,那么就不好处理了。


顶点!!
各位大侠:
我想转换父子型的
1          0
2          1
3          2
4          2


转换为
4         2     1    0
3         2     1     0

该怎么实现啊
阅读(2228) | 评论(0) | 转发(0) |
0

上一篇:SQL行列转换实战(二)

下一篇:SQL 行列转换

给主人留下些什么吧!~~