从事IT基础架构多年,发现自己原来更合适去当老师……喜欢关注新鲜事物,不仅限于IT领域。
分类: Oracle
2007-01-15 13:45:41
找来一篇oracle官网关于数据库数据压缩的文章,从实际使用看数据压缩是确实可行的,效果也不错。当然还要看你自己的应用是不是适合用了。
Use table compression to save space and improve query performance.
Most decision support systems usually involve large amounts of data stored in a few very large tables. As these systems evolve, the demand on disk space can grow quickly. In today's environment, data warehouses of hundreds of terabytes have become increasingly common.
To help manage disk capacity issues, the table compression feature introduced in Oracle9i Release 2 can significantly reduce the amount of disk space used by database tables and improve query performance in some cases.
In this article, I show you how table compression works, as well as how to configure table compression when building and managing databases. I also discuss performance considerations, based on some sample test results, to help you understand how much benefit you should expect to receive when using table compression.
How It Works
The Oracle9i Release 2 table compression feature works by eliminating duplicate data values found in database tables. Compression works at the database block level. When a table is defined as compressed, the database reserves space in each database block to store single copies of data that appear in multiple places within that block. This reserved space is called the symbol table. Data tagged for compression is stored only in the symbol table and not in the database rows themselves. As data tagged for compression appears in a database row, the row stores a pointer to the relevant data in the symbol table, instead of the data itself. The space savings come from eliminating redundant copies of data values in the table.
The effects of table compression are transparent to a user or an application developer. Developers access a table the same way regardless of whether a table is compressed or not, so SQL queries don't have to change once you decide to compress a table. Table compression settings are usually configured and managed by database administrators or architects, with little involvement from developers or users.
How to Create a Compressed Table
To create a compressed table, use the COMPRESS keyword in the CREATE TABLE statement. The COMPRESS keyword directs Oracle Database to store rows in the table in compressed format wherever possible. The following is an example of the CREATE TABLE COMPRESS statement:
CREATE TABLE SALES_HISTORY_COMP ( PART_ID VARCHAR2(50) NOT NULL, STORE_ID VARCHAR2(50) NOT NULL, SALE_DATE DATE NOT NULL, QUANTITY NUMBER(10,2) NOT NULL ) COMPRESS ;
Alternatively, you can use the ALTER TABLE statement to change the compression attribute of an existing table, as in the following:
ALTER TABLE SALES_HISTORY_COMP COMPRESS;
To determine whether a table has been defined using COMPRESS, query the USER_TABLES data dictionary view and look at the COMPRESSION column, as in the example below:
SELECT TABLE_NAME, COMPRESSION FROM USER_TABLES; TABLE_NAME COMPRESSION ------------------ ----------- SALES_HISTORY DISABLED SALES_HISTORY_COMP ENABLED
The COMPRESS attribute can also be defined at the tablespace level, either at the time of creation (by using CREATE TABLESPACE) or later (by using ALTER TABLESPACE). The COMPRESS attribute has inheritance properties similar to those of storage parameters. When a table is created in a tablespace, it inherits the COMPRESS attribute from the tablespace. To determine whether a tablespace is defined using COMPRESS, query the DBA_ TABLESPACES data dictionary view and look at the DEF_TAB_COMPRESSION column, as in the following example:
SELECT TABLESPACE_NAME, DEF_TAB_COMPRESSION FROM DBA_TABLESPACES; TABLESPACE_NAME DEF_TAB_COMPRESSION --------------- ------------------- DATA_TS_01 DISABLED INDEX_TS_01 DISABLED
As you might expect, you can still explicitly compress or uncompress a table in a tablespace, regardless of the COMPRESS value at the tablespace level.
Loading Data into a Compressed Table
Note that when you specify COMPRESS as shown above, you aren't actually compressing any data. The commands above only modify a data dictionary setting. Data isn't actually compressed until you load or insert data into a table.
Furthermore, to ensure that data is actually compressed, you need to use a proper method to load or insert data into the table. Data compression takes place only during a bulk load or bulk insert process, using one of the following four methods:
The direct path SQL*Loader method is the most convenient way to load input data into a table if it is available in a flat file. An example is shown below:
$sqlldr sanjay/sanjay@proddb control=sales_history.ctl direct=true
If the input data is available in a staging table, you can use a serial INSERT with an APPEND hint or a parallel INSERT.
As an example, consider input data that is available in an uncompressed staging table named SALES_HISTORY. Using the serial INSERT method, you can use the following statement to insert the data into the compressed table:
INSERT /*+ APPEND */ INTO SALES_HISTORY_COMP SELECT * FROM SALES_HISTORY;
Alternatively, you can use a parallel INSERT to transfer data from a staging table into a compressed table, as shown below:
ALTER SESSION ENABLE PARALLEL DML; INSERT /*+PARALLEL(SALES_HISTORY_COMP,4)*/ INTO SALES_HISTORY_COMP SELECT * FROM SALES_HISTORY;
Note that, when using the parallel INSERT method, you need to first enable parallel DML for your session by using the command ALTER SESSION ENABLE PARALLEL DML.
If the input data is in a flat file, you can also use an external table, and then insert the data into a compressed table as if the data were available in a staging table. (A discussion of external tables is beyond the scope of this article.)
You can also use the CREATE TABLE ... AS SELECT statement to create a compressed table and insert data into it in one go. Here is an example:
CREATE TABLE SALES_HISTORY_COMP COMPRESS AS SELECT * FROM SALES_HISTORY;
If you don't use the correct loading or INSERT method, the data in the table will remain uncompressed, even if the table is defined using COMPRESS. For example, if you use conventional path SQL*Loader or regular INSERT statements, data will not be compressed.
When to Use Table Compression
The way that Oracle Database chooses to compress or not compress table data has implications for the kind of applications best suited for table compression. As described above, data in a table defined using COMPRESS gets compressed only if it is loaded using direct path mode or inserted using append or parallel mode. Data inserted through regular insert statements will remain uncompressed.
In online transaction processing (OLTP) systems, data is usually inserted using regular inserts. As a result, these tables generally do not get much benefit from using table compression. Table compression works best on read-only tables that are loaded once but read many times. Tables used in data warehousing applications, for example, are great candidates for table compression.
Furthermore, updating data in a compressed table may require rows to be uncompressed, which defeats the purpose of compression. As a result, tables that require frequent update operations are not suitable candidates for table compression.
Finally, consider the effects of row deletion on the use of table compression. When you delete a row in a compressed table, the database frees up the space occupied by the row in the database block. This free space can be reused by any future insert. However, since a row inserted in conventional mode isn't compressed, it is unlikely that it would fit in the space freed up by a compressed row. High volumes of successive DELETE and INSERT statements may cause fragmentation and waste even more space than would be saved using compression.
Compressing an Existing Uncompressed Table
If you have an existing uncompressed table, you can compress it by using the ALTER TABLE ... MOVE statement. For example, an uncompressed table named SALES_HISTORY_TEMP could be compressed by using the following:
ALTER TABLE SALES_HISTORY_TEMP MOVE COMPRESS;
You can also use the ALTER TABLE ... MOVE statement to uncompress a table, as in the following example:
ALTER TABLE SALES_HISTORY_TEMP MOVE NOCOMPRESS;
Note that the ALTER TABLE ... MOVE operation acquires an EXCLUSIVE lock on the table, which prevents any DML operation on the table while the statement executes. You can avoid this potential problem by using the online table redefinition feature of Oracle9i Database.
Compressing a Materialized View
You can compress materialized views in a fashion similar to the one you use to compress tables. The following command creates a compressed materialized view:
CREATE MATERIALIZED VIEW MV_SALES_COMP COMPRESS AS SELECT P.PART_NAME, H.STORE_ID, H.SALE_DATE, H.QUANTITY FROM SALES_HISTORY H, PARTS P WHERE P.PART_ID = H.PART_ID;
Materialized views based on joins of multiple tables are usually good candidates for compression, since they often have a number of repeated data items. You can change the compression attribute of a materialized view by using the ALTER MATERIALIZED VIEW command. The following command shows how to compress an existing uncompressed materialized view:
ALTER MATERIALIZED VIEW MV_SALES COMPRESS;
When you're using this command, note that the compression will actually take place the next time the materialized view is refreshed.
Compressing a Partitioned Table
You have a lot of choices when using compression across partitioned tables. You can apply compression either at the table level or at the partition level. For example, the CREATE TABLE statement in creates a table with four partitions. Since COMPRESS is specified at the table level, all four partitions are compressed.
Since compression can be defined at the partition level, you have the option to compress some partitions and keep some partitions uncompressed. The example in shows how to specify compression at the partition level.
In , two table partitions (SALES_Q1_03 and SALES_Q2_03) are compressed, while the other two are uncompressed. Note that compression attributes specified at the partition level override the table-level definition for that partition. If the compression attribute is not specified for a partition, then the partition inherits the value from the table-level definition. In , since compression attributes for partitions SALES_Q3_03 and SALES_Q4_03 are not specified, these two partitions inherit the value from table definition (which in this case is the default NOCOMPRESS).
Partitioned tables provide a unique benefit when used with compression. One useful way to partition tables is to put data subject to DML operations (inserts, updates, and deletes) into separate partitions from read-only data. For example, in the table definition, sales data is partitioned based on SALE_DATE, such that sales history for each quarter is stored in a separate partition. In this example, sales data for Q1 and Q2 2003 can't be modified, so it is put into compressed partitions SALES_Q1_03 and SALES_Q2_03. Sales data for Q3 and Q4 can still be modified, so the corresponding partitions SALES_Q3_03 and SALES_Q4_03 are left uncompressed.
If, at the end of Q3 2003, data in the SALES_Q3_03 partition becomes read-only, you can compress this partition by using the ALTER TABLE ... MOVE PARTITION command, as shown in the following statement:
ALTER TABLE SALES_PART_COMP MOVE PARTITION SALES_Q3_03 COMPRESS;
To find out which partitions of a table are compressed, you can query the data dictionary view USER_TAB_PARTITIONS, as in the following example:
SELECT TABLE_NAME, PARTITION_NAME, COMPRESSION FROM USER_TAB_PARTITIONS; TABLE_NAME PARTITION_NAME COMPRESSION ---------------------------- ----------- SALES_PART_COMP SALES_Q4_03 DISABLED SALES_PART_COMP SALES_Q1_03 ENABLED SALES_PART_COMP SALES_Q2_03 ENABLED SALES_PART_COMP SALES_Q3_03 ENABLED
Quantifying the Benefits
READ more about table compression
|
The biggest reason to use table compression is to save storage space. A table in compressed form will usually occupy less space when compared to its uncompressed form. To illustrate this point, consider the following test, with two tables—one uncompressed (SALES_HISTORY) and one compressed (SALES_HISTORY_COMP). These two tables were both loaded using direct path SQL*Loader with a single flat file containing two million rows. After both data loads were performed, the compressed table occupied almost half the space of the uncompressed one. shows the analysis.
The fact that a compressed table can be stored in fewer blocks results in storage savings, but fewer blocks can mean performance improvements as well. Queries on a compressed table in an I/O bound environment will often complete more quickly, because they need to read fewer database blocks. To illustrate this point, I executed a query on the compressed and the uncompressed tables and performed a SQLTRACE/TKPROF analysis. shows the results.
The SQLTRACE/TKPROF report demonstrates that my query on the compressed table performed fewer physical and logical I/O operations than the corresponding query on the uncompressed table and consequently runs faster.
The Performance Overhead
Since table compression takes place at the time of a bulk load, data-loading operations require extra processing because of the additional overhead involved. To measure the impact of compression on performance, I performed a test in which I loaded one million rows (using direct path SQL*Loader) into two identical tables: one compressed and the other uncompressed. shows results taken from the SQL*Loader log files that indicate how much time was required to load data into both the compressed and the uncompressed tables.
Table Name | Number of Rows | Path | Compressed? | Elapsed Time |
SALES_HISTORY | 1000000 | Direct | Uncompressed | 00:00:21.12 |
SALES_HISTORY_COMP | 1000000 | Direct | Compressed | 00:00:47.77 |
The extra time needed to load the compressed table stems from the compression operations performed during the data load. In a real-world scenario, the actual time difference would depend on the table design and data layout of the given environment.
Conclusion
The table compression feature in Oracle9i Release 2 can save significant amounts of disk space, especially on databases with large read-only tables. If you keep in mind the loading and inserting requirements and identify tables that are good candidates for compression, you should find table compression to be an incredible way to save disk space and, in some cases, improve query performance.