
Cognos Learning, Part 5: Transformer Optimization

1. Hardware and Environment
1.1. Processor Considerations
1.1.1. Slow vs. Fast CPU Build Examples
1.1.2. Examples of Read Time Reduction with 2nd CPU
1.2. Memory Considerations
1.3. How Transformer uses Memory
1.3.1. Limited Memory Testing
1.4. Hard Drive Considerations
1.4.1. RAID
1.4.2. Drive Configuration
1.5. How Transformer uses Disk Space
1.5.1. How Much Disk Space?
1.5.2. Example of Estimated Space Calculations vs. Actual Cube Build
1.6. Other Applications on the Build Computer
1.7. Setting up the Transformer Environment
1.7.1. NT
1.8. Running Multiple Instances of Transformer
1.8.1. Tips
1.9. Preference Files
1.9.1. Tips
1.10. Database Gateway Settings
2. Case Studies
2.1. Case Study #1
2.2. Case Study #2
2.3. Case Study #3
2.4. Case Study #4
2.5. Case Study #5
2.6. Case Study #6


Purpose
Demands for larger and more complex PowerCubes are becoming commonplace as businesses grow and expand. As this occurs, optimizing build times and runtime performance becomes extremely important. The purpose of this document is to provide guidance and ‘best practice’ methodology to aid in performance-related strategies concerning PowerCube build and runtime performance. This document spans several versions of Transformer up to and including Series 7 Version 2. We advise you to confirm specific version capabilities before embarking on a project.
1.        Hardware and Environment
Hardware and environment settings can have a huge impact on performance during the cube build process and can also be the root cause of production related issues. This section is essentially a ‘best practice’ guide that focuses on selecting and enhancing a Transformer build computer through the use of hardware and environment settings.
1.1.        Processor Considerations
Choose the fastest processor speed available. The addition of a second CPU can result in a significant reduction in the data read phase when Transformer’s multi-processing feature is used. The data source type used in a model affects how much read time is saved by a second CPU: ASCII data sources provide the greatest reduction in read time, followed by RDBMS sources. It is important to note that even though the fastest CPU should be selected, Transformer is not primarily a CPU-bound application. If a bottleneck occurs during a PowerCube build, it usually involves either the system memory or the hard drive.

1.1.1.        Slow vs. Fast CPU Build Examples
Using different test models and data sets, a series of cube builds were performed on Windows NT computers with various processor speeds (slower and faster). Keeping in mind that other hardware components do contribute to the total build time, the results clearly indicate that a faster CPU speed is better.
NT – Dual P200 vs. Dual Xeon 500


1.1.2.        Examples of Read Time Reduction with 2nd CPU
The following test was done to illustrate the read time reduction that is obtained when a second CPU is available on the Transformer build computer. The ‘Category and Row Dominant’ model (ASCII and RDBMS versions) was used to demonstrate the difference in build time on NT.
Note: The multi-processing feature available in Transformer must be enabled on each data source to take advantage of the second CPU.


1.2.        Memory Considerations
Memory is probably the most important choice made during hardware selection, followed closely by disk configuration. The amount of memory required is dictated by the number of categories in the Transformer model and the resulting size of the largest PowerCube (assuming that the server is dedicated to Transformer). Optimally, there should be enough memory on the build computer to handle all running applications’ requests for memory and to allow the operating system disk cache to grow as required during the PowerCube build. If there is not enough physical memory available for Transformer, excessive paging will take place, resulting in a significant increase in PowerCube build time.
1.3.        How Transformer uses Memory
As stated above, Transformer’s memory consumption is directly related to the number of categories in the model and the associated Transformer memory settings selected by the Administrator. The following chart tracks Transformer’s use of memory while processing the ‘Category and Row Dominant’ test model:

The top line in the graph represents the total ‘Virtual Bytes’ used by Transformer while the lower one represents the ‘Working Set’. The ‘Virtual Bytes’ used by an application is the total amount of addressable memory the application has requested, while the ‘Working Set’ represents the amount of physical memory that is actually being used by the application. The amount of memory represented by ‘Working Set’ comes out of the available physical memory on the computer.

Memory use climbs rapidly when categories are being generated during the Data Read phase as the data source is processed. The more categories, the more memory required. Memory use per category is not completely predictable because each model is different, but observations of memory use for various models have shown that most fall in a range of 500 to 1,500 bytes per category (Working Set). Systems will have to resort to paging (swap file use) to continue processing when the amount of physical memory is limited and the ‘Working Set’ cannot grow to the amount required for the model. When this occurs, the performance hit on the PowerCube build is significant. For more information, please refer to the limited memory test chart in the following section.

Memory use continues to be high through the Metadata Update stage but drops off significantly when data is being written to the PowerCube. At this stage, available memory will be freed up and can be used by the operating system disk cache as required when the actual PowerCube is being built.
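As a rough worked example (a sketch only; the 1,000 bytes per category figure is the midpoint of the observed range, and the 200,000-category model size is an illustrative assumption), the category-related Working Set can be estimated from a UNIX shell:

echo $(( 200000 * 1000 / 1024 / 1024 ))    # assumes ~1,000 bytes per category for a hypothetical 200,000-category model (prints 190, i.e. ~190MB)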
1.3.1.        Limited Memory Testing
Using the ’Category and Row Dominant’ test model, a series of tests were run on the same NT computer (COMPAQ AP350) with different amounts of available system RAM to see what effect this would have on build time. First, the model was run with all available system memory (512MB) and the results recorded. The second test involved setting the amount of system RAM well below the working set recorded for the full memory test (128MB).
The following chart displays the timing results:

This particular test model has a ‘Working Set’ requirement of approximately 200MB. The chart shows that cube build time degrades considerably if the available physical memory on the computer is below Transformer’s ‘Working Set’ requirement. Another way to look at this is by looking at Page File activity during the two test runs. The first chart looks at ‘Working Set’ memory compared to the percentage of Page File use on the system for the test run with 512MB of available memory.



Note the difference in the Page File graph lines between the two charts. Comparing the two, it is immediately evident that the Working Set is much smaller for the test run with only 128MB of RAM available. The smaller Working Set causes a significant increase in Page File use, which has a negative effect on the time it takes to build the PowerCube.
1.4.        Hard Drive Considerations
This section provides guidance on selecting and configuring the disk subsystem of the Transformer build computer.
1.4.1.        RAID
When larger PowerCubes are being built, disk space requirements can be quite high. The type of drives and the amount of disk space available have a very big impact on the PowerCube build process. The ideal configuration consists of a drive subsystem that has multiple disk controllers, with the fastest available disk drives configured for RAID 0 or RAID 1:

RAID level 0 provides the fastest performance. In the event of a disk failure during a PowerCube build, the cube can be rebuilt from the original source data.

1.4.2.        Drive Configuration
Transformer is an I/O-bound application. The type, speed, and configuration of the drive subsystem can significantly affect the time it takes to build a PowerCube. Choosing a drive configuration for Transformer is very similar to doing so for a relational database server. Ideally, the drive subsystem should have more than one physical disk (three or more is optimum). A typical database installation puts the applications and operating system on one disk, the data on another, and the indexes on a third.
With Transformer the breakdown would be as follows:
• 1st Controller: Operating System and applications
• 2nd Controller: Transformer Data Work directory
• 3rd Controller: Sort directory and PowerCube directory
Let’s assume that the server has the following controller configuration:
• 1st Controller is drive C
• 2nd Controller is drive D
• 3rd Controller is drive E
According to the Transformer recommendations on drive configuration, the following would apply:
• Drive C would contain the operating system and the Transformer application
• Drive D would contain the location for the DataWorkDirectory
• Drive E would contain the locations for the ModelWorkDirectory and the CubeSaveDirectory.
The log file below illustrates the above settings:
PowerPlay Transformer Wed Sep 19 09:39:17 2001
LogFileDirectory=c:\transformer\logs
ModelSaveDirectory=c:\transformer\models\
DataSourceDirectory=c:\transformer\data\
CubeSaveDirectory=e:\transformer\cubes\
DataWorkDirectory=d:\temp\
ModelWorkDirectory=e:\temp\
1.5.        How Transformer uses Disk Space
During a cube build Transformer uses disk space in the following fashion:
• Data Read phase: During this phase Transformer is reading the source data and creating a temporary work file based on the structure of the Transformer model.
• Metadata Update phase: After the source data is read, the temporary work file is processed to determine the status of categories in the cube. A copy of the temporary work file is created and gets processed. After processing is complete, the original work file is deleted and all valid categories are put into the PowerCube.
• Data Update phase: After the categories are added to the PowerCube, the data in the temporary work file is inserted into the cube. If the PowerCube is partitioned, the temporary work file is sorted and then inserted into the PowerCube. Depending on the PowerCube settings a number of passes through the temporary work file may be required.
1.5.1.        How Much Disk Space?
It is possible to calculate the amount of disk space that Transformer will require for the temporary work files used while building the PowerCube. The one thing that cannot be predicted in advance is the final size of the PowerCube, because of the number of variables that contribute to PowerCube size, all of which are unique to each environment, data set, and model configuration.
The amount of space used in temporary files can be calculated as long as the Transformer model being used has been fully defined and the number of input records is known.
The following spreadsheet formula can be used to estimate temporary work file disk space requirements:

This spreadsheet assumes the following:
• Auto-partitioning has been used
• Calculated measures are not counted
• Only dimension views that are actually attached to the PowerCube are counted
• It can be used for single PowerCubes or PowerCube groups
The spreadsheet formula will provide a good estimate of the disk space required for temporary work files but does not account for the PowerCube and model checkpoint files. While there is no reliable method to accurately predict PowerCube size, a good rule of thumb would be to add 20% of the estimated disk space required for temporary files. The size of the Transformer checkpoint file will be roughly equivalent to the ‘Working Set’ for the model. For more information, please refer to section 1.3. To calculate the size of a model work file, double click on the attached spreadsheet above. To determine the WorkFileMaxSize to enter in the spreadsheet, divide the existing number (found in the trnsfrmr.ini file) by 1024 for KB and then 1024 for MB. For example, if the default WorkFileMaxSize setting is used it would be calculated as follows:
(2000000000/1024)/1024 = 1907
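As a minimal sketch of these calculations (assuming the default WorkFileMaxSize, and a hypothetical 7168MB temporary-file estimate such as the spreadsheet formula might produce), the conversions can be checked from a UNIX shell:

echo $(( 2000000000 / 1024 / 1024 ))    # default WorkFileMaxSize expressed in MB (prints 1907)
echo $(( 7168 * 20 / 100 ))             # 20% rule of thumb on a hypothetical 7168MB temp-file estimate (prints 1433, i.e. ~1.4GB extra for the PowerCube)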
1.5.2.        Example of Estimated Space Calculations vs. Actual Cube Build
Using the spreadsheet formula, the estimated disk space required for the ‘Category and Row Dominant’ test model worked out as follows:

The above spreadsheet shows that a 7GB work file is created during the PowerCube build. A test system was then set up with the Transformer Data Temporary file and Sort directory all pointing to the same directory location. All other Transformer directory locations were pointed to another directory (on another disk drive), and the Windows NT performance monitor was used during the cube build to track the amount of disk space available.
1.6.        Other Applications on the Build Computer
Since Transformer can be considered a memory- and I/O-bound application it is not desirable to have other applications running on the PowerCube build computer that place a demand on the system in these areas. We recommend that Transformer be located on a server dedicated solely to PowerCube builds, or that no other applications are active during the cube builds.
1.7.        Setting up the Transformer Environment
1.7.1.        NT
This section lists the settings specific to Transformer on Windows NT that should be considered for optimum performance.
• WriteCacheSize: The value for the write cache can affect PowerCube build time in a positive or negative way depending on how much memory is available. The best performance is achieved when enough physical memory is available so that the disk cache can grow to be as large as the final size of the PowerCube. You can change this setting in the Configuration Manager under Services - PowerPlay Data Services - Cache. The default value is set to 8192 (or 8MB). To change this, increase the number by increments of 1024. Increasing the write cache to 32768 (32MB) or 65536 (64MB) on a large system can provide performance improvements. However, increasing it to a very large number (i.e. 102400 or hundreds of megabytes) can degrade performance.
• SortMemory: This variable sets the amount of physical memory that is available when the data is sorted. Transformer sorts data for consolidation and auto-partitioning. The number you specify represents the number of 2K blocks used when sorting data. For example, setting a value of 5120 provides 5120 x 2K = 10MB of memory. The default value is set to 512. You can change the default in the Configuration Manager under Services - UDA - General. A good place to start is by changing the default value to equal 5120.
• TEMPFILEDIRS: Transformer uses this setting for the temporary sort file. This file is created whenever Transformer has to perform a sort operation. You can change the location in the Configuration Manager under Services - UDA - General. You can specify multiple directories separated by semicolons.
• MaxTransactionNum: Transformer inserts checkpoints at various stages when generating PowerCubes. The Maximum Transactions Per Commit setting limits the number of records held in a temporary status before inserting a checkpoint. The default setting is MaxTransactionNum=500000. The value specified is the maximum number of records that Transformer is to process before committing the changes to a PowerCube. The default can be changed in the Transformer Preferences dialog box under the General tab. If errors occur during a cube build (for example, TR0112: There isn't enough memory available), lower the MaxTransactionNum so that Transformer commits more frequently and frees up drive space. This setting can be increased to a higher number (such as 800000) to improve the cube build time, but the results will vary dependent on the environment.
Note: The ReadCacheSize setting is not relevant to Transformer. This setting is specific to PowerPlay Enterprise Server and PowerPlay Client only.
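As a quick sanity check on the unit conversions above (a sketch only; the actual values are entered through Configuration Manager, and the target sizes chosen here are just examples), the arithmetic can be verified from a UNIX shell:

echo $(( 10 * 1024 / 2 ))    # SortMemory is expressed in 2K blocks: a 10MB sort area = 5120 blocks
echo $(( 64 * 1024 ))        # WriteCacheSize is expressed in KB: a 64MB write cache = 65536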

1.8.        Running Multiple Instances of Transformer
If the server is a multi-CPU system, multiple instances of Transformer can be run in parallel. This is especially useful when a large number of cubes must be built within a PowerCube production window.
When running multiple Transformer instances, the following is recommended:
• Each Transformer process should have its own dedicated CPU. If Multi-Processing is enabled, then each instance of Transformer should have 2 dedicated CPUs.
• Each Transformer instance will use system resources independent of all other instances. Ensure that you have sufficient memory, disk space, and I/O bandwidth to support all instances.
• Each Transformer instance will require its own set of configuration files. It is recommended that the DataWorkDirectory and ModelWorkDirectory locations are not shared between Transformer instances. For more information on how to set up the configuration files, please refer to section 1.9.
1.8.1.        Tips
• Using the UNIX nohup command allows the rsserver process to continue executing even after you have logged out of the session. Example:
nohup rsserver -mmodel.mdl
• Adding an ampersand (&) to the end of the UNIX command line starts the process in the background, giving you back control of the prompt so that you can initiate a second rsserver command (a combined example is shown below):
rsserver -mmodel.mdl &
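For instance, two instances could be launched from a single session as follows (a sketch only; modelA.mdl, modelB.mdl, prefsA.rc, and prefsB.rc are hypothetical model and preference file names, with each instance pointing at its own work directories as recommended above and in section 1.9):

nohup rsserver -F prefsA.rc -mmodelA.mdl &
nohup rsserver -F prefsB.rc -mmodelB.mdl &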
1.9.        Preference Files
When Transformer begins a PowerCube build, the model is populated with categories,
cubes are generated and a log file is created. How and where these actions are
performed is determined by a number of preferences and environment settings that you
can specify in preference files.
Several preference file settings are available for use but the most commonly used ones
are listed below:
• ModelWorkDirectory=
Specifies where Transformer creates temporary files while you work on your model. The temporary file can be used to recover a suspended model at strategic checkpoints should a severe error occur during cube creation. This file has the extension QYI. The default path is the value of the ModelSaveDirectory setting.
• DataWorkDirectory=
Specifies where Transformer creates temporary work files while generating cubes. Being able to use multiple drives eliminates size limitations set by the operating system. As Transformer creates cubes it writes temporary files to the specified drives or directories. The files are then concatenated into one logical file, regardless of which drive they are in. The location of these files is determined by the list of paths that you specify. The default path is the value of the CubeSaveDirectory setting.
• DataSourceDirectory=
For data source files other than IQD files and Architect models, this setting specifies
where Transformer searches for the files. The default path is the current working
directory.
• CubeSaveDirectory=
Specifies where Transformer saves cubes. The default path is ModelSaveDirectory.
• ModelSaveDirectory=
Specifies where Transformer saves models. The default path is the current working
directory.
Here is an example of these settings in a Transformer log file:
PowerPlay Transformer Wed Sep 19 09:39:17 2001
LogFileDirectory=c:\transformer\logs
ModelSaveDirectory=c:\transformer\models\
DataSourceDirectory=c:\transformer\data\
CubeSaveDirectory=e:\transformer\cubes\
DataWorkDirectory=d:\temp\
ModelWorkDirectory=e:\temp\
The examples below display how to specify the use of a preference file on the command line:
Windows:
trnsfrmr -n -fc:\preferences.prf model.mdl
UNIX:
rsserver -F preferences.rc -mmodel.mdl
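As an illustration, a preference file for a second Transformer instance might contain only the work and output locations that should not be shared between instances (a sketch; the paths shown are hypothetical):

CubeSaveDirectory=e:\transformer\cubes2\
DataWorkDirectory=d:\temp2\
ModelWorkDirectory=e:\temp2\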
1.9.1.        Tips
• Specifying the use of a preference file on the command line will override and take precedence over all other settings. For example, if you have environment settings defined in the rsserver.sh file, using a preference file on the command line will override these settings.
• The environment variables TMPDIR, TEMP, and TMP can also determine where Transformer creates temporary files. Transformer uses the first environment variable that is defined. These environment variables are system environment variables defined by the operating system.
1.10.        Database Gateway Settings
A number of gateway INI files are included with a Transformer installation; they contain database-specific settings that can help reduce the read phase during a cube build. All of the files are named COGDM*.INI, with the asterisk identifying the specific database. For example, the Oracle-specific INI file is named COGDMOR.INI and is located in the \cer2 directory. This file contains the following settings:
• Fetch Number of Rows: This setting is used to determine how many rows to fetch per fetch operation. Increasing this number can provide better performance on some systems. Note that this number is currently limited to 32767. Also note that numbers larger than 100 may actually degrade performance on some systems: Fetch Number of Rows=100
• Fetch Buffer Size: This setting is used to determine the size of buffer to use when fetching. Larger values can provide better performance on some systems. By default, the buffer size used is 2048 bytes, to change this default, edit the following entry and set it accordingly:
Fetch Buffer Size=2048
Note: If Fetch Buffer Size and Fetch Number of Rows are both set, Fetch Number of Rows will take precedence.
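For reference, here is a minimal sketch of how these two entries might appear together in COGDMOR.INI (2048 is the documented default buffer size; 100 rows is the example value given above):

Fetch Number of Rows=100
Fetch Buffer Size=2048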

2.        Case Studies
The following case studies are included to provide the reader with some insight into the
dramatic differences in PowerCube build times when various factors are taken into
consideration. These case studies consist of actual client test cases.
Note: All case studies were performed in an isolated lab where external influences were
not a concern. No other applications were active during the cube builds in the BEFORE
or AFTER tests and case studies.

The following displays the keywords that relate to each of the phases of a
PowerCube build:
Data Read
• INITIALIZING CATEGORIES
• OPEN DATA SOURCE
• READ DATA SOURCE
• MARKING CATEGORIES USED
Metadata Update
• SORTING
• UPDATE CATEGORY AND PROCESS WORK FILE
• METADATA
Data Update
• CUBE UPDATE
• CUBE COMMIT

It is important that the user have a good understanding of the three distinct phases Transformer goes through to build a PowerCube. It is also important to determine how long each of these phases takes for a particular PowerCube build if the issue is related to timing.
The three phases of a cube build are:
• Data Read: During this phase the input records are read from the selected data
source into temporary work files. Common issues during this phase include
database connectivity and insufficient disk space.
• Metadata Update: During this phase the contents of the temporary work files are
compared to the categories in the Transformer model to determine which
categories will be put in the PowerCube. When the list of eligible categories is
complete the categories are inserted into the PowerCube. Common issues
during this phase include lack of memory and insufficient disk space.
• Data Update: During this phase the actual data values in the temporary work files
are inserted into the PowerCube. Each record inserted into the cube is a ‘data
point’ that consists of a category reference from each dimension in the model
along with the measure values for the intersection of those categories. A
common issue during this phase is low system memory.


2.1.        Case Study #1
Based on a number of factors, including the number of transactional records, it was apparent that this PowerCube build was taking an unusually long time to complete, which warranted an investigation.
Description of Model:

Original Transformer Log File (BEFORE):

Diagnosis:
During an analysis of the log file the following warning was discovered:
Warning: (TR2757) This model contains one or more cubes that use a dimension view in which the primary drilldown is cloaked. Auto-partitioning is not possible when a primary drilldown is cloaked.
As mentioned previously in this document, disabling auto-partitioning can have a significant impact on build time. Please refer to section 5.5 for more details. After changing the primary drilldown category, the above warning was resolved.
Updated Transformer Log File (AFTER):

Conclusion:
By making one small change in the model, the build time decreased dramatically from 7
hours and 41 minutes to 41 minutes.
2.2.        Case Study #2
As the PowerCube took increasingly longer to build and grew larger in size, it became necessary to optimize performance by upgrading hardware so the cube could be built within the production window.
Description of Model:
  
Original Transformer Log File (BEFORE):

Diagnosis:
A new server dedicated to Transformer cube builds was purchased. This case study demonstrates dramatically how hardware can affect the total cube build time.
Original Server Specs:

New Server Specs:
  
Updated Transformer Log File (AFTER):
Phase Time

Conclusion:
The hardware being utilized to build PowerCubes can have a dramatic effect as this
example demonstrates.
2.3.        Case Study #3
In order to build a large number of cubes within a specified time frame, it became
necessary to have multiple instances of Transformer running building PowerCubes.
Description of Models:
Model A:


Model B:

Model C:

Model D:

Individual Build Times (BEFORE):
Model PowerCube Build Time

Concurrent Build Times (AFTER):
Model PowerCube Build Time

Conclusion:
Having a server with 8 CPUs allows you the flexibility of running four PowerCube builds
at the same time (with Multi-Processing enabled in each model). Building the
PowerCubes concurrently saved 2 hours and 23 minutes off of the total build times in
comparison to building the PowerCubes individually.
2.4.        Case Study #4
As the PowerCube took increasingly longer to build and grew larger in size, it became necessary to optimize performance by upgrading hardware so the cube could be built within the production window.
Description of Model:


Original Transformer Log File (BEFORE):

Diagnosis:
A new server dedicated to Transformer cube builds was purchased. This case study demonstrates dramatically how hardware can affect the total cube build time.
Original Server Specs:

New Server Specs:

Updated Transformer Log File (AFTER):
Phase Time

Conclusion:
The hardware being utilized to build PowerCubes can have a dramatic effect as this
example demonstrates.
2.5.        Case Study #5
This case study is meant as a way to show the exponential increase in various facets
including build time, cube size, etc.
Server Specs:

Description of Models:
Model A:

Model A Transformer Log File:

Model B:

Model B Transformer Log File:

Conclusion:
Comparing the results of these two builds demonstrates the increase in build time and
cube size as the number of source records and categories increase.
2.6.        Case Study #6
This case study represents an actual test as performed during beta testing for an
existing Cognos PowerPlay customer. We compared the build time of an
Incrementally Updated PowerCube to a Time-Based Partitioned Cube.
Description of Model:

Incremental Update Log File (BEFORE):

Diagnosis:
When an incremental update is performed, several Data Updates occur because auto-partitioning is no longer in effect. This results in a slower cube build.
The Time-Based Partitioned Cube feature not only takes advantage of auto-partitioning, but the cube builds are also much faster because the Data Update phase is not used.
Time-Based Partitioned Cube Log File (AFTER):

Conclusion:
By modifying the model to take advantage of Time Based Partitioned Cubes, the build
time decreased dramatically from 12 hours to 14 minutes.
NOTE: Although the number of data source records differs between the Incremental Update and the Time-Based Partitioned Cube builds, we believe the results can still be meaningfully compared.
