First thing in the morning I got an email from a colleague in the business unit: the Planning Data Collection run had grown from 0.5 hours to 2 hours, the ASCP scheduling plan was being affected, and the problem needed to be fixed immediately. I went through the logs of the recent Planning Data Collection requests (it is really a request set containing two requests: Planning Data Pull and Planning ODS Load) and found that the collection time had been growing steadily since May 1st, with Planning Data Pull taking the longest. From Metalink I learned that when Planning Data Pull is submitted, it spawns another request called Refresh Collection Snapshots, and the actual data pull does not start until that request completes. The logs confirmed this, and they also showed that Refresh Collection Snapshots itself was taking a very long time. Comparing the timings of the requests leads to a clear conclusion: the Planning Data Collection request set depends on the completion of Refresh Collection Snapshots, so the problem is not a slow data pull but a performance problem in Refresh Collection Snapshots. The data pull itself takes only a few minutes.
With the problem located, the next question was what caused the performance problem in Refresh Collection Snapshots. The first step is to trace the request: find the concurrent program definition in EBS, check the trace option, and submit the request on its own. It ran for an hour (on the test instance). Running the trace file through tkprof gives the following output; here are some key excerpts:
Rows Row Source Operation
------- ---------------------------------------------------
19508 HASH JOIN
19508 VIEW
19508 SORT UNIQUE
70916 TABLE ACCESS FULL MLOG$_BOM_COMPONENTS_B
154376 TABLE ACCESS FULL BOM_COMPONENTS_B
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT GOAL: CHOOSE
19508 HASH JOIN
19508 VIEW
19508 SORT (UNIQUE)
70916 TABLE ACCESS GOAL: ANALYZED (FULL) OF
'MLOG$_BOM_COMPONENTS_B'
154376 TABLE ACCESS GOAL: ANALYZED (FULL) OF 'BOM_COMPONENTS_B'
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
db file sequential read 142 0.05 0.78
db file scattered read 122054 0.60 580.18
latch free 5 0.00 0.01
********************************************************************************
SELECT DISTINCT M_ROW$$
FROM
(SELECT M_ROW$$ FROM "BOM"."MLOG$_BOM_COMPONENTS_B" MLOG$ WHERE "SNAPTIME$$"
> :1 AND ("DMLTYPE$$" != 'I')) LOG$ WHERE (M_ROW$$) NOT IN (SELECT ROWID
FROM "BOM"."BOM_COMPONENTS_B" "MAS_TAB$" WHERE ("MAS_TAB$"."OBJ_NAME" IS
NULL) AND MAS_TAB$.ROWID = LOG$.M_ROW$$)
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.02 0 0 0 0
Fetch 1 26.44 458.04 966569 1005840 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 26.44 458.07 966569 1005840 0 0
... (remaining statements omitted)
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 3808 1.06 17.63 264 3461 3 0
Execute 210342 96.03 1214.96 2543420 2880409 1032826 407498
Fetch 126796 98.94 1896.00 2557081 3730664 448058 209002
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 340946 196.03 3128.60 5100765 6614534 1480887 616500
Misses in library cache during parse: 389
Misses in library cache during execute: 25
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 11 0.00 0.00
SQL*Net message from client 11 0.01 0.02
db file sequential read 18975 0.76 122.26
process startup 12 0.02 0.22
PX Deq: Join ACK 51 0.01 0.03
PX Deq Credit: send blkd 80 1.58 1.73
PX Deq: Parse Reply 47 0.04 0.14
PX Deq: Execute Reply 878 0.09 3.15
PX Deq: Signal ACK 93 0.09 2.07
db file scattered read 637890 1.39 2716.48
enqueue 30 0.91 0.94
rdbms ipc reply 24 0.00 0.01
buffer busy waits 171 0.03 0.13
undo segment extension 4110 0.00 0.00
latch free 13 0.00 0.01
log file switch completion 2 0.12 0.17
log file sync 3 0.04 0.05
PX Deq Credit: need buffer 51 0.00 0.00
PX Deq: Table Q Normal 1 0.00 0.00
327 user SQL statements in session.
3513 internal SQL statements in session.
3840 SQL statements in session.
85 statements EXPLAINed in this session.
********************************************************************************
Trace file: dev_ora_741426_SJH_CR1436888.trc
Trace file compatibility: 9.02.00
Sort options: fchela
0 session in tracefile.
327 user SQL statements in trace file.
3513 internal SQL statements in trace file.
3840 SQL statements in trace file.
412 unique SQL statements in trace file.
85 SQL statements EXPLAINed using schema:
SYSTEM.prof$plan_table
Default table was used.
Table was created.
Table was dropped.
1030844 lines in trace file.
-- If you are an Oracle DBA, the data above already tells the story, so I won't belabor it: the MLOG$_* snapshot logs are causing the performance problem.
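The same diagnosis can be confirmed straight from the data dictionary without a trace. This is a sketch against the standard DBA views (it assumes you have a DBA-level connection on the ERP source instance); oversized MLOG$_% segments are exactly what make the fast refresh crawl:

```sql
-- How large have the snapshot (materialized view) log segments grown?
-- A fast refresh full-scans these MLOG$ tables, so multi-GB logs here
-- translate directly into a slow Refresh Collection Snapshots run.
SELECT owner,
       segment_name,
       tablespace_name,
       ROUND(bytes / 1024 / 1024) AS size_mb
  FROM dba_segments
 WHERE segment_name LIKE 'MLOG$\_%' ESCAPE '\'
 ORDER BY bytes DESC;
```

In this case the top rows would have pointed at logs such as MLOG$_BOM_COMPONENTS_B, the same table the trace spent hundreds of seconds scanning.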
I found some notes on Metalink and got a clear picture of the problem. Continuing on the test instance, I submitted Refresh Collection Snapshots on its own, this time with different parameters:

Refresh Method: Automatic Refresh
Threshold for Truncating Logs: 0

This time the run finished in a little over ten minutes, and running Planning Data Pull again took only 6 minutes! Note that you must submit this request standalone rather than let it be spawned by Planning Data Pull: even with the defaults set on the Refresh Collection Snapshots program, those parameters are not used when the request is launched from Planning Data Pull. I lost some time on this point. OK, time to do the same thing in production, where Refresh Collection Snapshots completed in 20 minutes. Before running it I recorded the tablespace usage, because I knew the space used by the MLOG$_* tables was bound to shrink. The comparison afterwards confirmed it: the biggest change was in APPS_TS_SUMMARY, which shrank by 10 GB. The next time the users ran Planning Data Collection it took only 24 minutes, with Refresh Collection Snapshots taking just 6. Curious why this works? The Metalink note below explains it.
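Recording the tablespace usage before and after the refresh can be done with a query like this (a sketch against the standard DBA views; APPS_TS_SUMMARY is simply where the MLOG$ logs live on this instance):

```sql
-- Space used per tablespace. Run once before and once after the
-- Automatic Refresh, then diff the two result sets; the tablespace
-- holding the MLOG$ snapshot logs (APPS_TS_SUMMARY here) should shrink.
SELECT tablespace_name,
       ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS used_gb
  FROM dba_segments
 GROUP BY tablespace_name
 ORDER BY used_gb DESC;
```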
Goal
Checked for relevance 26-APR-2008
The Refresh Collection Snapshots can be run as a standalone concurrent request using the following steps
This is recommended as a first step when launching Data Collections for the first time on a new or cloned instance.
Solution
The request is available under the Material Planner responsibility.
OR in 11.5.10 and above in Advanced Supply Chain Planner.
OR using System Administrator, it can be added to the Request Group OM Concurrent Programs
so that it is visible when running ATP Data Collections from Order Management
For Distributed installations with separate Instances for Transactions (ERP Source) and Planning (APS Destination) , this is ALWAYS run on the ERP Source Instance
NAV: Material Planner (or ASCP) / View / Requests / Submit a new Request
Parameters:
1. Refresh Method - Fast / Complete / Automatic
Fast Refresh will only process changes since the last run of the Refresh Snapshots or Data Collections.
Complete Refresh will process all the snapshots and completely refresh the snapshot(s).
Automatic Refresh will truncate the snapshot log and then run a complete refresh of the snapshot(s).
Note: Automatic Refresh does NOT check the profile MSC: Source Setup Required, so if running for the first time after a patch or in a newly cloned instance, then use Complete Refresh
Note: When running Data Collections request set, Refresh Snapshots always uses a Fast Refresh for performance reasons. This is usually not a problem, unless the snapshot logs have grown very large. Then corrective steps are required.
2. Snapshot Name - Use the Default 'All Snapshots' or choose a snapshot from the list
3. Threshold for Truncating Logs - Used only in Automatic Refresh.
Enter the number of lines to check for in the MLOG$ table.
For example, a value of 50 will truncate the log if 50 or more lines are in the MLOG$ snapshot log table and then perform a complete refresh of the snapshot.
Use 0 when running this to help resolve performance issues.
4. Degree - Refers to the degree of Parallelism to be used by the database to process the request.
Maximum value is 10. The higher the degree of parallelism, the faster the process will complete.
Note: Entering a number higher than 10 will result in the use of 10 for this value.
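As an illustration of how the Threshold parameter is applied, you can count the rows currently sitting in a snapshot log yourself (a sketch; MLOG$_BOM_COMPONENTS_B is one of the logs that appeared in the trace above):

```sql
-- Automatic Refresh truncates a snapshot log and runs a complete refresh
-- of its snapshot when the log holds at least Threshold rows; with
-- Threshold = 0, every log qualifies and all of them get truncated.
SELECT COUNT(*) AS log_rows
  FROM BOM.MLOG$_BOM_COMPONENTS_B;
```

That is why Threshold = 0 is the recommended setting when the goal is to resolve a performance problem rather than to refresh selectively.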
Also see Data Collections Fails - First Steps to understand the Setup Requests that are run by the Refresh Snapshots concurrent program when MSC: Source Setup required = Y
For Performance or Errors with Refresh Snapshots standalone or as part of the Data Collection request set
- First step is to run standalone for All Snapshots with Automatic Refresh and Threshold = 0. Then run Data Collections with a Complete Refresh.
This step is highly recommended for new or cloned instances before Data Collections request set is launched for the first time.
The standalone request with Automatic refresh may take a long time to complete if the snapshot logs have become very large. But this should result in better performance when running the Data Collections request set.
- IF errors with tables or tablespace occur (ORA-1654, ORA-1653, ORA-1652, ORA-1655, ORA-1651), THEN you will need to add space to the system. The Metalink note on managing space requirements for APS should be helpful.
- IF snapshot errors occur (ORA-12008, ORA-12057), THEN run Automatic refresh for the individual snapshot listed in the log file to attempt to correct the problem.
-- Sometimes the next Data Collections set will error on a different snapshot. IF this occurs, THEN run Automatic refresh for ALL Snapshots to resolve the issue.
- An ORA-12008 or ORA-12057 may be followed by a rollback error (ORA-1555, ORA-30036).
A lot of rollback is required for large data sets. Try adding more space to the rollback
OR IF rollback (undo) space is already very large (>= 20 GB), and running Automatic or Complete Refresh for ALL Snapshots fails, THEN run Refresh Snapshots for EACH snapshot individually using Automatic Refresh
The Metalink note on overcoming these errors and setting up rollback may also be helpful.
- This request can be useful in controlling the growth of the MLOG$ Snapshot Logs that are maintained on these snapshots. Please review MLOG Management
- For more information on the latest Data Collections patches please review List of High Priority Patches for the APS Suite .