SQL Server Compilation Bottlenecks – Error: 8628, Severity: 17, State: 0.

学而时习之gflei.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

gflei

博客访问： 2364217
博文数量： 310
博客积分： 6853
博客等级：准将
技术积分： 2833
用户组：普通用户
注册时间： 2005-08-04 16:41

文章分类

全部博文（310）

net（7）
工具（4）
Windows（26）

windows阀值（5）

硬件配置（3）

批处理（8）
UNIX&LINUX（52）

问题排查（8）

Security（1）

LINUX（31）

AIX（5）

HP-UX 学习（2）
JAVA（2）

RMI（1）
邮件服务器（1）
Mysql（20）

sql_reference（7）

优化（2）

数据库管理（6）

JDBC（2）

用户管理（1）
SQLSERVER（62）

性能（12）

SQL-REFERENCE（8）

maintain（15）

Concept（4）

lock（15）
Oracle（108）

SQL_REFERENCE（4）

Wait（3）

基础概念（3）

安全（11）

Client（5）

日常维护（7）

常见错误（27）

性能（19）

内存（9）

存储（11）

Lock（3）
LoadRunner（4）
应用服务器（4）

TOMCAT（2）

WEBSPHERE（1）
Apache（17）

虚拟主机（5）

与tomcat集成（5）

FAQ（0）
未分配的博文（3）

文章存档

2013年（17）

2012年（42）

2011年（76）

2010年（71）

2009年（99）

2007年（2）

2006年（1）

2005年（2）

我的朋友

相关博文

SQL Server Compilation Bottlenecks – Error: 8628, Severity: 17, State: 0. – Part 1

分类：数据库开发技术

2011-10-09 23:48:21

[alternative title: Memory Troubleshooting using DBCC MEMORYSTATUS]

The following samples are taken from a live problem I was working on for most of last week. The steps might well help you resolve an occurrence of the above error, but the methodology might help you with some general performance and memory troubleshooting as well, hence the alternative title.

Symptoms:

A previously well running server starts exhibiting extremely slow performance for all users during the online day.
CPU time is maxed out at 100%
The issue is intermittent and has no correlation to batch business processing or periods of peak usage

Initial Actions:

Configure a PSSDIAG (but could just have easily have been SQLDIAG) to collect perfstats script, perfmon counters
Don’t include profiler trace to begin with, as the server is already maxed out for CPU and you might skew the results or make the situation worse, especially as going into the problem you don’t actually know what the problem is related to
Configure the collection for long term running, as you don’t know when the error will next occur

Initial Results

When reviewing the error logs you note that Error: 8628, Severity: 17, State: 0. was logged repeatedly when the problem occurred in the past.
RESOURCE_SEMAPHORE_QUERY_COMPILE is the top aggregate wait stat in sys.dm_os_wait_stats
The server was clustered and a failover occurred when the problem last happened
No other items of note were recorded in the past
When the issue is not occurring everything looks fine on the server, there are no indications of dangerously long running queries or memory pressure etc.

This is a common scenario that I come across. You know something bad has happened, you can reproduce it on demand, but you suspect it will happen again. At this point I always like to do the above, review the general health of the server with PSSDIAG / SQLDIAG and then do another round of monitoring, once I know what I’m looking for. In this case I’m targeting the 8628 error as it seems pretty conclusively tied to the outage.

One of the principal hits you get if you search microsoft.com sites for error 8628 is the following page

How to use the DBCC MEMORYSTATUS command to monitor memory usage on SQL Server 2005

and it points out that this error is thrown when there are too many queries trying to compile simultaneously, and that the gateway structures are throttling the compilations. You can read more background about this theory in the above article and there’s also some reference to it in the following white paper

Troubleshooting Performance Problems in SQL Server 2005

which includes an excellent section on internal physical memory pressure.

So now we’re on the track of the problem. We reconfigure the PSSDIAG to include a script which runs DBCC MEMORYSTATUS regularly so that we can be sure to capture this output when the problem next occurs. We also make sure that we keep a copy of the data from this query when the system is running well, so that we can have a baseline to compare against. At this point I still didn’t have a profiler running as I didn’t want to have to manage the disk space and have the performance overhead, and the perfstats script add-in gives you so much information anyway.

So the problem eventually occurred again, and here are some key outputs from the DBCC MEMORYSTATUS related to the compilation gateways:

Baseline:

Procedure Cache                Value
------------------------------ -----------
TotalProcs                            7614
TotalPages                          570060
InUsePages                           32736

Small Gateway                  Value
------------------------------ --------------------
Configured Units                                 32
Available Units                                  32
Acquires                                          0
Waiters                                           0
Threshold Factor                             500000
Threshold                                    500000

Problem Occurring:

Procedure Cache               Value
------------------------------ -----------
TotalProcs                     397
TotalPages                     66888
InUsePages                     44924

Small Gateway                 Value
------------------------------ --------------------
Configured Units               32
Available Units                0
Acquires                       32
Waiters                        59
Threshold Factor               500000
Threshold                      500000

So what we can see here is that we are (as expected) encountering a big bottleneck on the gateways (the small one in this case) and at this moment in time 59 compilations are queued. What we can also note is that the reason that this appears to be the case is that the procedure cache is significantly smaller than its normal size, about 85% smaller in fact.

This therefore, is why so many queries are suddenly trying to compile. For reasons currently unknown at this point 85 % of the cached plans have been thrown away from the cache and so the many thousands of connections on this system are now all having to compile new plans.

Now you might be thinking, sometimes you see this type of behaviour when the cache is flushed. This is true, but in this case this wasn’t the problem. We sat and monitored the output above for quite a while, there was constant compilation queues but the cache was not growing. More to the point there were no cache flush events in the error logs, and we had not performed and maintenance operations which cause cache flush. I talked about a classic example of this behaviour in this post.

So we need to look elsewhere, and ask ourselves the question, “Why is my procedure cache 85% of its normal size?”

That’s for my next post.

阅读(5039) | 评论(0) | 转发(0) |

上一篇：十大监视SQL Server的性能SQL Server计数器

下一篇：linux 扩展逻辑卷实践记录

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6