Performance troubleshooting is one of the unique challenges every developer and DBA has to face. One of the major complaints about performance troubleshooting is utilization of resources. Few organizations resort to putting muscle power to scale up the system without analyzing the root cause of the performance problem. For example, a typical scenario might involve a SQL Server instance with memory that is running continuously slow; frequently, the DBA responds to this by recommending that the RAM on the server be increased. Because the root cause of the memory issues is never fully investigated, this process is likely to repeat itself regularly over time. On the contrary, if the CPU is consistently running at a 70-80%, it isn’t always easy to increase the CPU instantaneously (provided we are on a physical machine).
Whenever there is a high CPU issue reported on SQL Server machine, we should first determine whether the issue was caused by SQL Server or some process outside of SQL. Essentially, it’s necessary to narrow down the high CPU problem to the lowest possible level–the component which is causing high CPU.
Exploring ring buffers to get historical data
Since SQL Server keeps historical data about CPU usage in ring buffers, we can query and find the history available on the system at any point in time. The example below is taken from Performance Dashboard reports query:
DECLARE @ms_ticks_now BIGINT SELECT @ms_ticks_now = ms_ticks FROM sys.dm_os_sys_info; SELECT TOP 15 record_id ,dateadd(ms, - 1 * (@ms_ticks_now - [timestamp]), GetDate()) AS EventTime ,SQLProcessUtilization ,SystemIdle ,100 - SystemIdle - SQLProcessUtilization AS OtherProcessUtilization FROM ( SELECT record.value('(./Record/@id)[1]', 'int') AS record_id ,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS SystemIdle ,record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS SQLProcessUtilization ,TIMESTAMP FROM ( SELECT TIMESTAMP ,convert(XML, record) AS record FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR' AND record LIKE '%%' ) AS x ) AS y ORDER BY record_id DESC
The query above shows 15 minutes worth of data. We can change top 15 in the query if we want to get more details.
Understanding the relation between CPU and PerfMon
Once we have identified that SQL Server process is consuming CPU, we have to next find which inside SQL Server process is consuming this CPU. We need to remember that CPU consumes time in two modes:
- User mode
- Kernel mode
This can be seen via Performance Monitor by monitoring the “% Privileged Time” and “% User Time” counters under the “Process” node. If “% Privileged Time” value is more than 30%, it’s generally caused by faulty system drivers or antivirus software. In such situations, make sure the BIOS and filter drivers are up-to-date, and then try disabling the antivirus software temporarily to see the change.
If “% User Time” is high then there is something consuming the user mode of SQL Server. There are several known patterns which can cause high CPU for processes running in SQL Server, including:
- Query executing causing high CPU
- System tasks are consuming CPU
- Excessive Compilation and Recompilation of queries
Examining SQL Server processes
Now let’s examine what is happening in our system. The query below can help in finding the currently executing queries in SQL Server:If a SQL Server process is consuming high CPU, then executing the above query can help in finding the various requests currently getting executed inside SQL Server.
SELECT r.session_id ,st.TEXT AS batch_text ,SUBSTRING(st.TEXT, statement_start_offset / 2 + 1, ( ( CASE WHEN r.statement_end_offset = - 1 THEN (LEN(CONVERT(NVARCHAR(max), st.TEXT)) * 2) ELSE r.statement_end_offset END ) - r.statement_start_offset ) / 2 + 1) AS statement_text ,qp.query_plan AS 'XML Plan' ,r.* FROM sys.dm_exec_requests r CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp ORDER BY cpu_time DESC
The output shows the data sorted by CPU. Once the query is identified, we have several options to try in tuning the query consuming the CPU, including:
- Make sure that the statistics are up-to-date for the underlying tables used.
- Check if the optimizer is suggesting any missing indexes in XML plan. If yes, evaluate and then create them.
- Check if there are scan of big tables which can be avoided, and if data can be filtered based on access.
- Tune the query using Database Engine Tuning Advisor and evaluate the recommendations given.
Sometimes it’s good to look at all queries executed so far and get the top CPU consumers from the query plans available in plan cache. The CPU might be normal at this point, but we want to get historical data. This can be achieved using query stats dynamic management views. Below query gives us an overview of cached batches or procedures which have used most CPU historically:
select top 50 sum(qs.total_worker_time) as total_cpu_time, sum(qs.execution_count) as total_execution_count, count(*) as number_of_statements, qs.plan_handle from sys.dm_exec_query_stats qs group by qs.plan_handle order by sum(qs.total_worker_time) desc
It’s important to remember that above query gets data from the cached plan. This means that, if the plan is evicted from cache for one of the top CPU consumers, we may miss the same.
If the top CPU consumer is one of the system processes, check the kind of process and see if there is any known issue about that process on . Some of the common searches that show high CPU on SQL Server because of system processes are:
- (Resource Monitor may consume high CPU)
- (The Ghost Cleanup task uses 100% of the CPU on an idle system in SQL Server 2008 or in SQL Server 2005)
- (FIX: SQL Server process that uses the.NET Framework 2.0 does not make any progress during a garbage collection while CPU usage is high)
- (Degraded overall performance on Windows Server 2008 R2)
And, of course, third-party performance tuning solutions such as can be of use in determining the root cause of performance issues such as high CPU usage.
says
One of our systems engineers ended up in this website and found the examination query pretty useful. Although we ended up going in a different direction, take a look at how we solved a storage performance issue by identifying rogue transactions coming from an SAP business Object.
The article can be found here:
says
Amazing!! This is the exactly what I was looking for. Thanks for share!
says
Good article.