Identifying SQL Server Performance Problems Part 3

by Feb 16, 2021

Continuing with our Performance Tuning series as you have learned baselining and knowing your normal system performance metrics is an important part of performance tuning. In part 1 we discussed collecting vital system metrics such as CPU Utilization and Memory usage. In part 2 we took a glimpse into how we can gather those metrics and other using the different tools that are available to use like Performance Monitor, Profiler, Extended Events and DMVs. In this part 3 we will look at how identifying the appropriate waits events and corresponding resolution will be critical for resolving performance issues. But first you need to understand what waits are and where to find the information needed to identify waits in your system.  

 

When a thread is being executed and is forced to wait on something this is called queueing. The thread waits for system resources, lining up in a first come first served fashion waiting for things like a CPU scheduler. The time waiting between actively running and suspended (waiting) states is the measurable wait event that SQL Server keeps track of via using. internal metrics. The database engine tracks why executed waits must wait and how long they are queuing for resource threads. These are commonly called wait statistics. These metrics are easily discernible using the dynamic management view (DMV) sys.dm_os_wait_stats (or in Azure SQL Database sys.dm_db_wait_stats).  This information is important to understanding what baseline performance of your database and can help you identify specific performance issues both with query execution and hardware limitations. Data gained from this view can be aggregated and tracked to provide a clearer picture on what might be causing the executed thread to wait. Knowing the top waits that occur in your environment is a key task for all DBAs. Below you will learn common actionable waits as well as those that are benign and occur normally as part of SQL Servers thread queueing process.  

 

Identify Waits 

SQL Server gives us dynamic management views (DMVS) that are available within SQL Server starting with SQL Server 2005. These DMVs are useful means to determine what is going on with a particular SQL Server instance. The one below will allow you to see the current wait statistics inside a particular instance use this to start to become familiar to common waits inside your environment and to trouble shoot performance issues.

sys.dm_os_wait_stats This gives you all the waits currently happening in your system. It gives to a great trail of what is going on to diagnose problems the system is having. It provides wait type and times as well as how many tasks are waiting.

USE master

GO

SELECT * FROM sys.dm_os_wait_stats

GO

 

This view will give you the wait type, waiting tasks counts, wait times in milliseconds and signal wait times. Signal wait times is the time it is waiting for CPU resources.

Actionable Waits 

Common potentially actionable waits every database administrator should be aware of are. These are waits that could be an indication of performance issues within your instance.

 

ASYNC_NETWORK_IO 

Occurs on network writes when the task is blocked behind the network. You could be experiencing a slow network pipe between an application and the SQL Server. It also can just be that the user is on an underpowered client machine, or a large row-by-row processing happening on the application server. A wait such as this is sometimes out of the control of the DBA, you may need to work with your network administrators to troubleshoot. 

 

CXPACKET  

If this wait type is high, it can indicate improper configuration. Prior to SQL Server 2019, the Max Degree of Parallelism (MaxDOP) default setting is to use all available CPUs for queries. Additionally, the cost threshold for parallelism (CTfP) setting defaults to 5, which can lead to small queries being executed in parallel, which can limit throughput. Lowering MaxDOP and increasing CTfP can reduce this wait type, but the CXPACKET wait type can also indicate high CPU utilization, which is typically resolved through index tuning. I find it easier to just think of this as the overhead of putting back together parallel threads once they are done processing. Too many threads to lead to higher waits. 

 

SOS_SCHEDULER_YEILD  

This wait type can indicate high CPU utilization which is generally correlated with either very high workloads high number of large scans, or missing indexes, often in conjunction with high levels numbers of CXPACKET waits. Sometimes this is an indicator of over allocation of vCPUs on a host. Working with your Virtual Machine administrators may help to resolves issues associated with these. 

 

PAGEIOLATCH_XX*  

There are many types of PAGEIOLATCHs (EX,SH,UP,NL,LP,DT,KP) they are when SQL Server is waiting to read data pages, I/O request, from storage. These pages were not cached in memory, so you have to go to disk to retrieve or process the information. PAGEIOLATCHs can indicate an issue with the disk subsystem. Before you go running to you SAN administrator, you may be able to solve these issues with proper indexing. 

 

LCK*  

Like PAGEIOLATCHs there are many types of locks, LCK. By design SQL Server locks resources to hold them during transactions like updates to prevent concurrent transactions from modifying the same data or reading data that is being actively modified. Minimizing this lock time can increase performance. Making transactions as fast as possible helps to reduce this lock time. We can do this again through proper indexes and efficient code. 

 

WRITE_LOG  

As we know the transaction log keeps a sequential record of all changes made to data in a database. That mean every delete/update/insert (operations done by users are tracked, and SQL Server has to write those to the transaction log file before committing the transactions and returning completion confirmations. Having long WRTIE_LOG waits can indicate your transaction log storage is not keeping up with your workload. 

 

 OLEDB 

This wait type indicates that a SPID has made a function call to an OLE DB provider and is waiting for the function to return the required data. This wait type may also indicate that the SPID is waiting for remote procedure calls, linked server queries, BULK INSERT commands or full-search queries. High waits of this type can be reduced by minimizing the use of linked server and examining the environment for network performance issues. 

 

RESOURCE SEMAPHORE 

This wait type is indicative of queries waiting on memory to become available and may indicate excessive memory grants to some queries. This is typically observed by long query runtimes or even time outs. These waits can be caused by out-of-date statistics, missing indexes, and excessive query concurrency.  

 

Benign Waits 

 

There are also benign waits, these are non-actionable waits that usually do not indicate performance issues and are normal to see within your environment. It is important to know the difference so that you don’t find yourself barreling down rabbit holes chasing waits you do not need to. 

 

CXCONSUMER 

Occurs with parallel query plans when a consumer thread (parent) waits for a producer thread to send rows. CXCONSUMER waits are caused by an Exchange Iterator that runs out of rows from its producer thread. Added in 2016 SP2 and 2017 RTM CU3 This is a normal part of parallel query execution. It used to be included in the CXPACKET wait type but was separated out to reduce knee-jerk reactions to high waits of that type. 

 

CHECKPOINT QUEUE  

Occurs while the checkpoint task is waiting for the next checkpoint request. It’s a simple as that, its an idle process waiting to be called up to work and safely ignored. 

 

WAITFOR 

According to msdocs, ooccurs as a result of a WAITFOR Transact-SQL statement. The duration of the wait is determined by the parameters to the statement. This is a user-initiated wait as shown in example below. 

Example 1 

EXECUTE sp_add_job @job_name = 'TestJob';   

BEGIN   

    WAITFOR TIME '22:20';   

    EXECUTE sp_update_job @job_name = 'TestJob',   

        @new_name = 'UpdatedJob';   

END;   

Example 2 

BEGIN   

    WAITFOR DELAY '02:00';   

    EXECUTE sp_helpdb;   

END;   

 

PARALLEL REDO LOG CACHE 

Applies to SQL Server 2016 and later. This wait is directly related to a read thread on an Availability group replica. It is part of the parallel redo process and happens naturally. It is not an indication of a replica bottleneck or backlog. 

 

LAZYWRITER_SLEEP 

A Lazy Writer is used when the instance is experiencing memory pressure. If its waiting to be used, it sleeps. This wait is the measure of the sleep time, the time accumulated as it is waiting until it is needed again. You can think of this as a good thing if the wait times are high. 

 

Summary

Performance monitoring is an essential part of database and instance health and knowing which wait stats to pay attention to is a big part of that. As a DBA it is not important to memorize each one of these, there are great sites that will help you discover what each type is. However, it is very important for you to baseline you waits, become familiar with what is normal inside your environment and which are actionable, and which are not. This just a very small subset of waits within SQL Server, I highly encourage you to dig deeper and learn more.

SQL Diagnostic Manager for SQL Server monitors the performance of your entire SQL Server environment and provides the most comprehensive diagnostics on the market. Learn more here.