Counter calculations in SQL Diagnostic Manager

May 1, 2014

Probably the three most common questions I get about counters in SQL Diagnostic Manager are:

“How can I prevent alerting on temporary noise values?”
“How does collecting once every 6 minutes give enough data for alerting?”
and, the big one
“Why is SQLdm showing me different data than I see in PerfMon?”

The answer to all of these questions lies in understanding the kinds of counters SQL Diagnostic Manager uses, and I will endeavor to explain that here.

Cooked and Raw Counters

Windows provides two basic kinds of counters, called “cooked counters” and “raw counters.” As a user you are almost certainly most used to looking at cooked counters – when you want to see a CPU percentage, you want to see a value between 0 and 100 which corresponds to the percentage of CPU being used by the system at that time. However, within a monitoring product like SQL Diagnostic Manager, it’s much more important to understand raw counters. These are the raw numbers which are used under the covers to calculate the values you are used to seeing.

While this logic is true both in Windows and in SQL Server counters, for the purposes of this demonstration I am going to focus on OS counters for a minute – specifically, on CPU counters, which are relatively well understood. SQL Diagnostic Manager uses the Win32_PerfRawData_PerfOS_Processor WMI class to collect data for the alertable values OS Processor Time, OS Privileged Time, and OS User Time. This is true for both the direct WMI and OLE Automation configurations of SQLdm. These three alerts come from the PercentProcessorTime, PercentPrivilegedTime, and PercentUserTime counters of the Win32_PerfRawData_PerfOS_Processor class. You can see the values of these counters by running the following PowerShell command:

# Show the raw CPU counters for the _Total (all processors) instance
Get-WmiObject -Class Win32_PerfRawData_PerfOS_Processor |
    Where-Object { $_.Name -eq '_Total' } |
    Format-List -Property PercentProcessorTime, PercentPrivilegedTime, PercentUserTime

On my laptop, the values returned look like so:

PercentProcessorTime : 9341260631492
PercentPrivilegedTime : 119022171957
PercentUserTime : 322043160366

Converting Raw Counters to Cooked Values

These values obviously are not useful as they are. They only become useful as you sample them over time. If I run the command again, I’ll receive slightly different values, like so:

PercentProcessorTime : 9343379827076
PercentPrivilegedTime : 119083850853
PercentUserTime : 322093412188

This continues for as long as the computer is running – these counters keep going up, in small increments and large, and SQL Diagnostic Manager keeps sampling them. From there, the next step is to turn these raw values into readable “cooked” values.

Each of these counters has an associated formula which is used to convert the raw values, and the deltas between them, into usable “cooked” counters that can be easily consumed by users. If you look at the documentation for the Win32_PerfRawData_PerfOS_Processor class, you’ll notice each counter has a CounterType value associated with it. You can use that CounterType to look up the related cooking formula in Microsoft’s counter type documentation. Once you know the formula name (such as PERF_100NSEC_TIMER_INV), you can then look up the full formula definition. Doing this kind of research for the hundreds of different counters used by SQL Diagnostic Manager may sound like a pretty tedious process – but believe me, I’ve done it!
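
To make the cooking concrete, here is a minimal PowerShell sketch (my own illustration, not code from SQL Diagnostic Manager) that applies those formulas to two raw samples taken a few seconds apart. PercentProcessorTime uses PERF_100NSEC_TIMER_INV, which is 100 * (1 - (N2 - N1) / (D2 - D1)), where N is the raw counter and D is the TimeStamp_Sys100NS timestamp; PercentPrivilegedTime and PercentUserTime use the non-inverted PERF_100NSEC_TIMER formula.

# Take two raw samples of the _Total instance a few seconds apart
$s1 = Get-WmiObject -Class Win32_PerfRawData_PerfOS_Processor | Where-Object { $_.Name -eq '_Total' }
Start-Sleep -Seconds 5
$s2 = Get-WmiObject -Class Win32_PerfRawData_PerfOS_Processor | Where-Object { $_.Name -eq '_Total' }

# PERF_100NSEC_TIMER_INV: 100 * (1 - (N2 - N1) / (D2 - D1))
# N = PercentProcessorTime (the raw value tracks the idle component, hence the inversion)
# D = TimeStamp_Sys100NS (elapsed time in 100-nanosecond units)
$deltaD = $s2.TimeStamp_Sys100NS - $s1.TimeStamp_Sys100NS
$cpuPercent = 100 * (1 - (($s2.PercentProcessorTime - $s1.PercentProcessorTime) / $deltaD))

# PERF_100NSEC_TIMER (no inversion) applies to PercentPrivilegedTime and PercentUserTime
$privPercent = 100 * (($s2.PercentPrivilegedTime - $s1.PercentPrivilegedTime) / $deltaD)

"{0:N2}% processor, {1:N2}% privileged" -f $cpuPercent, $privPercent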

Why Bother With Raw Counters?

So, by collecting these raw counters and running them through very specific, well-documented formulas, we’re able to get user-readable values. But why do we go through all this trouble when cooked counters are available? That goes back to the questions I raised at the very beginning of this article.

For the most part, cooked counters are brief samples plucked from the operating system and used to determine a broad trend. For a program like PerfMon, which is running a collection once per second, this is generally fine – you are probably looking at the overall direction of the trendline and perhaps the min, max, and average values for the timeframe. In such a case, the fact that each data point reflects only a few milliseconds of activity does not have much of an effect. It’s also not too surprising or distracting when you see the occasional outlier data point – it’s a quick blip and it has very little effect overall. But if you were to change the PerfMon settings so that it collected a data point only every 6 minutes, it would absolutely miss a great deal of information! Between one data collection and the next you could have a huge spike in activity that brings the server to its knees for 4 or 5 minutes and then resolves itself, and you’d never know.

That kind of behavior would be unacceptable in a product like SQL Diagnostic Manager – both on the outlier side and on the “missing a spike in activity” side. By using the raw counter classes, we instead get information about everything that happened between two collections, not just the counters that happen to be active at the moment a refresh occurs. This means that SQL Diagnostic Manager naturally smooths out brief blips and also naturally captures spikes in activity between collections. It also means that the product almost never matches PerfMon precisely – the methods of collection are simply too different!
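
To put rough, hypothetical numbers on that: suppose the CPU sits nearly idle except for a three-minute burst at 100% that starts and ends entirely between two collections six minutes apart. Instantaneous cooked samples taken at the two collection times might each read a percent or two, so a PerfMon-style collection at that interval would show nothing unusual. The raw counters, however, record that roughly half of the 100-nanosecond ticks in that six-minute window were consumed, so the cooked value for the interval comes out around 50% – the spike is averaged into the interval rather than lost entirely.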

Try It For Yourself

To quote Reading Rainbow, “You don’t have to take my word for it!” I’ve put together a little script and an Excel spreadsheet you can use to collect your own raw values and cook them yourself. I encourage you to give it a try, and compare the output you see to what you see in PerfMon. Try decreasing the frequency of PerfMon’s collections, and try spinning up CPU cycles to see if you can cause PerfMon to miss a spike – and then see if you were able to capture it with the raw counters.

Here’s a PowerShell script which collects the raw counters every 30 seconds for 5 data points:

# Collect the raw CPU counters (plus the high-resolution timestamp) for the
# _Total instance every 30 seconds, for 5 data points
$i = 1
while ($i -le 5)
{
    "Sample " + $i
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Processor |
        Where-Object { $_.Name -eq '_Total' } |
        Format-List -Property PercentProcessorTime, PercentPrivilegedTime, PercentUserTime, TimeStamp_Sys100NS
    Start-Sleep -Seconds 30
    $i = $i + 1
}
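
And if you would like the cooking done for you as the samples arrive, here is a variation on that script – again just my own sketch of the delta math, not what SQL Diagnostic Manager itself runs – that applies the PERF_100NSEC_TIMER_INV formula to each consecutive pair of samples:

# Sketch: cook each 30-second interval as it is collected
$previous = $null
for ($i = 1; $i -le 5; $i++)
{
    $current = Get-WmiObject -Class Win32_PerfRawData_PerfOS_Processor | Where-Object { $_.Name -eq '_Total' }

    if ($previous -ne $null)
    {
        # PERF_100NSEC_TIMER_INV over the interval since the previous sample
        $deltaN = $current.PercentProcessorTime - $previous.PercentProcessorTime
        $deltaD = $current.TimeStamp_Sys100NS - $previous.TimeStamp_Sys100NS
        "Interval {0}: {1:N2}% processor time" -f ($i - 1), (100 * (1 - ($deltaN / $deltaD)))
    }

    $previous = $current
    Start-Sleep -Seconds 30
}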