dbo.mc_EnumPolicyLocations performance

by May 7, 2014

Recently, when attempting to create a new Restore Policy, we’ve found that the SQLSafe Management Console will sit and spin for about 30 seconds before continuing with the Wizard, and when you attempt to populate a Restore source from a Backup Policy, it will incorrectly complain that there are no known backup locations for the selected source database.
I traced my Console’s activity on the DB server for the SQLSafe repo and found it running dbo.mc_EnumPolicyLocations for quite a while. Running this proc directly, I get the following stats: elapsed time: 57,559ms, CPU time 50,125ms, logical reads (total): 1,504,266, worktable LOB reads: 11,282. The vast majority (99.5%) of the logical reads are on a worktable.
Looking into the query plan, the stats estimates go way off-base in the pdb subquery where it left joins to dbo.databases four times to identify database_id (lines 49 through 88). When I collapse it into a single join, I get a much different plan with these exec stats: elapsed time: 953ms, CPU time 955ms, logical reads (total): 16,422, worktable LOB reads: 0.

These are both warm queries and the DB server has more than enough memory available to fit all of this into RAM easily. The replacement I used for lines 49-88 is as follows (let me know if you’d prefer to have the full queries emailed):
(SELECT pim.mode,
FROM dbo.policies_instances_mode pim WITH (NOLOCK)
INNER JOIN dbo.databases d WITH (NOLOCK) ON d.instance_id = pim.instance_id
LEFT JOIN dbo.policies_databases_membership pdm
ON pdm.policy_id = pim.policy_id
AND d.database_id = pdm.database_id
pim.mode IN (1, 2, 3)
AND pdm.database_id IS NULL
d.deleted = 0
pim.mode = 1
OR (pim.mode = 2 AND d.sysdb = 0)
OR (pim.mode = 3 AND d.sysdb = 1)
) OR (
pim.mode = 4
AND pdm.database_id IS NOT NULL
) alldb
ON alldb.policy_id = p.policy_id

This rewrite is semantically equivalent to the original.

Addendum: I’m sure there’s also an improvement possible for the IN and NOT IN subselects to make them scale better, but there are only 506 rows in PDM in my environment and only ~1700 total real databases and I cannot see those numbers growing by a large enough factor to hurt this execution plan enough to cause problems.
Edit: After wrapping my brain around it correctly, I also condensed the IN and NOT IN correlated subqueries to a LEFT OUTER JOIN with appropriate WHERE conditions that is equivalent. I updated the query text and stats accordingly.