Tips for Optimizing XML in SQL Server

by Sep 5, 2012

I’ve worked on a project that used XML heavily inside SQL Server. We really utilized SQL Server’s XML support almost to the full extent, but with some repercussions. As we did our load testing, performance did degrade and we had to step back and adjust how we were using our XML data.

If you are using XML in your database, you may want to consider these few tips to optimize your overall queries and performance related to your XML data.

1. Promote frequently used elements and attributes into relational columns

If you find yourself always pulling out scalar values from your XML column in order to join to another table, you should consider “promoting” this value into a column.

Pro: This value can now be indexed, therefore can boost performance of your queries.

Con: It’s an additional management overhead. If you need to change the value in the XML document, you will also need to change the value in your relational column. You can consider doing this in the application level (ie change the value in both places at once), or creating a persisted calculated column that uses a UDF that extracts the scalar value for you, or even triggers (be careful! make sure you test before you implement in production).

2. Add a schema to your XML columns

XML is really just a text document. This poses an overhead to SQL Server because everytime you do operations, SQL Server will need to “guess” which data types may be appropriate for your operation before it does an implicit conversion. You can eliminate this step by creating typed XML columns, or XML that is bound to a schema (XSD).

Pro: Processing your XML will be faster than if you’re using untyped XML. This eliminates the data type guess work from SQL Server.

Con: This can create a management overhead. Your XML now suddenly becomes not-so-flexible anymore. Everytime you need to add a new element or attribute or new nested fragment, you will need to ALTER the schema first before you can make changes. This change may need to cascade down to all your stored procedures that use this column.

3. Use XML Indexes

You must first create a Primary XML index, then create secondary index(es).

-- sample primary XML index
CREATE PRIMARY XML INDEX invoiceidx
ON [Sales].[salesxml](xmlcontent)
GO

There are 3 main secondary XML indexes:

-- PATH Secondary XML Index
-- useful if you use the path, and if you do not have wildcards
SELECT
   xmlcontent.value('(sales/order[@ord_num="6871"])[1]', 'varchar(20)') ,
   othercols
FROM
   [Sales].[salesxml]
WHERE
   xmlcontent.exist('(sales/order[@ord_num="6871"])') =1

-- PROPERTY Secondary XML Index
-- useful if searching for multiple properties, but may not have the full path

-- sample
SELECT
   othercols
FROM
   [Sales].[salesxml]
WHERE
   xmlcontent.exist('(//title_id)') = 1

-- VALUE  Secondary XML Index
-- useful if you know exact value looking for, but may not have full path

SELECT
   othercols
FROM
   [Sales].[salesxml]
WHERE
   xmlcontent.exist('/sales/order[@ord_num="6871"]/text()[. = "Something"]') = 1

SELECT
   othercols
FROM
   [Sales].[salesxml]
WHERE
   xmlcontent.exist('(//title_id/@*[. = "special"])') = 1

Pro: As with regular indexes, faster searches.

Con: As with regular indexes, occupies more storage, and needs more resources needed to reprocess your indexes.

4. Recreating XML on Mass Updates

Ever wanted to mass update an element or attribute property in your XML? Tempted to use SQL Server’s support for XQuery? Unfortunately in my case, for some of the updates I needed to do, XQuery just did not cut it. The .modify() method of your XML column or variable is limited to updating just that instance of XML. If you want to use XQuery .modify() to update more than one item, you will have to loop through all XML instances and invoke .modify() for each one.

A possible alternative, depending on how big your XML data is, is recreating the XML. If the data you need for XML is already in other existing columns, you can just do a query with a FOR XML PATH and include the new values. Use with caution. It worked for my purposes, your mileage may vary.

Pro: It could be faster. YMMV.

Con: It requires a little bit more coding, and re-assembling your XML.

References and Additional Readings:
MSDN. Performance Optimizations for the XML Data Type in SQL Server 2005.

MSDN Blogs. Performance Tips of Using XML Data in SQL Server.