Consider Your Collations Carefully, Or Pay Later

by Jun 13, 2012

When first installing SQL Server, many developers and DBAs new to SQL Server may have the temptation to click Next, Next, Next on the installation wizard. However, it is of utmost importance to pay close attention, and be conscious, of options that you choose. We sometimes think that if we make the wrong choice when we install it, we can always go back and just change that option. Easy as that.

Unfortunately, that’s not true for all cases. Collation is one of those options that – if chosen without careful consideration – can be a costly mistake to correct. Correcting a wrong collation choice can include any of the following:

  • Making queries that require use of tempdb more complicated; you will need to add COLLATE statements everywhere (for example in your WHEREs and JOINs)
  • Reinstalling your instance
  • Worse case, in addition to reinstalling your instance, you may also need to recreate your tables, and reimport all your data.

Neither a database restore nor a database attach will do. You will need to reimport the records to correct the collation.
This might sound abstract, so let me revisit one of the issues I had with collation in a previous project.

The Scenario

  • A few years back, some of our SQL Server instances were installed using the default collation SQL_Latin1_General_CP1_CI_AS
  • A new requirement came up, which required us to support case insensitive, accent insensitive content using collation Latin1_General_CI_AI
  • To meet the new requirements, we changed some of our user database collations to Latin1_General_CI_AI. Our default server collation – which is shared by the system databases – remained as SQL_Latin1_General_CP1_CI_AS

The Problems

We thought we’ve found an easy solution by just changing the user database’s collation. However, we found that our queries that use temp tables started erroring out, and we had to put COLLATE explicitly in our T-SQL statements.

We use temporary tables in some of our important stored procedures. Recall that the tempdb gets the model database’s collation, and if ever you need to work on both a user database and temporary objects, you will need to specify COLLATE in your queries. Otherwise, you may get an error similar to the following:

Cannot resolve the collation conflict between “Latin1_General_CI_AI” and “SQL_Latin1_General_CP1_CI_AS” in the equal to operation. (Microsoft SQL Server, Error: 468)

So we have concluded we can fix our issues by doing the following:

  • Changing the model database’s collation, so that all new databases get the collation we want
  • Changing the tempdb collation, so that all temporary objects don’t have to use explicit collation

Background Check

There are many people who have blogged about how to go around this issue.
Kimberly Tripp had a article on Changing Database Collation and dealing with TempDB Objects which was a lifesaver at that time.

Kimberly basically suggests to create the temp table objects while inside the user database, and to explicitly specify a database_default collation on every column, as such (code snippet adaptation from her example):

This forces the tempdb object to use whatever collation you have specified for your user database.
Michelle Gutzait also has a very good article on collation fixes. In this article she provides:

  • Sample collation conflict error messages
  • Tips on how to resolve collation problems
  • Script to find problematic databases
  • Script to change temporary table collation

The Experiments

Tempdb gets recreated every time the SQL Server service is restarted, and tempdb gets its default properties from the model database.

So if we can change the model database’s collation, problem solved right?
No. Changing a system database’s collation is not an easy ALTER DATABASE statement.

These are the two experiments I tried:

Experiment 1:

Attach model db with correct collation (in our case Latin1_General_CI_AI)

1. Installed a new instance of SQL Server (SQLCIAI) with the same Service Packs, and the collation we wanted
2. Stop SQLCIAI, and make a copy of the model mdf and ldf file
• modelciai.mdf
• modelciai.ldf
3. Stop the existing instance (SQL02)
4. Restart existing instance in single user mode by passing the following startup options (http://support.microsoft.com/kb/224071)

-c shortens startup time when starting SQL Server from the command prompt
-m starts an instance of SQL Server in single-user mode
-T3608 prevents SQL Server from automatically starting and recovering any database except the master database

5. Detach model db, and attach new model db

6. Start up in multi user mode.
7. Test.

Experiment 1 Result: Seemed successful at first. Model db and tempdb get the collation we want. But when I go to Management Studio and right click to see database properties, I get the following error:

Cannot resolve the collation conflict between “Latin1_General_CI_AI” and “SQL_Latin1_General_CP1_CI_AS” in the equal to operation. (Microsoft SQL Server, Error: 468)

Experiment 2:

Restore model db with correct collation (in our case Latin1_General_CI_AI)

1. Installed a new instance of SQL Server with a Case Insensitive Accent Insensitive collation (SQLCIAI) with the same Service Packs as our production database

2. Backup model db of SQLCIAI

3. Restore to existing instance

4. Restart service

Experiment 2 Results: same as Experiment 1

Conclusion

You see what we’ve tried to do. It was wishful thinking that there was an easy switch to make multi-collation databases work (in our environment, at least). I spent a bit of time experimenting how we can work with mixed collations, but without much success.

You may have had these issues before, so your mileage may vary. In our case, we have conceded to the fact we need to set aside some time to rebuild our servers (see MSDN: Setting and Changing the Server Collation). This should be the best way to change the collation issue.

Moral of the story – collation is one of those properties that you might overlook because you think it’s trivial. It’s not. You need to carefully consider exactly which collation you will need, especially if you are supporting localization in your applications. Choosing the wrong collation can be a very expensive mistake.