By default, SQL Server databases are comprised of two files: the primary data file (or .mdf) and the log file (or .ldf). It is, however, possible to configure SQL Server databases to use additional files – which can be an effective means of increasing SQL Server performance in SOME scenarios.
Multiple Files – In Theory
The key to increasing performance by using multiple files hinges on three primary considerations:
- Optimizing SQL Server’s Asynchronous abilities for Parallelism.
- Correctly utilizing FILEGROUPS.
- Testing – to ensure overall performance increases.
In terms of optimizing SQL Server’s native capability for parallelism, it’s important to note that, by default, SQL Server will break up complex operations into multiple tasks and assign different ‘parts’ of those tasks to different processors or cores – which, in turn, allows complex operations to be run in parallel. In this way, for example, SQL Server can handle a JOIN by letting one thread on one processor do a SELECT against one table, while another thread assembles records from the other table in roughly parallel fashion. Then, with a bit of overhead involved to ‘reassemble’ the data and synchronize the operation, SQL Server can complete the process of pulling back results for a JOIN in significantly less time than it would take if it did the operations serially on a single thread.
Consequently, by assigning tables or indexes to different FILEGROUPs and then placing the files for those FILEGROUPs on different physical drives, RAID arrays, or LUNs, it’s possible to allow SQL Server to engage in parallel I/O operations that can help speed the execution of more complex queries.
Creating new FILEGROUPs in SQL Server is easy – but something you’ll only want to do after you’ve thoroughly researched and tested the potential pros and cons. In terms of benefits, the use of additional FILEGROUPs will give you increased flexibility when it comes to addressing growth (as it’ll be much easier to let a database span multiple partitions, drives, or locations as it grows) and also makes for increased options when it comes to disaster recovery scenarios (for example a 100GB database in a single .mdf file is harder to restore during a disaster as it requires 100GB of free space somewhere – whereas the same database broken up into 2 50GB files can be ‘split’ over different partitions or drives). In terms of drawbacks, the usage of additional FILEGROUPs and multiple files can result in increased complexity that can make management of mission-critical databases just a bit harder to address. Likewise, if too many files are used within a FILEGROUP, write operations can actually be degraded in some extreme cases due to the fact that SQL Server has to ‘round-robin’ through multiple files in order to proportionally fill those files with added data.
Common techniques or approaches for the use of multiple FILEGROUPs and files follow the lines of placing tables on the PRIMARY FILEGROUP, then creating key, non-clustered, indexes on an ‘INDEX’ FILEGROUP – which has files spread across other physical locations. This, in turn, allows SQL Server to engage one set of drives (or spindles) to do bookmarking activities against the table itself, while using a completely different set of drives/spindles to seek or scan against an index – rather than waiting on the same I/O subsystem from a single set of drives/spindles to queue those requests and process them as needed. Likewise, another, common, strategy for effectively leveraging the performance benefits of multiple files/FILEGROUPs is to put heavily used tables (via JOINs) in a ‘SECONDARY’ FILEGROUP – which has files placed on different physical drives/spindles. This, in turn, then allows quicker completion of commonly used JOINs by allowing quicker completion of underlying I/O operations as these operations are handled by different drives or sets of drives.
Multiple Files – In Practice
Bear in mind that while even Books Online recommends using multiple FILEGROUPs as a way to boost performance, realizing these benefits in a production environment can be a bit tricky in some cases. For example, since SQL Server already makes heavy use of parallel operations, merely looking at the execution plan of a complex query that spans multiple FILEGROUPs (with files placed on different spindles/drives) won’t tell you whether you’re achieving a net gain or loss when using multiple files. In fact, the execution plans for multiple-file vs single-file queries will look the exact same in virtually all scenarios. Consequently, to correctly gauge the impact of adding in multiple FILEGROUPs and files, you’ll need to run SQL Server Profiler, make sure to adequately clear your buffers, and run repeated tests to determine if the overall duration of your queries is being decreased.
In other words, while it’s logical to assume that using multiple spindles or drives will result in performance improvements, testing is the only way to be sure that you’re actually obtaining the benefits that you desire. And if you’re not obtaining the desired performance benefits, then you may need to rethink your FILEGROUPs or even abandon the use of multiple files/FILEGROUPs altogether. The key thing to remember though, is that you really shouldn’t make FILEGROUP changes on a production server without thoroughly testing the overall impact of your changes. Especially since the intended benefits may not always be present.
In my experience I’ve seen huge benefits from using multiple FILEGROUPs and files in some environments; in other environments, I just could not obtain any tangible performance benefits from using multiple-files – because of query characteristics, hardware configuration (multiple disks on the same controller or channel, for example, may yield NO benefit), or other considerations. Or, in some cases I’ve seen big improvements for commonly used queries – but at the expense of nightly bulk-insert operations or other critical activities that just couldn’t afford the ‘hit’ incurred.
Consequently, the only real way to know what kinds of benefits you can obtain is to thoroughly test. But, by testing you’ll also be able to gain an appreciation for just how powerful this technique can be when it comes to performance tuning – as I’ve seen it completely ‘fix’ scenarios where organizations were looking to acquire new servers in order to address performance problems when the use of multiple FILEGROUPs and files was all that they needed.