Initialize Transactional Replication from your Backups

by May 17, 2023

This blog provides guidance on the perceived limitations and restrictions on initializing transactional replication subscribers from backups

The Trouble with Snapshot Initialization

This blog does not mean that initialization from a snapshot should not be used. Snapshot initialization is very useful. It benefits small databases and remains the simplest way to set up transactional replication. For large databases or datasets, snapshot creation can cause a performance impact on the publication database.

Snapshot replication does not take advantage of native SQL Server compression. Snapshots consist of straight BCP (bulk copy program) data from the publication to the subscriber. If the dataset is large, the data files will, in turn, be large as well. Snapshot generation can be lengthy, as can the time required to apply the snapshot to the subscriber. 

A compression option for snapshots uses Windows CAB compression to compress the data files after they have been created. This option may help speed up the snapshot delivery, especially when there is limited network bandwidth between the publisher and subscriber, but several limitations exist. This process can noticeably affect CPU usage during the compression phase. Snapshot compression should not be used if the server already uses high CPU levels. The operating system also limits it to files no larger than 2 GB. If a data file larger than 2 GB is created, the process will fail when it tries to compress it. The snapshot process is good about breaking large tables up into 2 GB or smaller files, but occasionally, you hit a slightly larger one. This option is not supported via the replication wizard. Still, it can be configured using T-SQL when creating a publication or the properties dialog when altering a publication on the snapshot options tab.

In addition to the @compressed_snapshot parameter, you must provide an alternate snapshot location via the @alt_snapshot_folder parameter. A snapshot in the default location cannot be compressed, and only the snapshot generated in the alternate location will be compressed. Unlike backup compression, snapshot compression will make the snapshot generation run longer, not shorter. Compressed snapshots are created uncompressed in the primary snapshot location and then compressed to the alternate snapshot folder. Yes, that means with compressed snapshots, you get two copies of the snapshot; one compressed, one not compressed.

Which Backups are Suitable for Initialization

Initialization from backup is most beneficial with extensive databases (VLDBs) as it enables the replication administrator to deliver a large amount of data to a subscriber with minimal impact on the publisher and takes advantage of other technologies like backup compression. Yet the process was not widely adopted when released with SQL Server 2005 due to the lack of native backup compression in SQL Server.

To initialize a subscriber from a backup, SQL Server must be able to read directly from the backup file natively. This precluded using a backup that was compressed by 3rd party utilities. In SQL Server 2008 and newer, the replication engine can read natively compressed backups, so this limitation is no longer a factor. Furthermore, there is no requirement to use a full backup as the initialization source for a subscriber. Hence, this limitation was a partial blocker for initialization from backup in SQL Server.

This indicates that we can initialize a subscriber from an alternate type of backup, such as a differential or log backup. If relying on third-party backup utilities, you can still use the third-party tool for the full backup and then use a much smaller backup, such as a log backup, as the initialization source for the subscriber.

The process to initialize a subscriber from a log backup would be similar to the following:
  1. Create a full backup of the publisher.
    1. It can be compressed via a third-party tool, and sometimes this is still preferred as modern compression tools can achieve a higher compression ratio.
    2. It can be natively compressed or encrypted.
    3. If the recovery model is FULL, take a log backup as part of your standard processes to minimize the log backup size used to initialize the subscriber.
  2. Restore the full backup on the subscriber.
    1. Use the NORECOVERY option so further backups can be restored.
  3. Configure the replication publication.
  4. Allow initialization from backup.
    1. Any backup created after this option is enabled can initialize a subscriber.
  5. Take a non-full backup of the publisher.
    1. A log backup would be appropriate if the recovery model is FULL.
    2. If the recovery model is SIMPLE, then use a differential backup. While the most recent full backup will work, taking and using a new full backup will minimize the size of the differential backup.
  6. Restore the non-full backup on the subscriber.
    1. Use the RECOVERY option to bring the subscriber database online.
  7. Configure the subscription
    1. Specify the log file or differential backups used in step 6 as the initialization source.

The replication wizard does not support initializing a subscriber from the backup. If using this option, you must create the subscriber using T-SQL. The code snippet below exemplifies initialing a push subscription from a backup. These commands would be executed in the publisher database. 

— Add a subscription specifying the backup to use for initializing
— Modify the below to use your backup location and server/ instance
  • EXEC sys.sp_addsubscription
  • @publication = N’<Name of publication>’,
  • @subscriber = N’<Subscriber server\instance>’,
  • @destination_db = N’<Subscriber database name>’,
  • @subscription_type = N’Push’,
  • @sync_type = N’initialize with backup’,
  • @backupdevicetype = N’Disk’,
  • @backupdevicename = N’<Path and name of backup file used for initialization>’,
  • @update_mode = N’read only’,
  • @subscriber_type = 0; 
— Add subscription agent
— Modify the below to use your server/instance name 
  • EXEC sys.sp_addpushsubscription_agent
  • @publication = N’<Name of publication>’,
  • @subscriber = N’<Subscriber server\instance>’,
  • @subscriber_db = N’<Subscriber database name>’;

Beyond Simple Initialization from Backups 

One of the other reasons some administrators have yet to adopt initialization from backup is that the full backup will contain the entire database. This generally means that the DBA must manually drop any unwanted tables or via a script. If the database is vast, this could result in transferring more data than is required and a lot more work by the administrator. This spawned a discussion of alternate methods.

Books Online says that any backup can be used to initialize a subscriber as long as it was created after initialization from the backup was enabled. This implies that a filegroup backup could be used. The bigger question is, can a filegroup backup be used without restoring a full backup? 

A complete backup must first be restored before using any other type of backup to initialize a subscriber. However, we could make initialization from a filegroup backup without transferring a full backup to the subscriber work with some caveats. The below limitations apply to using this process.

> The filegroup restore must include the primary filegroup as it contains all system objects and definitions of all user objects 
> Relies on partial database availability, which is an Enterprise-only feature 
> All filegroups being restored must have the same base differential LSN 

Back up the filegroups together 

To take full advantage of this process, we back up just the filegroups that we want to restore in their own backup. This is not a requirement. We can partially restore select filegroups from a full or a partial backup containing additional filegroups. For example, if we already had a full backup created, we could use the full backup for the restore and specify which filegroups to restore.

Initializing from a Filegroup Backup

The outlined process reduces the burden of backing up, copying, and restoring the entire database. Even though the process will work with a full backup, the following describes the process using only filegroup backups. 

Creating the Filegroup Backups

There are two methods that you can use for creating the filegroup backup. If a portion of your database is read-only, you can back up the read-only filegroups in a separate process and the read-write filegroups together. Assuming that all of the filegroups that contain tables to be replicated are read-write filegroups, you can then transfer just that backup and restore the read-write filegroups only. You can also specify filegroups by name for the restore process if the read-write filegroup backup contains additional filegroups.

With this method, the read-only filegroup must be backed up by name to back up separately, but you can specify the read-write filegroups only by including the READ_WRITE_FILEGROUPS parameter for the BACKUP command. The syntax of this backup command would be:

  • BACKUP DATABASE <Database Name>
  • READ_WRITE_FILEGROUPS
  • TO DISK = ‘<Path to and name of backup file>’
  • WITH INIT;

Alternatively, you can specify the filegroups by name for the backup command. The syntax of the backup command is similar to backing up read-write filegroups, except you identify the filegroups by name separated by commas if more than one. For example, this is the syntax for the backup command if backing up two filegroups (remember that it must include the primary filegroup):

  • BACKUP DATABASE <Database Name>
  • FILEGROUP = ‘primary’,
  • FILEGROUP = ‘<Filegroup name>’
  • TO DISK = ‘<Path to and name of backup file>’
  • WITH INIT;

Separation of Filegroups

This process does rely on the separation of tables by filegroups. Most often, this is different from the state of a database. Taking advantage of this process may mean taking one-time downtime to move some of the tables to alternate filegroups. This complex process for an extensive database may need to be performed piecewise over a long time. To optimize the database for initializing replication from a backup, it is best to move all tables that will be replicated into their filegroup(s). A simple filegroup setup might look like this: 

  • Primary
    • One best practice suggests creating all user-defined objects in FGs other than primary and keeping it only for the system objects.
  • ReplicatedRW
    • A read/write filegroup that contains only the writeable tables that will be replicated.
  • ReplicatedRO
    • A read-only filegroup that contains only the read-only tables that will be restored to the replication subscriber
  • NonReplicatedRW
    • A read/write filegroup that contains only the writeable tables that will NOT be replicated.
    • All code would go in this filegroup.
  • NonReplicatedRO 
    • A read-only filegroup that contains only the read-only tables that will NOT be restored to the replication subscriber.

Only those needed and additional filegroups could be created for further separation (partitioning, for example). Using a layout like this will not only require the least amount of data to be transferred but will eliminate the need to delete unnecessary objects from the replicated database.

Restoring the Filegroup Backups

If you are restoring specific filegroups from a full backup or a backup with additional filegroups, you must identify the filegroups you want to be restored. This may mean using the READ_WRITE_FILEGROUPS option or specifying filegroups by name. Either way, you can use the same parameter in the RESTORE command you used for the backup command. However, the parameters are optional if the backup only includes the filegroups you want to restore. If you do not specify which filegroups to restore, the restore process will restore everything included in the backup. All file groups in the backup will be restored unless you specify otherwise.

The following are sample restore commands demonstrating the restoration of specific filegroups:

  • RESTORE DATABASE <Database Name>
  • READ_WRITE_FILEGROUPS
  • FROM DISK = ‘<Path to and name of backup file>’
  • WITH RECOVERY;
  • RESTORE DATABASE <Database Name>
  • FILEGROUP = ‘primary’,
  • FILEGROUP = ‘<Filegroup name>’
  • FROM DISK = ‘<Path to and name of backup file>’
  • WITH RECOVERY;

Verifying Filegroup State

After the restore is complete, the database will be online. Any files in those filegroups that were not restored will have a state of Recovery Pending. You can check the state of the files and filegroups by querying sys.database_files and sys.data_spaces inside the database you restored with the following query:

  • SELECT DF.name AS [File Name],
  • DF.type_desc AS [File Type],
  • DF.state_desc AS [File State],
  • DF.size AS [File Size],
  • DS.name AS [FileGroup Name]
  • FROM sys.database_files DF
  • LEFT JOIN sys.data_spaces DS
  • ON DS.data_space_id = DF.data_space_id

Step-by-Step Process

Initializing a subscriber from a backup file is similar to initializing it from a log backup. The significant differences are that you don’t perform a full backup and restore first, and you specify the filegroups involved. We start at step 3 as follows:

  1. Configure the replication publication.
  2. Allow initialization from backup.
    1. Any backup created after this option is enabled can initialize a subscriber.
  3. Take a backup of the filegroups on the publisher that you want to replicate.
  4. Restore the filegroup backup on the subscriber.
    1. Use the RECOVERY option to bring the subscriber database online. 
  5. Configure the subscription. 
    1. Specify the log file backup used in step 6 as the initialization source.

Additional Database Backup Resources:

If you are interested in learning about database backup strategies a few moments to review this whitepaper “Understanding Database Backup Strategies” to learn more about planning your database backups for the best results.

Idera provides robust solutions for SQL Server, Azure SQL Database, and Amazon RDS for SQL Server: