Cloud Day 2—Understanding Your Options in Platform as a Service Databases

by Dec 9, 2020

If you have spent any time reading marketing material from cloud vendors, you will know that there is a big push to move your workloads to platform as a service offerings in the public cloud. There are a number of reasons for this—the biggest, from the cloud providers perspective is that those workloads are stickier—it’s much harder to move on from cloud vendor’s PaaS service, because they each have nuances. 

In some cases, it can be physically challenging to move your data out of a PaaS service, and beyond that the PaaS services can have features that may be difficult to recreate in your on-premises environment. While each of the platform as a service database offerings are different they all include automated, system managed backups, built-in high availability (but not disaster recovery), and monitoring. You should note all of the below services offer both public and private IP endpoints—the private endpoints will require some additional network configuration. 

Amazon RDS for SQL Server

Amazon RDS is probably the most straightforward platform as a service database offering. It is most similar to SQL Server on-premise, or in Azure, Managed Instance. You can backup and restore databases to and from the service, and you can restore databases back to on-premises from the service. You also get to select your version of SQL Server—anything from 2012 al the way to 2019. There are some limitations around distributed transactions, but you will find that across nearly all PaaS SQL Server offerings.  There are a number of features that aren’t supported—the most commonly used being replication, but also resource governor and log shipping.  You can also use your Active Directory for authentication, after you have created a trust between your AD with the AWS AD service. RDS has a limitation of 16 TB for the total for of all of the databases on your instance. 

Azure SQL 

Both Azure SQL Database and Managed Instance fall under the same umbrella, and while they are different services, they share a lot of common features. Both services include General Purpose and Business Critical tiers, which feature different methods of providing high availability and different storage. Business Critical uses local flash storage and has multiple online replicas, while General Purpose uses networked storage, and relies on the underlying storage in Azure to provide high availability. In both services, you can bring on-premises licenses to reduce your overall costs. Both services support Azure Active Directory (AAD) authentication as well as SQL authentication. You may need to change the way your application connects to the database as AAD has a different security model than on-premises Active Directory.

Azure SQL Database

This is the original database platform as a service offering, which started out with the name SQL Azure in 2010. The service has come a long way from there (the original service had a data limitation of 1 GB). Azure SQL Database is generally designed around a single database, but it does have an elastic pool option that allows you to share common resources across a group of databases, which can reduce costs for an application with multiple databases. Azure SQL is always running the latest version of the SQL Server engine, but you can control database compatibility level to ensure your databases perform in a consistent fashion. There are several differences from on-premises SQL Server to Azure SQL Server, but the biggest one is that you cannot perform cross database queries using four-part names (e.g. select * from database2.dbo.customers). Another limitation is that data can only be migrated into the service using bacpac files, which can be very time consuming for larger datasets.

There are some ways to do queries across databases using external tools, or remote data sources, but there is no performance guarantee, so if your application relies on querying multiple databases, you should probably evaluate other options. Some of the other limitations include not being able to use common language runtime (CLR) assemblies, and only allows a database to serve as a replication subscriber (not a publisher). The other major functionality gap is that there is no SQL Server Agent for Azure SQL Database—if you need to run scheduled tasks, you can use Azure Automation, Logic Apps, or Elastic Jobs.

Azure SQL Database is the best fit for new application development, when your application tier is also in the cloud, and your data is newly created. With Azure SQL Database hyperscale, there is virtually no limit on the size of your database.  

Azure SQL Managed Instance

Azure SQL Database was introduced in 2017 and was designed to make it easier to migrate existing SQL Server applications to a platform as a service model. You can execute cross-database queries, the SQL Agent is available, and you can use CLR assemblies. Managed Instance is also inherently in a private network, and on dedicated hardware, which means it can take a lot longer to deploy than other Azure services. You can restore a backup of an existing database to the service, but you can not use the NORECOVERY option for log shipping. In order to perform a real-time migration you would need to use the data migration service provided by Azure.  For the most part the service is fully compatible in T-SQL surface area, but there are a handful of limitations. The other challenge is that you are limited to a max database size of 4 TB for Business Critical and 8 TB for General Purpose.

Azure Synapse Analytics (Formerly Azure SQL Data Warehouse)

Due to space limitations, I’m not going to do a deep dive into Synapse Analytics, however I thought it was important to cover the target use case, as I’ve seen it widely misused. If you have a star schema (facts and dimensions) data warehouse that’s more than a couple of terabytes, then you have a good use case for Synapse Analytics. If you are also integrating that data with Spark Machine Learning, you have an even better case.  If your workload is a busy online transaction processing system, this is not the product for you.

Summary

Choosing a PaaS service isn’t a choice the DBA always gets to make, as sometimes the choice is driven by developers or architects. However, having the knowledge of the different platform offerings empowers you to provide guidance to your other teams before they potentially go down the wrong path, or spend too much on a service that might not meet your requirements. Understanding the nuances between offerings is a key skill for the modern DBA.