Does Database or Version Matter?

by Aug 26, 2018

Categories

Tags

Administration agent-based monitoring Agentless Monitoring alert responses alert thresholds alerting Alerts Amazon Aurora Amazon EC2 Amazon RDS Amazon RDS / Aurora Amazon RDS for SQL Server Amazon Redshift Amazon S3 Amazon Web Services (AWS) Analytics application monitoring Aqua Data Studio automation availability Azure Azure SQL Database azure sql managed instance Azure VM backup Backup and recovery backup and restore backup compression backup status Backup Strategy backups big data Blocking bug fixes business architecture business data objects business intelligence business process modeling business process models capacity planning change management cloud cloud database cloud database monitoring cloud infrastructure cloud migration cloud providers Cloud Readiness Cloud Services cloud storage cloud virtual machine cloud VM clusters code completion collaboration compliance compliance audit compliance audits compliance manager compliance reporting conference configuration connect to database cpu Cross Platform custom counters Custom Views customer survey customer testimonials Dark Theme dashboards data analysis Data Analytics data architect data architecture data breaches Data Collector data governance data lakes data lineage data management data model data modeler data modeling data models data privacy data protection data security data security measures data sources data visualization data warehouse database database administration database administrator database automation database backup database backups database capacity database changes database community database connection database design database developer database developers database development database diversity Database Engine Tuning Advisor database fragmentation database GUI database IDE database indexes database inventory management database locks database management database migration database monitoring database navigation database optimization database performance Database Permissions database platforms database profiling database queries database recovery database replication database restore database schema database security database support database synchronization database tools database transactions database tuning database-as-a-service databases DB Change Manager DB Optimizer DB PowerStudio DB2 DBA DBaaS DBArtisan dBase DBMS DDL Debugging defragmentation Demo diagnostic manager diagnostics dimensional modeling disaster recovery Download drills embedded database Encryption End-user Experience entity-relationship model ER/Studio ER/Studio Data Architect ER/Studio Enterprise Team Edition events execution plans free tools galera cluster GDPR Getting Started Git GitHub Google Cloud Hadoop Healthcare high availability HIPAA Hive hybrid clouds Hyper-V IDERA IDERA ACE Index Analyzer index optimization infrastructure as a service (IaaS) infrastructure monitoring installation Integrated Development Environment interbase Inventory Manager IT infrastructure Java JD Edwards JSON licensing load test load testing logical data model macOS macros managed cloud database managed cloud databases MariaDB memory memorystorage memoryusage metadata metric baselines metric thresholds Microsoft Azure Microsoft Azure SQL Database Microsoft PowerShell Microsoft SQL Server Microsoft Windows MongoDB monitoring Monitoring Tools Monyog multiple platforms MySQL news newsletter NoSQL Notifications odbc optimization Oracle PeopleSoft performance Performance Dashboards performance metrics performance monitoring performance schema performance tuning personally identifiable information physical data model Platform platform as a service (PaaS) PostgreSQL Precise Precise for Databases Precise for Oracle Precise for SQL Server Precise Management Database (PMDB) product updates Project Migration public clouds Query Analyzer query builder query monitor query optimization query performance Query Store query tool query tuning query-level waits Rapid SQL rdbms real time monitoring Real User Monitoring recovery regulations relational databases Releases Reporting Reports repository Restore reverse engineering Roadmap sample SAP Scalability Security Policy Security Practices server monitoring Server performance server-level waits Service Level Agreement SkySQL slow query SNMP snowflake source control SQL SQL Admin Toolset SQL CM SQL code SQL coding SQL Compliance Manager SQL Defrag Manager sql development SQL Diagnostic Manager SQL Diagnostic Manager for MySQL SQL Diagnostic Manager for SQL Server SQL Diagnostic Manager Pro SQL DM SQL Doctor SQL Enterprise Job Manager SQl IM SQL Inventory Manager SQL Management Suite SQL Monitoring SQL Performance SQL Quality SQL query SQL Query Tuner SQL Safe Backup SQL script SQL Secure SQL Security Suite SQL Server sql server alert SQL Server Migration SQL Server Performance SQL Server Recommendations SQL Server Security SQL statement history SQL tuning SQL Virtual Database sqlmemory sqlserver SQLyog Storage Storage Performance structured data Subversion Support tempdb tempdb data temporal data Tips and Tricks troubleshooting universal data models universal mapping unstructured data Uptime Infrastructure Monitor user experience user permissions Virtual Machine (VM) web services webinar What-if analysis WindowsPowerShell

I’m sure you’ve heard the saying, “Never discuss politics or religion in polite company”. That’s good advice. This blog is not intended to spark impassioned debate on a topic that for many DBAs is their religion: the database they prefer. Rather I simply want to show that maybe automatically picking a database without testing might not always yield the best results. One database cannot really be the single best solution for all use cases. There are considerations for cost, performance, and applications, among others. Therefore some experimentation might be very worthwhile to assess the options. In this post, I will focus on database performance for a data warehouse.

Let me start with the least contentious issue, “Does the database version really matter?” Of course, if you know that you need a certain feature introduced in a specific release, then you simply need to adopt that version. But what if for example any one of three major database versions would work? I know that most database vendors don’t generally support more than the current and previous major versions, but for the sake of demonstration let’s assume that they do. So in this example let’s compare Oracle 11g R2, 12c R2, and 18c performance for a data warehouse. I ran a TPC-H benchmark and measured two key metrics that are fairly easy to compare: how long it takes to:

  • Create, load, index, and collect statistics for 300GB
  • Run 22 highly complex queries that scan lots of data

My goal was to measure two key aspects: how fast data can go in and how fast it can come out. While this may seem like an overly simple way to measure any differences, it works well enough for this blog’s purposes. Here are the results (note that I've excluded the scale units from the Y-axis as I only want to show the relative difference):

Oracle 12c was a major update adding numerous new features; hence the observed 26% improvement was not a surprise. Clearly, that upgrade is a no-brainer. Oracle 18c has been explained by some as being more like an Oracle 12c R2 update (i.e. 12c R3) than a whole new major version. I am not in a position to confirm or deny that, but I do see that 18c yields only a 9% improvement over 12c. Still worth doing, but clearly I could remain on 12c if I preferred. However given all the new cloud-specific features and support in Oracle 18c, I’d probably pull that trigger anyhow.

 

Now let’s try the same test across the four major relational databases: SQL Server, Oracle, PostgreSQL, and MySQL. All databases were running together on the same Windows VM with the same CPU, memory, and disks. Again I am not trying to say that one database is better than another. I merely contend that testing like this will help you to pick the right database for the job at hand. So once again I’m running a 300GB TPC-H for each of these four databases. Here are the results:

I was shocked by these results. My initial belief was that MySQL 8.0 would be a great database for a data warehouse along the lines of a TPC-H type benchmark, and thus probably a good data warehouse option in general. I tried running the tests with both the INNODB and MYISAM storage engines just to be sure. I had to adjust my thinking. For the queries, while still much higher it was still in the ballpark. But the data load, indexing, and statistics collection were orders of magnitude worse. The other databases looked very close, so I had to remove MySQL in order to better see how the remaining three really compared to each other. Here is the comparison with just SQL Server, Oracle, and PostgreSQL.

OK, now we can see that SQL Server and Oracle were the top two choices. Furthermore if data loads were more important, then Oracle might be the better choice; however, if query run times were most important, then SQL Server might be the better choice. My next logical test would be to test SQL Server 2016 and 2017 to see how they stacked up against Oracle 18c before making any final decisions. I leave that up to the reader.

 

I hope you saw the value in this exercise. My premise was that no one single database is always the best choice for any job. You need to compare your alternatives versus the workload you intend to deploy and support. Maybe running industry standard benchmarks will suffice, or maybe you’ll need to do a workload capture and replay instead of your database application. Regardless, please don’t just assume that you automatically know for a fact which database and version are best. In today’s world, the database has become a commodity. You should pick the one that best suits your current project’s needs and budget rather than just always picking the one you have the most history with. With an open-minded approach like this, you’ll master multiple databases and thus extend your career. Knowledge is earning power.