If you have ever written SQL statements then you know how flexible it is. Almost always there are several different ways to write a SQL query that returns the same results. For example, you can combine multiple tables using a join or a subselect and achieve the same output… but probably not the same performance. And this is just one example of the various ways you can write SQL to achieve the same results.
Application developers typically do not spend a lot of time on optimizing performance. Their focus is on writing code that matches the project specifications and delivers the expected results, not necessarily testing for performance with a full-production workload. SQL quality and performance testing therefore is an often-disregarded aspect of incorporating database development into your DevOps pipeline. But it is crucial that you analyze and optimize SQL performance within your application code early in the development process; not waiting until it is turned over to production where it can slow down mission-critical work.
We should look at DevOps as an opportunity to integrate SQL performance testing into our continuous delivery pipeline. The best practice approach should be to measure, analyze and improve SQL statements at all stages as your code progresses from development to testing to production. The more SQL performance testing that can be accomplished by developers, the earlier performance problems will be found and corrected. And that means the cost of delivering high-quality database applications will decline.
However, things are not as simple as just running your program and evaluating its performance metrics. The data that you use in your test environment will not be the same as your production data. Typically, your test databases will contain fewer rows than the same database in production. And that means you will get different access paths and performance results in test than in production. Scanning a small table in the test environment usually can be accomplished quite efficiently, but trying to scan the same table with thousands or millions of rows in production without an index will cause performance problems.
Part of SQL performance testing during development is the ability to mimic production scale in test. You can update database statistics to make the DBMS think that the test system has production levels of data. You may need to update test statistics multiple times as production data expands. The goal is for the test environment to always be set up to look as much like production as possible.
Keep in mind that it is not just the amount of data that differs between test and production systems. It is not uncommon for the test environment to utilize different hardware, with different system parameters and configurations than are used for production. Your test system will usually be running on a less powerful processor and using less memory. This can result in different access paths for test and production systems, which can cause performance problems that only show up after you move to production.
Some database systems and management products allow you to model the configuration and setting of your production environment in your test system, without actually deploying the same exact setup. For example, making the optimizer think there is more memory configured to the cache or buffer pools. This can be set up as part of the SQL performance testing in your DevOps pipeline to enable your SQL tests to be as close as possible to the production environment.
Being sure to always review SQL access plans before any code is moved to production can help to eliminate operational performance surprises. Saving and comparing access plans over time can show if, when, and how SQL performance is either improving or degrading.
With the proper setup and tooling, developers can examine the access paths of their SQL statements to judge their efficiency. This means incorporating performance testing scripts and procedures into the development pipeline in the same way that other, more traditional testing procedures are incorporated.
To properly achieve this goal, consider implementing SQL performance and testing tools. There are tools that can automate the review and comparison of SQL access plans making them ideal for DevOps. You can set them up to be part of the CI/CD pipeline such that any changed access plan is examined and flagged for further review if it will degrade performance.
You should look for flexible tools that can be configured for testing specific use cases. For example, skewed data can negatively impact SQL performance. When data is non-uniformly distributed a subset of the values occurs much more frequently than others. A special case of non-uniformly distributed data is skewed data. When data is skewed, one value (or a very small number of values) occurs much more frequently than others. Testing such cases is important to avoid performance surprises in production systems.
Furthermore, you will need a tool that supports all of the different types of SQL that you will be delivering. It should support testing both static SQL and dynamic SQL. It should be capable of testing procedural SQL used to write stored procedures and user-defined functions. Any type of and flavor of SQL that you are using should be testable and with performance results that can be compared across multiple test runs.
Furthermore, the tool should be capable of documenting the performance results achieved, and perhaps even suggesting multiple SQL variations to improve the results.