Seven Deadly Sins of Database Design

by Joy RuffJul 31, 2014

If you don’t get the data right, nothing else matters. However, the business focus on applications often overshadows the priority for a well-organized database design. Addressing some simple data modeling and design fundamentals can put you on the right path. Here are seven common database design “sins” that can be easily avoided and ways to correct them in future projects.

Sin #1: Poor or missing documentation for database(s) in production

Documentation for databases usually falls into three categories: incomplete, inaccurate, or none at all. This causes developers, DBAs, architects, and business analysts to scramble to get on the same page. They are left up to their own imagination to interpret the meaning and usage of the data.

The best approach is to place the data models into a central repository and spawn automated reports so that with minimal effort, everyone benefits. Producing a central store of models is only half the battle, though. Once that is done, executing validation and quality metrics will enhance the quality of the models over time. As your level of data management increases, you can extend what metadata is captured in the models.

Sin #2: Little or no normalization

There are times to denormalize a database structure to achieve optimized performance, but sacrificing flexibility will paint you in a corner. Despite the long-held belief by developers, one table to store everything is not always optimal. Another common mistake is repeating values stored in a table. This can greatly decrease flexibility and increase difficulty when updating the data.

Understanding even the basics of normalization adds flexibility to a design while reducing redundant data. The first three levels of normalization are usually sufficient for most cases:

· First Normal Form: Eliminate duplicate columns and repeating values in columns

· Second Normal Form: Remove redundant data that apply to multiple columns

· Third Normal Form: Each column of a table should be dependent on the primary identifier

Sin #3: Not treating the data model like a living, breathing organism

There are numerous examples of customers performing the modeling up front, but once the design is in production, all modeling ceases.

To maintain flexibility and ensure consistency when the database changes, those modifications need to find their way back to the model.

Sin #4: Improper storage of reference data

There are two main problems with reference data. It is either stored in many places or, even worse, embedded in the application code.

Reference values provide valuable documentation which should be communicated in an appropriate location. Your best chance is often via the model. The key is to have it defined in one place and used in other places.

Sin #5: Not using foreign keys or check constraints

Customers complain all the time about the lack of referential integrity (RI) or validation checks defined in the database when reverse engineering databases. For older database systems, it was thought that foreign keys and check constraints slowed performance, thus, the RI and checks should be done via the application.

If it is possible to validate the data in the database, you should do it there. Error handling will be drastically simplified and data quality will increase as a result.

Sin #6: Not using domains and naming standards

Domains and naming standards are probably two of the most important things you can incorporate into your modeling practices. Domains allow you to create reusable attributes so that the same attributes are not created in different places with different properties. Naming standards allow you to clearly identify those attributes consistently.

Sin #7: Not choosing primary keys properly

The simplest principle to remember when picking a primary key is SUM: Static, Unique, Minimal. It is not necessary to delve into the whole natural vs. surrogate key debate; however, it is important to know that although surrogate keys may uniquely identify the record, they do not always uniquely identify the data. There is a time and a place for both, and you can always create an alternate key for natural keys if a surrogate is used as the primary key.

Sin-free databases

Hopefully this information has helped you to evaluate your data management habits and assess your current database structure. It may be a multi-step process to define your strategy to eliminate all of these issues, but it is important to have a plan to get there. Embarcadero can help you to address these concerns; contact us to learn more. You can also read the white paper on this topic: http://forms.embarcadero.com/7_deadlysins_datamodeling.

Seven Deadly Sins of Database Design

Categories