Normalization and Data Modeling

October 20, 2021

Normalization comes in a number of different forms, including first normal form (1NF), second normal form (2NF), and third normal form (3NF).

In the database world, normalization refers to the process of eliminating data redundancy and ensuring information is stored logically. Normalization involves a systematic and multi-step approach and is one of the foundations of relational database theory. Databases that are effectively normalized return unambiguous and expected results when queried. 

What is Normalization?

Normalization is the process of evaluating a database schema through a series of steps to reduce redundancy and eliminate update, deletion, and insertion anomalies.

The goal of normalization is to ensure the information in relational databases is stored efficiently, making it available to consistently satisfy user queries. 

Normalization was introduced to the computing community by Edgar F. Codd in a 1970 paper titled “A Relational Model of Data for Large Shared Data Banks.”

When this paper made its appearance, database theory and the concept of relational databases were in their infancy. 

Multiple Normal Forms

Normalizing a database schema is an iterative process that moves methodically through various stages of refinement. 

Putting the schema in normal form is done to minimize and hopefully eliminate modification anomalies that negatively impact data consistency and database performance. 

The process of normalization is enacted at the table level.
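To make the idea of a modification anomaly concrete, here is a minimal SQL sketch. The table and column names are invented for illustration rather than drawn from any particular schema.

-- One table holds both order and customer details, so customer data is repeated.
CREATE TABLE orders (
    order_id      INT PRIMARY KEY,
    customer_id   INT,
    customer_name VARCHAR(100),
    customer_city VARCHAR(100),
    product_name  VARCHAR(100),
    quantity      INT
);

-- Update anomaly: if a customer moves, every one of their order rows must change.
-- Deletion anomaly: deleting a customer's last order also erases the customer's details.
-- Insertion anomaly: a customer cannot be recorded until they place at least one order.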

Let’s look at how the different normal forms are defined and what needs to be done to successfully move through each stage of normalization:

First normal form

To be considered in first normal form (1NF), a table must have the following characteristics:

  • It must be two-dimensional and contain rows and columns.
  • Each row must contain data related to an item or part of an item.
  • Each column stores data for a single attribute of the given item.
  • Every intersection of a row and column must have only a single value.
  • All entries in a column must be of the same data type, for instance, employee identification numbers.
  • Columns need to have unique names.
  • Each row must be unique with no two rows containing identical information.
  • Column and row order are not significant. 

A table in first normal form is not immune to all modification anomalies. Leaving a database in first normal form may only be appropriate for a subset of applications. 
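As a sketch of these rules, the following hypothetical employee table stores several phone numbers in a single column, then is restructured so that every intersection of a row and column holds a single value.

-- Violates 1NF: phone_numbers holds more than one value per row, e.g. '555-0101, 555-0102'.
CREATE TABLE employee_raw (
    employee_id   INT PRIMARY KEY,
    employee_name VARCHAR(100),
    phone_numbers VARCHAR(200)
);

-- In 1NF: each intersection of a row and column holds exactly one value.
CREATE TABLE employee (
    employee_id   INT PRIMARY KEY,
    employee_name VARCHAR(100)
);

CREATE TABLE employee_phone (
    employee_id  INT REFERENCES employee (employee_id),
    phone_number VARCHAR(20),
    PRIMARY KEY (employee_id, phone_number)
);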

Second normal form

Functional dependency is the concept underlying the second normal form (2NF). 

An attribute is functionally dependent on another attribute if knowing the value of the second attribute allows you to determine the value of the first. For example, in an employee table, the employee's name is functionally dependent on the employee ID: knowing the ID determines the name.

All tables in first normal form are required to have a unique primary key, which can consist of one column or several. Keys made up of more than one column are known as composite keys.

For a table to be in 2NF, every non-key attribute must depend on the whole key rather than on just part of it. If a relation has a single-attribute key and is already in 1NF, it is automatically in 2NF.
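As a sketch, assume a hypothetical order_item table keyed on the composite (order_id, product_id). The product name depends on product_id alone, which violates 2NF; splitting the table removes the partial dependency.

-- 1NF but not 2NF: product_name depends only on product_id, part of the composite key.
CREATE TABLE order_item_1nf (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- 2NF decomposition: every non-key attribute now depends on the whole key of its table.
CREATE TABLE product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100)
);

CREATE TABLE order_item (
    order_id   INT,
    product_id INT REFERENCES product (product_id),
    quantity   INT,
    PRIMARY KEY (order_id, product_id)
);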

A table in 2NF is still at risk of modification anomalies related to transitive dependencies. This type of dependency occurs when an attribute depends on a second attribute, which in turn depends on a third attribute. Deletions in a table with this kind of dependency can result in lost information.

Third normal form 

To achieve third normal form (3NF), a relation must be in second normal form and have no transitive dependencies.

In a 3NF-compliant table, no non-key attribute is transitively dependent on the primary key.

Having a table in 3NF protects its fields against insertion, update, and deletion anomalies.

Essentially, attaining 3NF requires the schema to be designed so that every non-key attribute depends on the key and nothing but the key.

This means that a change to one value does not ripple into unrelated data. The parts of a record are independent of one another, eliminating the possibility of unintended results when changes are made.
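The sketch below uses a hypothetical book table to show a transitive dependency and its removal: publisher_city depends on publisher_id, which in turn depends on the key book_id.

-- 2NF but not 3NF: publisher_city is transitively dependent on book_id via publisher_id.
CREATE TABLE book_2nf (
    book_id        INT PRIMARY KEY,
    title          VARCHAR(200),
    publisher_id   INT,
    publisher_city VARCHAR(100)
);

-- 3NF decomposition: the transitively dependent attribute moves to its own table.
CREATE TABLE publisher (
    publisher_id   INT PRIMARY KEY,
    publisher_city VARCHAR(100)
);

CREATE TABLE book (
    book_id      INT PRIMARY KEY,
    title        VARCHAR(200),
    publisher_id INT REFERENCES publisher (publisher_id)
);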

Data Modeling and Normalization

The process of data modeling can be used to facilitate the normalization of data resources. Iteratively modeling data assets and normalizing the eventual database schema are related activities that contribute to an organization’s ability to use data assets productively.

The ER/Studio family of data modeling tools can serve as a valuable solution for normalizing data resources to create more efficient and consistent applications. Three related applications address different aspects of data modeling to create conceptual, logical, and physical data models.

ER/Studio Business Architect

ER/Studio Business Architect maps the relationships among the people, processes, and data that must be understood to create conceptual data models. These models reflect business objectives and provide high-level representations of a desired database or application.

Try ER/Studio Business Architect for Free!

ER/Studio Data Architect

ER/Studio Data Architect lets teams build out enterprise data models, discover and document data assets, and create data catalogs. Data Architect builds on the conceptual models developed with Business Architect to produce logical and physical models that can be implemented in database applications. 

Try ER/Studio Data Architect for Free!

ER/Studio Enterprise Edition

ER/Studio Enterprise Edition is a collaborative tool that enables an organization to share data models and build a common data vocabulary necessary for efficiently using enterprise data resources. A shared repository supports model consistency and keeps all stakeholders informed of changes.

Request a free demo of ER/Studio Enterprise Edition!

Organizations developing data models will benefit from the features available in ER/Studio. The tools will help companies maintain data consistency and extract the maximum value from their data assets.