Industry-leading organizations recognize and manage data as a strategic asset. By ensuring high data quality, they can rely on that data for critical decision-making.
Owing to the value of data to the modern enterprise, business intelligence and analytics spending has been increasing dramatically for several years.
The resulting data landscape often incorporates traditional data warehouse platforms, as well as data lakes composed of SQL and NoSQL technologies, dispersed across on-premises and cloud environments.
Fostering a strong data culture therefore provides organizations with a competitive edge. But to enjoy that edge, organizations must focus on and implement the following:
Improving and Maintaining Data Quality
Incorrect decisions based on poor data can be disastrous, so how can we ensure that we are using the right data in the first place? To do so, we must be able to address the following data quality considerations (a minimal automated sketch follows the list):
- Is the data accurate?
- Is the data timely?
- Is the data complete?
- Is the data consistent?
- Is the data relevant to the decision?
- Is the data fit for use?
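As a minimal illustration of how the first four considerations can be automated (relevance and fitness for use remain judgment calls), the sketch below checks a hypothetical orders dataset. The column names and thresholds are assumptions made for the example, not a prescribed standard.

```python
# Minimal sketch: accuracy, timeliness, completeness, and consistency checks
# against a hypothetical orders dataset. Column names and thresholds are
# illustrative assumptions.
from datetime import datetime, timedelta, timezone

import pandas as pd

def run_quality_checks(orders: pd.DataFrame) -> dict:
    now = datetime.now(timezone.utc)
    return {
        # Accuracy: amounts should fall within a plausible business range.
        "accuracy_amount_in_range": bool(orders["amount"].between(0, 100_000).all()),
        # Timeliness: the newest record should be no older than one day.
        "timeliness_fresh_within_1d": bool(
            (now - orders["order_date"].max()) <= timedelta(days=1)
        ),
        # Completeness: no missing values in the key columns.
        "completeness_no_nulls": bool(
            orders[["order_id", "order_date", "amount"]].notna().all().all()
        ),
        # Consistency: each order_id should appear exactly once.
        "consistency_unique_ids": bool(orders["order_id"].is_unique),
    }

# Example usage with a tiny in-memory dataset:
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02"], utc=True),
    "amount": [19.99, 250.00, 5.00],
})
print(run_quality_checks(orders))
```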
The data quality challenge is compounded further by the increasing complexity of the data ecosystem that every organization operates within. Most corporations have a variety of software applications and data stores scattered across multiple heterogeneous platforms, utilizing a spider web of point-to-point interfaces to move data back and forth.
When we examine most data environments, we find that ETL processes usually incorporate at least some degree of data transformation and cleansing to render the data usable at the point of consumption.
However, this can be quite risky if we do not truly understand the data and the changes that have occurred on its journey through the organization’s systems.
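To make that risk concrete, consider a transformation that quietly substitutes a default for missing values. The snippet below is a hypothetical cleanup step, not drawn from any particular pipeline; it shows how an undocumented change can hide a completeness problem from downstream consumers.

```python
# Hypothetical ETL cleanup step: an undocumented default silently masks
# missing revenue figures, so downstream reports never see the gap.
import pandas as pd

def load_monthly_revenue(raw: pd.DataFrame) -> pd.DataFrame:
    cleaned = raw.copy()
    # The "fix": missing revenue becomes zero. Reports now run without
    # errors, but totals are understated and nobody downstream knows why.
    cleaned["revenue"] = cleaned["revenue"].fillna(0.0)
    return cleaned

raw = pd.DataFrame({
    "region": ["North", "South", "West"],
    "revenue": [120_000.0, None, 95_000.0],
})
print(load_monthly_revenue(raw)["revenue"].sum())  # 215000.0 -- South's gap is invisible
```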
The situation is analogous to the problems that occurred on manufacturing production lines prior to the early 1980s: complex products were built from thousands of parts and sub-assemblies, then inspected for quality conformance after they rolled off the assembly line. Inspection does not improve the product.
It simply identifies the defects that need to be addressed. Defective items were scrapped or reworked at significant cost, but the origin of the defects often went undetected, so the same problems recurred.
To address this, the quality movement of the 1980s focused on many practices, a few of which are listed here because they are directly relevant to data in the context of this discussion:
- Validation of the inputs to every discrete process, preventing the use of defective components
- Traceability of components and sub-assemblies within finished goods back to their point of origin
- Empowerment of front-line workers to address problems, even if it meant halting the entire production line
- Continuous improvement of all processes
Introducing Collaborative Data Governance
To succeed, a collaborative culture must be established with a commitment to data quality, from senior executives through to the front-line workers who create and modify data on a daily basis.
If data originates outside the organization, it must be validated prior to use. Data governance and stewardship must be established so that responsibilities are clearly understood and agreed to by all parties.
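Echoing the manufacturing principle of validating inputs before use, externally sourced data can be gated at the point of ingestion. The sketch below assumes a hypothetical supplier feed with supplier_id and country fields; the rules and the decision to halt the load on a defect are illustrative.

```python
# Minimal sketch of an ingestion gate: records from an external feed are
# validated before they are allowed into downstream systems, mirroring the
# manufacturing practice of rejecting defective components at the door.
# Field names and rules are illustrative assumptions.

VALID_COUNTRIES = {"CA", "US", "MX"}

def validate_supplier_record(record: dict) -> list[str]:
    """Return a list of validation failures; an empty list means the record passes."""
    errors = []
    if not record.get("supplier_id"):
        errors.append("missing supplier_id")
    if record.get("country") not in VALID_COUNTRIES:
        errors.append(f"unrecognized country: {record.get('country')!r}")
    return errors

def ingest(feed: list[dict]) -> list[dict]:
    accepted = []
    for record in feed:
        errors = validate_supplier_record(record)
        if errors:
            # Stop the load rather than letting a known defect flow downstream,
            # the data equivalent of halting the production line.
            raise ValueError(f"rejected record {record}: {errors}")
        accepted.append(record)
    return accepted

print(ingest([{"supplier_id": "S-001", "country": "CA"}]))
```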
Understanding the Data Ecosystem through Data Modeling
Data modeling allows data to be understood in context, and it is the basis for identifying redundancy and inconsistency. All manifestations of each critical business data object must be identified and cataloged.
Typically the most critical business data objects are also master data, as they are utilized in most transactions (for example: customer, product, location, employee, etc.).
Without context, it is extremely difficult to ensure that the proper data is being utilized for reporting and analytical purposes, and hence, informed decision making. In order to complete the understanding, the models must be supported by integrated business glossaries that are owned by the business stakeholders responsible for each area.
Business analysts, data analysts, modelers and architects build the required conceptual and logical models based on continual consultation with business stakeholders.
Physical data models are used to describe the underlying system implementations, including data lineage. When combined with data flows, true enterprise data lineage can be understood and documented. This is the point at which true traceability is established, which is vital for comprehension and knowledge.
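One lightweight way to picture how business definitions, physical manifestations, and data flows combine into lineage is a small catalog structure. The classes and example entries below are hypothetical; real modeling and lineage tools capture far richer metadata, but the principle of walking upstream links back to the point of origin is the same.

```python
# Hypothetical catalog entries linking a business data object to its physical
# manifestations and tracing a data flow back to its point of origin.
from dataclasses import dataclass, field

@dataclass
class PhysicalAsset:
    system: str   # application or database hosting the data
    table: str
    column: str

@dataclass
class BusinessDataObject:
    name: str                                   # business term from the glossary
    definition: str                             # owned by the business stakeholder
    manifestations: list[PhysicalAsset] = field(default_factory=list)
    upstream: list["BusinessDataObject"] = field(default_factory=list)

def trace_origin(obj: BusinessDataObject) -> list[str]:
    """Walk upstream links to list the chain back to the point of origin."""
    chain = [obj.name]
    for parent in obj.upstream:
        chain.extend(trace_origin(parent))
    return chain

# Example: "Customer" mastered in the CRM, copied into the warehouse.
crm_customer = BusinessDataObject(
    "Customer (CRM)", "Party that purchases goods or services",
    [PhysicalAsset("CRM", "contacts", "contact_id")],
)
dw_customer = BusinessDataObject(
    "Customer (Warehouse)", "Conformed customer dimension",
    [PhysicalAsset("DW", "dim_customer", "customer_key")],
    upstream=[crm_customer],
)
print(trace_origin(dw_customer))  # ['Customer (Warehouse)', 'Customer (CRM)']
```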
All of the models, metadata and glossaries must be integrated through a common repository to enable true collaboration and understanding.
Approved artifacts need to be published in a medium that is easily consumed, typically through a web-based user interface. In addition, the models themselves become the means to analyze, design, evaluate and implement changes going forward.
Due to the size and complexity of most environments, this must be done on a prioritized basis, starting with the most critical business data objects.
Metrics are established to quantify relative importance as well as to evaluate progress. As with any continuous improvement initiative, breadth and depth are increased incrementally.