Databases Don’t Build Themselves

May 24, 2018

Last week I was fortunate to be asked to present during the IDERA Live Event. There were many great presentations, and those who attended received a lot of good information. I wanted to post a summary of my presentation here for those who were unable to attend.

In this blog post, we are going to talk about how databases come to be. We will discuss the importance of data modeling in the design process, key concepts around design, data loading, and finally some key elements of effective data management.

What is Data Modeling?

Unfortunately, a lot of databases start as a gleam in someone's eye: maybe a couple of tables needed to store some data for an application front end, or perhaps to generate reports or analytics. Soon a database is born and begins to grow as more data is added. It's somewhat like the big bang: from a small particle, the universe is created.

Unfortunately, over time we reach a point at which we are left with a pile of stuff.

Legos are great, but unless we build something useful or cool with them, it's just a pile of Legos. What we really want as a database is the finished build rather than the pile of bricks: something fast, sleek, and really cool.

That is what data modeling can do: help us design and build efficient, high-performing, and relevant databases. The key benefits of data modeling are:

  • Build higher-quality applications
  • Reduce cost by designing before coding
  • Build software faster
  • Determine the scope of a project based on data needs
  • Improve the performance of applications, data retrieval, and analytics

If you want to sum this up in one thought, it's that data modeling does the work up front so you don't have to fix a mess at the end. In the next section we will talk about basic data modeling concepts.

Conceptual Model

A conceptual model is a high-level description of a business's informational needs. It typically includes only the main concepts and the main relationships among them.

Specifically, it describes the things of significance to an organization (entity classes), about which it is inclined to collect information; the characteristics of those things (attributes); and the associations between pairs of those things of significance (relationships).

This type of data model is designed and developed for a business audience. The conceptual model is developed independently of hardware specifications, such as data storage capacity or location, and of software specifications, such as DBMS vendor and technology. The focus is to represent data as a user will see it in the "real world."

Logical Model

Logical data models should be based on the structures identified in a conceptual data model since this describes the semantics of the information context, which the logical model should also reflect. Even so, since the logical data model anticipates implementation on a specific computing system, the content of the logical data model is adjusted to achieve certain efficiencies.

At this level, the model includes all entities and the relationships among them. All attributes for each entity are specified. The primary key for each entity is specified, along with any foreign keys (keys identifying the relationships between different entities). Normalization typically occurs at this level (for relational models).
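As a rough illustration (logical models are usually drawn as diagrams, and the entity and attribute names here are invented for the example), a small normalized logical model with primary and foreign keys might look something like this when written out as generic DDL:

    -- Hypothetical logical model sketch: two entities, their attributes,
    -- and the primary/foreign keys that define their relationship.
    CREATE TABLE Customer (
        CustomerId   INTEGER      NOT NULL,   -- primary key
        FirstName    VARCHAR(50)  NOT NULL,
        LastName     VARCHAR(50)  NOT NULL,
        Email        VARCHAR(255),
        PRIMARY KEY (CustomerId)
    );

    CREATE TABLE CustomerOrder (
        OrderId      INTEGER      NOT NULL,   -- primary key
        CustomerId   INTEGER      NOT NULL,   -- foreign key to Customer
        OrderDate    DATE         NOT NULL,
        TotalAmount  DECIMAL(10,2),
        PRIMARY KEY (OrderId),
        FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    );

Note how each order row points back to exactly one customer through the foreign key; that is the kind of relationship the logical model captures before any platform-specific decisions are made.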

Physical Model

A physical data model is a representation of a data design as implemented, or intended to be implemented, in a database management system. In the life cycle of a project it typically derives from a logical data model, though it may be reverse-engineered from a given database implementation.

A complete physical data model will include all the database artifacts required to create relationships between tables or to achieve performance goals, such as indexes, constraint definitions, linking tables, partitioned tables or clusters. Analysts can usually use a physical data model to calculate storage estimates; it may include specific storage allocation details for a given database system.
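Continuing the hypothetical example above, the physical model is where implementation-level artifacts such as indexes and named constraints appear. A minimal sketch, using SQL Server style DDL with assumed names and sizes:

    -- Hypothetical physical-model additions for the tables sketched earlier.
    -- A nonclustered index to support lookups of a customer's orders.
    CREATE INDEX IX_CustomerOrder_CustomerId
        ON CustomerOrder (CustomerId);

    -- A named check constraint enforcing a simple business rule.
    ALTER TABLE CustomerOrder
        ADD CONSTRAINT CK_CustomerOrder_TotalAmount
        CHECK (TotalAmount >= 0);

In a real physical model these choices (index columns, constraint names, partitioning, storage placement) would be tuned to the target DBMS and workload.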

Database Deployment

Deploying the database typically involves building it from scratch on the target platform. Deployment is usually built on the concept of "forward engineering": most data modeling tools can take the physical model and deploy or create the database from it.

To deploy, you can either use a direct connection to the database platform or create "scripts" that can then be run to create the database.
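For instance, a forward-engineered script for SQL Server might be structured roughly like the sketch below (the database name is an assumption, and GO is the batch separator used by SQL Server client tools); such a script could then be executed through the modeling tool's direct connection or a command-line client such as sqlcmd:

    -- Minimal sketch of a generated deployment script (SQL Server flavor).
    CREATE DATABASE SalesDemo;
    GO

    USE SalesDemo;
    GO

    -- Table, index, and constraint definitions generated from the
    -- physical model would follow here, in dependency order.
    CREATE TABLE Customer (
        CustomerId INTEGER     NOT NULL PRIMARY KEY,
        FirstName  VARCHAR(50) NOT NULL,
        LastName   VARCHAR(50) NOT NULL
    );
    GO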

New transactional (OLTP) databases typically have front-end applications feeding data into the database. For databases built around the concept of a data warehouse or data lake, we typically find one of two approaches:

  1. Extract, Transform, Load (ETL) which is normally a continuous, ongoing process with a well-defined workflow. ETL first extracts data from homogeneous or heterogeneous data sources. Then, data is cleansed, enriched, transformed, and stored either back in the lake or in a data warehouse.
  2. Extract, Load, Transform (ELT) is a variant of ETL wherein the extracted data is first loaded into the target system. Transformations are performed after the data is loaded into the data warehouse.

ELT typically works well when the target system is powerful enough to handle transformations. Analytical databases like Amazon Redshift and Google BigQuery are often used in ELT pipelines because they are highly efficient in performing transformations.
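As a minimal sketch of the "T" in ELT (table and column names are invented for illustration), the transformation can be expressed as plain SQL running inside the target warehouse, after the raw rows have already been loaded into a staging table:

    -- Transform raw staged rows into the warehouse fact table.
    INSERT INTO sales_fact (order_id, customer_id, order_date, total_amount)
    SELECT
        order_id,
        customer_id,
        CAST(order_date AS DATE),          -- standardize the date type
        COALESCE(total_amount, 0)          -- cleanse missing amounts
    FROM staging_orders
    WHERE order_id IS NOT NULL;            -- drop rows that fail a basic rule

Because the work is a set-based SQL statement, it scales with the warehouse engine rather than with a separate transformation server.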

Typically, however, data is temporarily stored in at least one set of staging tables as part of the process.

The extract step might copy data to staging tables as quickly as possible in order to minimize the time spent querying the source system. The transform step might place data in tables that are copies of the warehouse tables for ease of loading (to enable partition swapping, for example). Despite these variations, the general process stays the same.
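As a hedged example of the partition-swapping idea, SQL Server can switch a fully loaded staging table into a partition of the warehouse table as a metadata-only operation (the table names and partition number below are assumptions, and the staging table must match the target's schema, indexes, and constraints):

    -- Swap the staged data into partition 3 of the warehouse fact table.
    ALTER TABLE staging_sales_fact
        SWITCH TO sales_fact PARTITION 3;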

Which approach is better depends on priorities. All things being equal, it is better to have fewer moving parts. ELT has no separate transformation engine; the work is done by the target system, which is already there and probably being used for other development work.

On the other hand, the ETL approach can provide drastically better performance in certain scenarios. The training and development costs of ETL need to be weighed against the need for better performance. (Additionally, if you don’t have a target system powerful enough for ELT, ETL may be more economical.)

Some general guidelines on data loading:

Extract, Transform, Load (ETL) works well for:

  • Bulk data movement
  • Complex rules and transformations
  • Combining large amounts of data from multiple sources

Extract, Load, Transform (ELT) works well:

  • When ingestion speed is king
  • When more intel is better intel
  • When you know you will need to scale

Finally

To sum things up, building, deploying, and loading efficient and effective data platforms revolves around planning and modeling. If these things are done up front, with the organization included in the planning stages, then we save a lot of headache and cost. I will leave you with this quote, which pretty much says it all:

 
“If you do not know where you come from, then you don't know where you are, and if you don't know where you are, then you don't know where you're going. And if you don't know where you're going, you're probably going wrong.”