Managing Mountains of Data with the Cloud

by May 4, 2018

When is data less valuable? How do you manage data as it becomes less valuable?

In the good old days the answers to these questions were easy: they were simply based on what the company was willing to pay for relatively expensive onsite disk space. So while a company might have preferred to keep data around as long as possible, the reality was closer to the bare minimum the company could afford to keep and was willing to pay to house. Back then this was not a major tradeoff, because we did not have data scientists or data analytics, and business intelligence was in its relative infancy. Below is a simple diagram of a data life cycle management schema that smart DBAs might have implemented. That was then.

Nowadays businesses see data as a leading differentiator, worth its weight in gold. Hence they want to keep as much data accessible online as possible, within reasonable cost. With much lower disk costs these days plus tiered cloud storage options, the amount of data that can practically be kept online is arguably near unlimited. Of course, that’s not a very practical answer, since we also need to meet certain performance service level agreements (SLAs). But today a DBA might have to maintain four to ten times as much historical data to meet new data-driven business requirements, including regulatory and compliance needs. Undoubtedly the data life cycle management picture above is no longer the optimal solution. For example, on Amazon's cloud (AWS) you might implement a data tiering strategy something along these lines. Note how many more options DBAs have today.
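To make the tiering idea a bit more concrete, here is a minimal Python sketch of an age-based policy that decides where a record or partition should live. The tier names, their mapping to Amazon RDS and S3 storage classes, and the age cut-offs are all hypothetical assumptions chosen for illustration. They are not the tiers from the diagram and not AWS recommendations. A real policy would also weigh access frequency, SLAs, and retention rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical tiering policy: tier names and age cut-offs are illustrative
# assumptions, not AWS defaults or the author's actual scheme.
@dataclass(frozen=True)
class Tier:
    name: str
    max_age_days: int  # data older than this belongs in the next tier down


POLICY = [
    Tier("hot: database (e.g. Amazon RDS)", 90),
    Tier("warm: S3 Standard", 365),
    Tier("cool: S3 Standard-IA", 3 * 365),
    Tier("cold: S3 Glacier", 7 * 365),
]


def assign_tier(created: date, today: date) -> str:
    """Return the tier a record belongs in, based only on its age in days."""
    age_days = (today - created).days
    for tier in POLICY:
        if age_days <= tier.max_age_days:
            return tier.name
    # Older than every tier's limit: a candidate for purging, subject to
    # whatever regulatory and compliance retention rules apply.
    return "purge"


if __name__ == "__main__":
    today = date(2018, 5, 4)
    for age in (30, 200, 1000, 3000):
        created = today - timedelta(days=age)
        print(f"{age} days old -> {assign_tier(created, today)}")
```

Run as a script, it prints one line per sample age. With the cut-offs assumed here, 30-day-old data stays in the hot database tier, while 3,000-day-old data falls past every tier and is flagged for purging. On AWS itself, the S3 portion of such a policy would typically be expressed declaratively as S3 Lifecycle transition rules rather than in application code.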