We live in an exciting time in which technology is entering our lives in areas that many of us never expected. At home, once unintelligent devices such as thermostats, light bulbs, and entertainment consoles can now detect when a person is nearby and adjust to his or her preferences. In the workplace, computers, peripheral systems, and machinery constantly communicate with one another to create organizational efficiencies. And the all-pervasive smartphone, which had found its way into the pockets of nearly a billion individuals by the end of 2013, is constantly capturing and transmitting information about the world around it. All of this new technology has not just changed the way we operate on a day-to-day basis; it has also immensely increased the volume, velocity, and variety of data being created. This phenomenon is known as the 3Vs, and it is a trend that will only accelerate as the “internet of things” becomes more prevalent.
Faced with this growing trend, data professionals now often have to look beyond the relational database to NoSQL database technologies to fully address their data management needs. The four categories of NoSQL databases are column-oriented, key-value, graph, and document-oriented databases, and each one is best suited to fill a specific data management niche. For instance, graph databases are excellent for discovering nearest neighbors and relationships between people in a network, and therefore are often leveraged for supporting social applications.
Document-oriented databases, which store data in JavaScript Object Notation (JSON) documents, are commonly used to collect machine-generated data. This data is often produced rapidly and in high volume, which makes it difficult to manage and analyze with systems other than document-oriented databases.
MongoDB is one example of a document-oriented database and, although it excels at solving the problem outlined above, its schema-less architecture can pose data quality problems. For instance, a single MongoDB collection could technically contain separate fields named accountNumber, AccountNumber, and accountnumber (note the different capitalization), because field names are case-sensitive. The data quality problem arises when a report that requires customer account numbers queries only the AccountNumber field and therefore receives only a subset of the records. To steer clear of this problem, spend time modeling your data before implementing a MongoDB system. This is one of the reasons why Embarcadero has added support for MongoDB to ER/Studio XE6; watch this short video to see how ER/Studio can reverse engineer and generate models for a MongoDB database.
Video: http://youtu.be/ZKbjlsa2bhI
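The account number problem described above can be illustrated with a minimal Python sketch. It does not connect to MongoDB; plain dictionaries stand in for the documents of one collection, and the sample records and the helper names `report_account_numbers` and `find_case_variants` are hypothetical, introduced only for illustration. The first helper mimics a report that reads a single exact field name, and the second shows one possible way to spot field names that differ only by case.

```python
from collections import defaultdict

# Hypothetical documents standing in for a single MongoDB collection.
# Field names are case-sensitive, so these are three distinct fields.
customers = [
    {"_id": 1, "accountNumber": "A-100", "name": "Ada"},
    {"_id": 2, "AccountNumber": "A-200", "name": "Grace"},
    {"_id": 3, "accountnumber": "A-300", "name": "Edsger"},
    {"_id": 4, "AccountNumber": "A-400", "name": "Barbara"},
]


def report_account_numbers(docs, field="AccountNumber"):
    """Mimic a report that reads only one exact field name."""
    return [doc[field] for doc in docs if field in doc]


def find_case_variants(docs):
    """Group field names that differ only by letter case."""
    variants = defaultdict(set)
    for doc in docs:
        for key in doc:
            variants[key.lower()].add(key)
    # Keep only the names that appear with more than one spelling.
    return {name: sorted(keys) for name, keys in variants.items() if len(keys) > 1}


report = report_account_numbers(customers)
conflicts = find_case_variants(customers)

print(f"Report returned {len(report)} of {len(customers)} account numbers: {report}")
print(f"Fields that differ only by case: {conflicts}")
```

Here the report silently misses two of the four customers, while the audit surfaces all three spellings of the field. A check like this is only a stopgap; agreeing on field names in a data model up front avoids the inconsistency in the first place.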
For more tips on handling unstructured data in your models, listen as Karen Lopez shares challenges and insights on data modeling for big data and NoSQL technologies, available on demand.
Want to learn more about ER/Studio? Try it for yourself free for 14 days!
About the author:
Rob Loranger is an Embarcadero Product Manager for the ER/Studio product family. Before his current role, Rob was a Sr. Software Consultant, and for more than 8 years he has been one of Embarcadero’s leading experts for its database development, management, and architecture software.