There are different claims regarding the coinage of the term big data, but the concept has existed since the mid-20th century, when librarians and custodians of scientific information identified a trend: the quantity of data being generated was doubling in less than 20 years.
Subsequent advances led to the ability to digitally store vast quantities of data in a reduced physical footprint. These advances also led to new methods of generating more information. It is estimated that by 2025 the world’s data stores will contain 175 zettabytes of data.
The rise of the Internet of Things (IoT) and the consumer Internet have significantly contributed to the increased information creation rates. Most of the world’s data stores have been generated in the last few years. This trend shows no signs of slowing down as the percentage of the population with access to the Internet grows and the use of smart technology increases.
The vast information repositories resulting from this escalating production make up the entity known as big data. It refers to massive amounts of structured and unstructured data that traditional databases and software approaches cannot process efficiently.
Characteristics of Big Data
Big data is often characterized in the IT community by the three V’s, first proposed by industry analyst Doug Laney in 2001. The three V’s classify big data as data that contains greater variety, arriving in increasing volumes and with ever-higher velocity.
- Volume – The amount of data is the primary factor describing big data. Big data involves processing high volumes of data of potentially unknown value, volumes that overwhelm traditional means of storage and processing.
- Velocity – The speed at which data is generated, received, and possibly acted upon is another characteristic of big data. High-velocity data is often streamed directly into memory rather than written to disk, and smart devices frequently demand real-time data analysis and responses.
- Variety – The third principal characteristic of big data is the variety in the available types of information. Many unstructured or semi-structured data streams have joined structured data types that fit nicely into relational databases. Much of the information big data draws on demands preprocessing before it yields any meaning or value.
In keeping with the alliterative nature of these characteristics, data professionals have suggested additional V’s to describe big data further.
- Value – Effective use of big data can provide substantial value to an organization. Identifying patterns and trends can yield financial gains and a competitive edge.
- Veracity – The quality and trustworthiness of data are critical as it becomes used more extensively for automated decision-making. Ensuring the correctness of its information is a challenge enterprises face when using big data.
- Variability – The variable nature of unstructured data requires new methods to decipher context and meaning through better natural language processing.
- Visualization – It is essential to make big data processing results understandable to stakeholders and decision-makers. Visualization is a critical technology in producing value from big data.
Types of Big Data
There are three forms of big data.
Structured – Information in a fixed format that enables it to be easily accessed, processed, and stored is structured data. Traditional computer technology, such as relational databases, can process structured data.
Unstructured – Data with no defined form or structure is called unstructured data. It cannot be readily processed or analyzed. Unstructured data may contain text, images, and many other items.
Semi-structured – Semi-structured data exhibits some organizational properties that allow it to be analyzed but cannot be strictly formatted for a relational database. Email is an example of semi-structured data.
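As a rough illustration, the three forms might look like this in Python. All records here are made up for the example; the point is how directly each form can be queried.

```python
import json

# Structured: fixed schema, fits a relational table row.
structured_row = {"customer_id": 1001, "amount": 24.99, "date": "2023-04-01"}

# Semi-structured: self-describing fields, but no strict relational schema
# (email is the classic example: fixed headers plus a free-form body).
semi_structured = json.loads(
    '{"from": "a@example.com", "subject": "Hi", "body": "Free-form text..."}'
)

# Unstructured: no predefined form; meaning must be extracted.
unstructured = "Customer called to report intermittent errors on unit 7."

# Structured data can be queried by key directly; unstructured data
# needs preprocessing (here, a crude text search) to yield anything.
print(structured_row["amount"])          # direct field access
print("errors" in unstructured.lower())  # requires text processing
```

The semi-structured record sits in between: its headers can be accessed by key like structured data, but its body is unstructured text.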
Collecting Big Data
The variety, volume, and velocity characteristics are apparent when considering how big data is collected. Big data is generated from multiple sources and gathered at varying intervals and timeframes. Some information is provided through deliberate human interaction with software tools or physical collection methods; other data is gathered passively by smart devices and sensors that monitor users.
Here are some methods most commonly used to collect big data.
Transactional data – Point of sale (POS) software combined with a customer relationship management (CRM) tool can create a pool of transactional data. That pool of data can be mined and analyzed. Customer information that can be gathered at each transaction includes:
- What was purchased
- How much was bought, and what promotions influenced the decision
- When and where the purchase was completed
- The payment method used
Over time, profiles can be built that allow targeted marketing based on purchasing history, and the effectiveness of promotional materials can be evaluated so they can be fine-tuned going forward.
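A minimal sketch of how such a pool of transactional data might be mined into per-customer profiles. The records and field names here are hypothetical, not from any particular POS or CRM product:

```python
from collections import defaultdict

# Hypothetical POS/CRM transaction records.
transactions = [
    {"customer": "C1", "item": "coffee", "qty": 2, "total": 8.00, "promo": "BOGO"},
    {"customer": "C1", "item": "mug", "qty": 1, "total": 12.50, "promo": None},
    {"customer": "C2", "item": "coffee", "qty": 1, "total": 4.00, "promo": "BOGO"},
]

# Build simple per-customer profiles: total spend and promo sensitivity.
profiles = defaultdict(lambda: {"spend": 0.0, "promo_purchases": 0})
for t in transactions:
    p = profiles[t["customer"]]
    p["spend"] += t["total"]
    if t["promo"]:
        p["promo_purchases"] += 1

print(dict(profiles))
```

Real systems add many more dimensions (time of day, store location, payment method), but the pattern is the same: fold individual transactions into an accumulating profile per customer.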
In-store traffic monitoring
Motion-sensitive sensors can track customers as they move through a store. This data enables merchants to determine which departments or displays attract the most attention and can help them adapt their offerings to satisfy customer interest. Electronic monitoring of customers is an example of the IoT helping to gather data on human activities.
Online marketing analytics
Analytical engines like Google Analytics provide a wealth of information concerning how customers interact with an online presence of an organization. These tools can help guide marketing campaigns and suggest modifications to web pages to make them more attractive to visitors. Details such as which pages generate the most activity in clicks or interactions can help develop a website tailored to customer demand.
Social media analytics
Billions of people use social media networks to interact in various ways. Social media analytics can uncover behavioral and demographic information about your current and potential customers. Platform tools like Facebook enable targeted marketing by analyzing the unstructured data flowing through social media feeds.
Customer reward programs
Organizations that offer discounts or reward programs to repeat shoppers do so primarily to create another data stream on their customers. The reward card that customers swipe at the grocery store gives the customer some immediate savings, and it provides the merchant with information on purchasing habits that can be used for purposes such as creating marketing materials or making inventory decisions.
Location data
The inclusion of global positioning systems (GPS) in smartphones and mobile devices enables information to be gathered concerning customers’ physical location and movement.
Employee monitoring
Big data can profile employees and evaluate their performance. This information can include identifying which programs they used most often, the time of day when activity peaks, and when devices are powered on and off.
Gaming data
Game developers can get data that helps them create more engaging products for their customers. Such information includes the time spent with a game, levels that cause difficulty, and the rate of in-app purchases. That information provides game manufacturers with valuable data they can analyze to improve their offerings.
Organization Uses of Big Data
Organizations across all market sectors currently use big data in many ways. Analytics is a critical technique in processing big data and helps provide its benefits. Here are some ways organizations derive value from using big data.
Personalizing the customer experience – Gathering information on specific customers through multiple data streams enables the creation of personalized marketing campaigns. It allows the tailoring of online interaction to conform to individual tastes and preferences.
Product development – Big data analytics can anticipate customer demand and proactively bring new offerings to market. Predictive models consider past activity and product attributes to develop new products and services that appeal to the customer base.
Preventative maintenance – Maintenance can be performed more effectively by analyzing the structured and unstructured data generated regarding the equipment and machinery of organizations. Structured data (such as make, model number, and installation date) supplemented by unstructured data (such as problem logs and error reports) allows informed decisions. Such decisions result in maximizing uptime by proactively addressing problems.
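The preventive-maintenance idea of combining structured asset records with unstructured problem logs can be sketched as follows. The assets, logs, thresholds, and keywords here are invented for illustration:

```python
import re
from datetime import date

# Structured data: make, model, installation date.
assets = [
    {"id": "pump-1", "model": "PX-200", "installed": date(2015, 6, 1)},
    {"id": "pump-2", "model": "PX-200", "installed": date(2022, 3, 15)},
]

# Unstructured data: free-text problem logs, keyed by asset id.
logs = {
    "pump-1": "Operator noted vibration. Error E42 raised twice this week.",
    "pump-2": "Routine inspection, no issues found.",
}

def needs_maintenance(asset, log, today=date(2024, 1, 1)):
    """Flag aging assets whose logs mention problems (illustrative rule)."""
    age_years = (today - asset["installed"]).days / 365
    problem_mentions = len(re.findall(r"error|fault|vibration", log, re.I))
    return age_years > 5 and problem_mentions > 0

flagged = [a["id"] for a in assets if needs_maintenance(a, logs[a["id"]])]
print(flagged)  # only pump-1 is both old and showing problems in its logs
```

Production systems would replace the keyword search with proper text analytics, but the shape is the same: structured attributes narrow the candidates, and signals extracted from unstructured text trigger the proactive action.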
Machine learning – Enormous quantities of data are essential to successful machine learning (ML) initiatives. Big data provides the raw materials that enable artificial intelligence and ML techniques to be used. These techniques create autonomous machines and robots that promise to change the way organizations and industries operate in the future.
Innovation – Big data offers unlimited channels for innovative decision-making throughout an organization. Better conclusions regarding finances, planning, and product focus can be obtained by analyzing the information in the big data stores of an enterprise.
Operational efficiency – Collecting big data on its internal processes enables an organization to use analytics to increase its operating efficiency. Trends not apparent with traditional analysis can be uncovered, resulting in financial savings and increased productivity.
Challenges of Analyzing Big Data
Big data poses challenges to organizations attempting to use it constructively. Significant obstacles are encountered in storing and processing the information effectively; the volume, velocity, and variety with which big data is generated make it hard to use.
Mining the large datasets associated with big data requires scalable solutions that can handle the varied nature of the information they are expected to process.
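One common scalable pattern is to stream records and aggregate incrementally, rather than loading an entire dataset into memory. A minimal sketch, with a generator standing in for a data source too large to hold at once:

```python
def record_stream():
    """Stand-in for a source far too large to hold in memory
    (in practice: files read line by line, a message queue, etc.)."""
    for i in range(1_000_000):
        yield {"sensor": f"s{i % 3}", "reading": i % 100}

# Incremental aggregation: memory use stays constant no matter
# how long the stream runs, because only per-key totals are kept.
counts, sums = {}, {}
for rec in record_stream():
    k = rec["sensor"]
    counts[k] = counts.get(k, 0) + 1
    sums[k] = sums.get(k, 0) + rec["reading"]

averages = {k: sums[k] / counts[k] for k in counts}
print(averages)
```

Distributed frameworks generalize this same idea: partial aggregates are computed per partition and then merged, so no single node ever needs the whole dataset.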
Analytical tools need to provide visualizations that let stakeholders use the insights uncovered in big data stores productively. They also need to offer high performance and enterprise-grade security to keep data safe and out of the hands of unauthorized entities.
The immense amounts of information comprising big data make it challenging to store using traditional methods. Organizations often struggle to address this issue with on-premises data centers. The scalability of cloud storage resources is an attractive alternative and one way an enterprise can efficiently supplement its storage capacity.
Devices with embedded intelligence to assist in data mining and analytics promise to help tame some problems of viably storing big data. Edge computing combining data collection, storage, and processing features is becoming more prevalent. It presents another technique to address the volume and complexity of big data.
Concerns Over Big Data Collection
Not everyone is enamored with how companies collect and use big data. The vast amounts of personal information held in big data repositories present an inviting target for hackers intent on compromising it. This fact has resulted in pushback by consumers who wish to exert control over how their private information is collected and used.
Initiatives like the General Data Protection Regulation (GDPR) of the European Union and similar legislation in other jurisdictions are attempting to address citizens’ concerns over controlling personal data. These regulatory standards enable individuals to opt out of corporate data collection procedures or have their information removed from enterprise databases. The regulations also put the burden of protecting personally identifiable information (PII) squarely on the organizations collecting and using it.
The responsibility for protecting enterprise data demands a heightened emphasis on security at every stage of the information life cycle. Encryption is one technique organizations need to employ to secure their data assets. Security concerns also mandate a focus on maintaining compliance with regulatory standards, as violations can cause severe financial penalties and negatively impact consumer confidence.
Big data is here to stay, and organizations that use it can attain many tangible competitive advantages over rivals; its benefits outweigh the additional security and compliance issues. As data generation volumes and rates continue to increase, new methodologies will be developed to enable organizations to use the data more productively. Ignoring the value hidden in this bonanza of information is a risky proposition.
Idera SQL Diagnostic Manager is a robust solution for SQL Server, Azure SQL Database, and Amazon RDS for SQL Server. It offers 24x7 SQL performance monitoring, alerting, and diagnostics to quickly find and fix database performance problems.
To learn more about keeping your SQL databases at peak performance, please take some time to check out our 10-page whitepaper, “The Keys to a High-Performing Database,” and discover more about how to keep your databases tuned.