Mastering the Complexities of Analytics for Big Data in Healthcare, Part 2

by Aug 9, 2023

In Part 1, we discussed the properties of big data and how it affects the data gathered in the healthcare industry. Here we go over methods for handling big data and overcoming challenges to achieve data-driven financial and clinical objectives, starting with ingesting, cleansing, and storing the data, protecting and maintaining it, then making it useful for sharing in presentations and reports.

Methods for Handling Big Data

Return on Investment

Return on investment refers to whether healthcare organizations can extract value from data.

The benefit may come from more thoughtful strategic decision-making, improved organizational efficiencies, and better results. As such, healthcare organizations must continuously determine whether analytics for big data enhances their organization. Deriving value from analytics for big data begins with defining specific use cases that accomplish essential tasks with straightforward returns on investment.

Many healthcare organizations are in the initial stages of developing the capabilities that allow them to complete tasks with straightforward returns on investment. Further, generating actionable insights that healthcare organizations can apply to practical problems is a challenging and complicated mission.

Healthcare organizations that follow the fundamentals of analytics for big data can achieve meaningful returns on investment. These fundamentals include taking creative approaches to distribute insights to end users across the organization, employing skilled data scientists, architecting robust infrastructures for information technology, and governing data by following solid principles.


Databases ingest all data from somewhere. However, data does not always originate from sources with impeccable data governance habits. Obtaining data that is clean, complete, accurate, and formatted correctly for use in multiple systems is an ongoing struggle for healthcare organizations.

An incomplete understanding of why analytics for big data is vital, convoluted workflows, and poor usability of electronic health records all contribute to problems with data quality throughout the data life cycle.

Healthcare organizations can reinforce their processes for gathering data by developing projects to improve clinical documentation that guide healthcare providers on how to ensure that data is helpful for downstream analytics for big data, requesting assistance from information management professionals, enlisting expertise in the governance of data, and prioritizing valuable types of data for essential projects.


Healthcare organizations are accustomed to cleanliness in operating rooms and clinics. However, they may be less aware of how vital it is to cleanse their data. Unclean data can derail projects for analytics, and it is especially problematic when combining disparate data sources. Scrubbing of data ensures datasets are pertinent, consistent, correct, accurate, and not corrupt.

Healthcare organizations often cleanse data manually. However, vendors offer automated tools that scrub data by applying logic rules to compare, contrast, and correct large datasets. These tools are becoming increasingly sophisticated and accurate as techniques for machine learning continue to advance. This improvement reduces the time and expense required to ensure high accuracy and integrity in healthcare data warehouses.
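
As a rough illustration of the kind of logic rules such tools apply, the sketch below normalizes and validates a handful of patient records using only the Python standard library. The field names (`patient_id`, `sex`, `birth_date`) and the rules themselves are hypothetical and are not taken from any particular vendor tool.

```python
from datetime import date, datetime

# Hypothetical mapping used to normalize inconsistent codings of sex.
SEX_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def clean_record(raw):
    """Return (cleaned_record, problems) for one raw record (a dict of strings)."""
    problems = []
    record = {key: (value or "").strip() for key, value in raw.items()}

    # Rule 1: the identifier is mandatory.
    if not record.get("patient_id"):
        problems.append("missing patient_id")

    # Rule 2: map free-form sex values onto one consistent code.
    sex = SEX_CODES.get(record.get("sex", "").lower())
    if sex is None:
        problems.append("unrecognized sex value")
    record["sex"] = sex

    # Rule 3: birth dates must parse as ISO dates and must not be in the future.
    try:
        birth = datetime.strptime(record.get("birth_date", ""), "%Y-%m-%d").date()
        if birth > date.today():
            problems.append("birth_date in the future")
    except ValueError:
        problems.append("unparseable birth_date")

    return record, problems


def clean_dataset(rows):
    """Split rows into clean records and rejected (record, problems) pairs,
    rejecting repeats of an already accepted patient_id."""
    seen, clean, rejected = set(), [], []
    for raw in rows:
        record, problems = clean_record(raw)
        if not problems and record["patient_id"] in seen:
            problems.append("duplicate patient_id")
        if problems:
            rejected.append((record, problems))
        else:
            seen.add(record["patient_id"])
            clean.append(record)
    return clean, rejected
```

Keeping the rejected rows together with the reasons for rejection, rather than silently dropping them, lets a person review what the rules could not resolve.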


Healthcare providers seldom give much thought to where their data is stored. However, storage location is critical to the performance, security, and cost concerns of information technology departments. Healthcare organizations may no longer be able to manage the impacts and costs of on-premise data storage as the healthcare industry’s data volume grows.

Healthcare organizations are most comfortable with on-premise storage of data. Such on-premise storage promises control over availability, access, and security. However, on-premise servers can be predisposed to producing data silos across different departments, which are challenging to maintain and costly to scale.

Storage in the cloud is becoming increasingly widespread as reliability increases and costs decrease. Most healthcare organizations use several cloud-based infrastructures that include storage and applications. The cloud offers easier expansion, lower up-front fees, and rapid disaster recovery. However, healthcare organizations must carefully select organizational partners that understand the importance of security and compliance requirements specific to healthcare. Such requirements include those of the Health Insurance Portability and Accountability Act.

Many healthcare organizations wind up with hybrid approaches for their storage of data. Such hybrid approaches may be the most workable and flexible method for storing and accessing data for healthcare organizations with varying needs. However, when developing hybrid infrastructures, healthcare organizations must ensure that disparate systems can communicate and share data with other parts of the organization when necessary.


Protection refers to whether data remains secure. Security is essential to the healthcare industry, particularly as data migrates between healthcare organizations due to increased interoperability and as storage moves to the cloud. Moreover, data in the healthcare industry is subject to a wide array of vulnerabilities, and the requirements of the Health Insurance Portability and Accountability Act raise the stakes of each one. Such vulnerabilities include ransomware attacks, devices accidentally left on public transportation, hacking, malware, phishing attacks, and data breaches.

The protection of data is the number one priority for healthcare organizations. Healthcare organizations concerned with data vulnerability must ensure their staff trains regularly to keep data secure and private. These healthcare organizations must also ensure that their organizational partners sign Business Associate Agreements under the Health Insurance Portability and Accountability Act to maintain compliance with the strict rules for security and privacy in healthcare.

The Security Rule of the Health Insurance Portability and Accountability Act includes a long list of technical safeguards for healthcare organizations that store protected health information. Such safeguards include protocols for authentication, security for transmissions, and controls for auditing, integrity, and access. In practice, these safeguards translate into common-sense security procedures. Such procedures include applying multi-factor authentication, encrypting sensitive data, configuring firewalls, and using anti-virus software.
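
One way to picture an integrity control is a keyed hash stored alongside each record, so that later modification can be detected. The sketch below uses Python's standard `hmac` and `hashlib` modules. The record layout and the in-code key are assumptions for illustration only; a real deployment would load the key from a managed secret store, and this sketch is not a statement of what the Security Rule itself mandates.

```python
import hashlib
import hmac
import json

# Assumption: illustrative key only. Never hard-code a real key.
SECRET_KEY = b"example-key-for-illustration"


def sign_record(record, key=SECRET_KEY):
    """Return a hex HMAC-SHA256 tag over a canonical JSON form of the record."""
    # Sorting keys makes the tag independent of dictionary ordering.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record, tag, key=SECRET_KEY):
    """Return True only if the record still matches the stored tag."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign_record(record, key), tag)
```

A tag computed when a record is written can be re-checked whenever the record is read. A mismatch signals that the content changed outside the expected process.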

Human weaknesses among staff can undermine even the most secure data center. Staff tend to prioritize convenience over lengthy software updates and strict constraints on their access to data and software. Healthcare organizations must frequently remind their staff of the critical nature of protocols for data security. These healthcare organizations must also consistently review which team members can access sensitive and valuable datasets. Otherwise, malicious parties may cause damage.


Data in the healthcare industry may have long shelf lives. Such long shelf lives are especially relevant in clinical settings. Healthcare organizations must keep patient data accessible for at least six years. Also, healthcare organizations may utilize de-identified datasets for research projects. However, applying de-identified datasets makes ongoing curation and stewardship critical. Data may also be reexamined and reused for other reasons. Such reasons include benchmarking of performance and measurement of quality. Data analysts and researchers must understand what the data means, who created it and when, and who previously used it, along with when, how, and why.

Metadata allows data analysts to replicate previous queries. This is vital for precision benchmarking and scientific studies. Also, such exact replication prevents the creation of isolated datasets.

Healthcare organizations must assign stewards of data to curate and develop descriptive metadata. Data stewards ensure that all elements remain helpful for the relevant tasks, are documented appropriately from creation to deletion, and have standard formats and definitions.
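
To make the idea of descriptive metadata concrete, the sketch below models a dataset's metadata as a small Python record that tracks the author, creation date, steward, and a history of who used the data, when, how, and why. All class and field names are hypothetical. The intent is only to show how recording each query makes it possible to replicate it later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    user: str        # who used the data
    purpose: str     # why
    query: str       # how (the exact query that was run)
    when: datetime   # when


@dataclass
class DatasetMetadata:
    name: str
    author: str
    created: datetime
    description: str
    steward: str
    history: list = field(default_factory=list)

    def record_use(self, user, purpose, query):
        """Append who used the dataset, why, how (the query), and when."""
        event = AccessEvent(user, purpose, query, datetime.now(timezone.utc))
        self.history.append(event)
        return event

    def queries_by(self, user):
        """Return the queries a user ran, in order, so they can be replicated."""
        return [event.query for event in self.history if event.user == user]
```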


Data in the healthcare industry is dynamic rather than static. Moreover, most data elements require relatively frequent updates to remain relevant and current. Updates may occur every few seconds for some datasets (such as patient vital signs). Other information (such as home addresses and marital statuses) may change only occasionally during the entire lifetime of individuals. Understanding the volatility of big data can be challenging for healthcare organizations that do not consistently monitor their data assets. Healthcare organizations must have clear ideas of how to conduct updates without damaging the quality and integrity of datasets, how to complete this process without downtime for end-users, which datasets can be updated automatically, and which datasets need manual updating. Healthcare organizations must also ensure they are not creating duplicate records when attempting updates to single elements. Such duplicate records make it difficult for healthcare professionals to access necessary information for patient decision-making.
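
A minimal sketch of updating a single element without creating a duplicate record is shown below, using Python's built-in `sqlite3` module. The `patients` table, its columns, and the `upsert_address` helper are hypothetical. The approach shown (update first, insert only when no row matched, all inside one transaction) is one common pattern rather than the only option.

```python
import sqlite3


def upsert_address(conn, patient_id, address):
    """Update the patient's address in place, inserting a row only if none exists."""
    with conn:  # one transaction: commit on success, roll back on error
        cursor = conn.execute(
            "UPDATE patients SET address = ? WHERE patient_id = ?",
            (address, patient_id),
        )
        if cursor.rowcount == 0:
            conn.execute(
                "INSERT INTO patients (patient_id, address) VALUES (?, ?)",
                (patient_id, address),
            )


conn = sqlite3.connect(":memory:")
# The PRIMARY KEY constraint makes the database itself reject duplicate patients.
conn.execute("CREATE TABLE patients (patient_id TEXT PRIMARY KEY, address TEXT)")
upsert_address(conn, "p1", "1 Main St")
upsert_address(conn, "p1", "22 Oak Ave")  # updates the existing row, no duplicate
```

Enforcing uniqueness in the schema as well as in application code means a mistake in one layer is still caught by the other.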


The ability to access data is the foundation of reporting and analytics for big data. However, healthcare organizations must typically overcome several challenges before engaging in meaningful analysis of their relevant data assets. Healthcare organizations must overcome interoperability problems and break down data silos that prevent query tools from accessing the organization’s entire repository of information. It may be impossible to generate complete views of individual patients’ health and healthcare organizations’ status when different dataset components exist in various formats and multiple segregated systems.

Quality and standardization may still be lacking even when data is held in a common warehouse. It may be challenging to ensure that queries are identifying and returning the correct information to end-users without medical coding systems that reduce free-form concepts into shared ontologies. Medical coding systems include Logical Observation Identifiers Names and Codes, Systematized Nomenclature of Medicine – Clinical Terms, and the 10th revision of the International Statistical Classification of Diseases and Related Health Problems.
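
The sketch below illustrates, in a deliberately simplified way, how mapping free-form entries onto shared codes lets a query treat different wordings as the same concept. The lookup table is hypothetical and tiny. The two ICD-10 codes are included as examples and should be checked against the official code set before any real use; production systems rely on full terminology services rather than a hand-written dictionary.

```python
# Illustrative lookup only. The two ICD-10 codes are given as examples and
# should be checked against the official code set before any real use.
TERM_TO_ICD10 = {
    "high blood pressure": "I10",
    "hypertension": "I10",
    "htn": "I10",
    "type 2 diabetes": "E11.9",
    "t2dm": "E11.9",
}


def code_free_text(entry):
    """Map a free-form diagnosis string to a code, or None if it is unknown."""
    # Lowercase and collapse whitespace so trivial variations still match.
    normalized = " ".join(entry.lower().split())
    return TERM_TO_ICD10.get(normalized)


def group_by_code(entries):
    """Group free-text entries by code; unmapped entries go under None for review."""
    groups = {}
    for entry in entries:
        groups.setdefault(code_free_text(entry), []).append(entry)
    return groups
```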

Many healthcare organizations use Structured Query Language (SQL) to access relational databases and massive datasets. However, SQL is only effective when end-users can first trust the data’s standardization, completeness, and accuracy.
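
As a small example of checking completeness before trusting query results, the sketch below uses Python's built-in `sqlite3` module to measure what fraction of rows actually carry a value in a given column. The `encounters` table, its columns, the sample rows, and the `completeness` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE encounters ("
    "encounter_id INTEGER PRIMARY KEY, patient_id TEXT, diagnosis_code TEXT)"
)
conn.executemany(
    "INSERT INTO encounters (patient_id, diagnosis_code) VALUES (?, ?)",
    [("p1", "I10"), ("p2", None), ("p3", "E11.9"), ("p4", "")],
)


def completeness(conn, table, column):
    """Return the fraction of rows in which the column is neither NULL nor blank."""
    # Table and column names cannot be bound as parameters, so they are
    # restricted to plain identifiers here to avoid SQL injection.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("unexpected table or column name")
    # COUNT(expr) skips NULLs; NULLIF turns blank strings into NULL first.
    total, filled = conn.execute(
        f"SELECT COUNT(*), COUNT(NULLIF(TRIM({column}), '')) FROM {table}"
    ).fetchone()
    return filled / total if total else 0.0
```

A low score on a column that a report depends on is a signal to fix the data at its source before building queries on top of it.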


Healthcare organizations must generate reports that are accessible, concise, and clear for their target audience after they establish query processes. The integrity and accuracy of data have critical downstream effects on the reliability and accuracy of reports. Poor-quality data produces suspect reports. Such reports can be detrimental to healthcare professionals who attempt to apply the information to treat patients.

Healthcare organizations must comprehend the difference between analysis and reporting. Reporting is often the prerequisite for analysis. That is, analysts must extract data before examining it. However, reporting can also represent end products.

Some reports focus on convincing end-users to take specific actions, arriving at new conclusions, or highlighting particular trends. In contrast, other reports leave end-users to draw their own inferences from the full spectrum of data. Healthcare organizations must be transparent concerning how they plan to apply their reports to ensure that database administrators can generate the required information.

Projects for quality assessment and regulatory compliance frequently require large volumes of data to feed models of reimbursement and measures of quality. Consequently, a significant amount of reporting in the healthcare industry is external. Healthcare organizations have several choices for meeting these various requirements. Options include web portals hosted by groups like the Centers for Medicare and Medicaid Services, reporting tools built into electronic health records, and qualified registries.


Presenting refers to how data analysts visualize data for end-users. Data processing, part of healthcare professionals’ daily workflow, is complicated. Adding to that complexity with dense, hard-to-understand reports only sours end-users on the potential of information technology in healthcare. In particular, healthcare professionals have struggled with the usability of their interfaces for electronic health records, complaining about excessive mouse clicks and alerts and insufficient time to accomplish all tasks.

Healthcare organizations must consider good practices for presenting data. Such practices include labeling information correctly to reduce potential confusion and using charts with proper proportions to illustrate different figures. Low-quality graphics, overlapping and crowded text, and complicated flowcharts can annoy and frustrate end-users, leading them to misinterpret or ignore data.

Engaging and clean visualizations of data make it much easier for healthcare professionals to absorb and apply information appropriately at points of care. Intuitive displays of data may be the difference between ignoring and utilizing critical insights in hectic emergency departments and intensive care units.

Visualizations of data include histograms, scatter plots, pie charts, bar charts, and heat maps. Each of these visualizations illustrates specific information and concepts. Developers must consider displays that apply recognizable formats for charts to highlight essential ideas without overwhelming end-users. Color coding is beneficial because it typically produces an immediate response. Filtering data prevents information overload and mitigates feelings of burnout among overworked healthcare professionals. Interactive dashboards are another option for reporting clinical, operational, and financial metrics to end-users. Online mapping tools are convenient for visualizing technology adoption rates and public health concerns on national and local scales. Concurrently, various new applications for desktops, tablets, and smartphones give end-users methods to interact with data more meaningfully.


Few healthcare organizations operate entirely independently. Moreover, only some patients receive all their care at one location. Consequently, sharing data with external organizational partners is essential. Sharing data is especially important as the healthcare industry moves towards managing population health and value-based care. Data interoperability is a perennial concern for healthcare organizations of all types, sizes, and positions along the data maturity spectrum. The ability to move data between disparate healthcare organizations can be severely constrained by fundamental differences in how organizations design and implement electronic health records. This constraint leaves healthcare professionals without the information they need, which leads to difficulties in working with patients, making critical choices, and formulating approaches to improve treatment results. Healthcare organizations must develop environments to distribute trustworthy, timely, and meaningful information from analytics for big data to all members of the healthcare environment.

The healthcare industry is working to improve data sharing across technical and organizational barriers. Emerging tools and strategies (such as Fast Healthcare Interoperability Resources and public application programming interfaces) and partnerships (such as CommonWell Health Alliance and Carequality) make it easier for developers to share data efficiently and securely. However, adoption of these methodologies has not yet reached critical mass, which leaves many healthcare organizations excluded from the seamless exchange of patient data.
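
Fast Healthcare Interoperability Resources exchanges data as structured resources, commonly serialized as JSON. The sketch below builds and round-trips a minimal Patient resource using only Python's standard library. The element names (`resourceType`, `id`, `name`, `family`, `given`, `gender`, `birthDate`) follow the Patient resource as commonly documented; the helper functions and the sample patient are hypothetical, and nothing here validates against the full specification.

```python
import json


def make_patient_resource(patient_id, family, given, gender, birth_date):
    """Build a minimal FHIR-style Patient resource as a plain dict."""
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "name": [{"family": family, "given": list(given)}],
        "gender": gender,
        "birthDate": birth_date,  # ISO 8601 date string, e.g. "1980-02-03"
    }


def to_json(resource):
    """Serialize a resource for exchange with another system."""
    return json.dumps(resource, indent=2, sort_keys=True)


def from_json(text):
    """Parse a received resource, rejecting anything that is not a Patient."""
    resource = json.loads(text)
    if resource.get("resourceType") != "Patient":
        raise ValueError("not a Patient resource")
    return resource
```

Because both sides agree on the element names, a receiving system can read the patient's name and birth date without knowing anything about how the sending system stores them internally.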

Idera provides robust solutions for SQL Server, Azure SQL Database, and Amazon RDS for SQL Server:

  • SQL Diagnostic Manager offers 24x7 SQL performance monitoring, alerting, and diagnostics to quickly find and fix database performance problems
  • SQL Compliance Manager protects your data by monitoring activity and changes with powerful alerting and tamper-proof audit tools

Additional Big Data Resource:

To learn more about what Big Data is and its usefulness, please take some time to check out our 10-page whitepaper, “Big Data and Its Benefits for Organizations.”