4 ways data lineage tools elevate healthcare data 

Updated on April 7, 2023

The healthcare industry generates an estimated 30% of all the world’s data annually, and the volume grows each year. Much of healthcare data is highly sensitive and subject to evolving regulatory standards. 

If healthcare data volume and sensitivity weren’t overwhelming enough, healthcare data ecosystems are diverse and interconnected, integrated with many applications, microservices and infrastructures. Those connections rely on countless dependencies, with the nature of those dependencies obscured in a “black box” for most users. 

Because of this “black box,” organizations struggle to maintain accurate data, much less leverage their data to make patient care and business decisions. Data lineage tools illuminate data dependencies, tracking the journey of data as it moves through complex systems and undergoes various transformations along the way. 

When used effectively, automated data lineage tools provide predictive insights and recommendations, enabling healthcare organizations to make better data-driven decisions, comply with regulations, and ensure the accuracy and trustworthiness of their data.

You can use data lineage to improve how you use healthcare data in four ways. 

Build trust in reports and insights. 

In the last decade, hospitals and office-based physicians have implemented electronic health record (EHR) platforms extensively, and more than 90% of non-federal acute care hospitals now use a certified EHR. While the move from manual to electronic records has the potential to reduce errors and improve patient outcomes, the complexity of these platforms and how they interface make accurate, consistent data entry a significant challenge for healthcare staff.

A mistake that makes its way into one system can replicate in other systems, reproducing the error and contaminating countless data sets as it echoes through EHR platforms. A lack of visibility into the origin and path of bad data calls other data sets into question. 

Such errors can negatively impact patient and business outcomes. Medical recommendations based on erroneous data pose a danger to patients and diminish trust. Low-quality data hinders healthcare organizations’ ability to determine medical and workforce trends impacting operations.

Identifying the accurate data, pinpointing its intersection with inaccurate data and deleting the bad data — while preserving the good data — becomes a herculean task. Enter automated data lineage. 

This approach improves visibility into how data flows through complex systems by adding a semantic layer that gives users a consistent way to understand and interpret that flow.

The semantic layer of data lineage enables users to: 

  • Differentiate between direct and indirect data dependencies.
  • Understand the evolution of data lineage over time.
  • Translate data processing code into high-level, user-friendly expressions. 

Adding a semantic layer provides users with a high-level view of data tracking and the ability to drill down to specific tables and cells, ensuring more accurate data across systems and saving time otherwise spent in manual data reconciliation processes. 
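
To make the difference between direct and indirect dependencies concrete, here is a minimal sketch in Python. It assumes lineage is available as a simple map from each dataset to the datasets it feeds; the dataset names and the `upstream_dependencies` helper are hypothetical and do not come from any particular lineage product.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
LINEAGE = {
    "ehr_admissions": ["staging_visits"],
    "lab_results": ["staging_visits"],
    "staging_visits": ["warehouse_encounters"],
    "warehouse_encounters": ["readmission_report"],
}


def upstream_dependencies(edges, target):
    """Return (direct, indirect) sets of datasets that `target` depends on."""
    # Invert the edge map so we can walk from a dataset back to its sources.
    parents = {}
    for source, targets in edges.items():
        for fed in targets:
            parents.setdefault(fed, set()).add(source)

    # Direct dependencies feed the target in a single hop.
    direct = set(parents.get(target, set()))

    # Indirect dependencies are reached only through other datasets.
    indirect = set()
    seen = set(direct) | {target}
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                indirect.add(parent)
                queue.append(parent)
    return direct, indirect


direct, indirect = upstream_dependencies(LINEAGE, "readmission_report")
print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
```

In this toy graph the report depends directly on the warehouse table alone, while the staging table and both source feeds are indirect dependencies. A dataset that is both a direct and an indirect source is reported once, as direct.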

By improving data quality and automating more processes, data lineage facilitates more accurate and timely reports and insights, which result in more informed decisions, better forecasting and enhanced patient and business outcomes. 

Catch data incidents before they happen.

For the 12th consecutive year, healthcare had the highest average data breach cost of any industry at $10.1 million, according to IBM’s “Cost of a data breach 2022” report. More than 50 million patient records were breached in 2021, and roughly 132 days passed between a breach event and its discovery.

Because most healthcare systems have limited visibility into their complex data systems, assessing data dependencies can prove resource-intensive or even impossible. An abundance of overlapping systems and data migrations obscures both data quality and dependencies.

The massive impact of a data breach and the vast number of patients affected mean healthcare systems can’t afford to wait until a breach happens to improve their visibility into complex data systems.

Many current data observability tools reactively approach data quality and security. These tools seek out and fix bugs when an incident occurs. Data lineage tools offer IT teams advanced levels of observability with minimal manual intervention, facilitating a proactive approach to prevent data incidents. 

To understand the value of data lineage capabilities, consider the difference between an emergency room visit and an annual checkup. You visit the emergency room with acute symptoms and a loss of functionality (in data terms, a bug). The ER staff diagnoses and treats your condition based on your symptoms and medical history. ER care is expensive and time-sensitive, and the time that passes before treatment could permanently and negatively affect your health.

Your annual checkup, by contrast, is preventative care. Your doctor has greater visibility into all aspects of your health, offering insight into possible future health conditions and recommending preventative measures. When executed effectively, automated data lineage tools provide this preventative care.

Decrease the risk of regulatory non-compliance.

Healthcare data faces more scrutiny than data in many other industries: in addition to its high volume, it is heavily regulated under the Health Insurance Portability and Accountability Act (HIPAA). In the last 20 years, the U.S. Department of Health and Human Services’ Office for Civil Rights (OCR) has investigated tens of thousands of HIPAA complaints and imposed more than $130 million in penalties. General hospitals, private practices and pharmacies top the list for investigated complaints.

These regulations require accurate data tracking and evidence of compliance through detailed, reliable reporting whenever necessary. To ensure compliance and avoid hefty fines, your organization must demonstrate regulated data’s source, accuracy and flow. Automated data lineage provides a comprehensive visual overview of healthcare data, empowering organizations to meet regulatory requirements while minimizing manual intervention. 

Improve business intelligence insights.

When a healthcare organization can trust its reporting, prevent data breaches and comply with regulatory requirements, leaders can leverage accurate, reliable data for greater business intelligence insights to improve patient care, optimize operations and drive business growth. 

Business intelligence has grown in importance, but many organizations rely on ad-hoc solutions and manual processes like spreadsheets to glean insights from their data. Automated data lineage helps organizations avoid the challenges associated with manual lineage and deliver solutions more efficiently and quickly. It empowers analysts to understand data dependencies, conduct in-depth root cause analyses of past issues and predict downstream impacts of future changes. 
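
Predicting the downstream impact of a change comes down to walking the lineage graph forward from the dataset that changes. The sketch below illustrates the idea with a breadth-first traversal; the `FLOWS` map, the dataset names and the `downstream_impact` function are assumptions made for illustration, not the API of a specific tool.

```python
from collections import deque

# Hypothetical data flows: each key feeds the datasets listed as its value.
FLOWS = {
    "ehr_admissions": ["staging_visits"],
    "staging_visits": ["warehouse_encounters", "bed_capacity_dashboard"],
    "warehouse_encounters": ["readmission_report"],
}


def downstream_impact(edges, changed):
    """Return every dataset reachable from `changed`, nearest first."""
    impacted = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Which assets should be rechecked if the staging table changes?
for dataset in downstream_impact(FLOWS, "staging_visits"):
    print(dataset)
```

Running the same traversal over reversed edges gives the candidates for a root cause analysis: every upstream dataset that could have introduced a problem seen in a report.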

Automated lineage tools support better business intelligence by: 

  • Automatically scanning and mapping data environments.
  • Mapping datasets, streams and flows for a more manageable view of data sources.
  • Revealing connections between systems, workspaces and data objects to minimize blind spots. 
  • Detecting and surfacing previous revisions and changes in the data environment so analysts can visually compare past and present data flows.
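
As a rough illustration of the last capability, comparing past and present data flows can be reduced to comparing two snapshots of lineage edges. The sketch below assumes each snapshot is a set of (source, target) pairs; the snapshot contents and the `diff_lineage` helper are hypothetical.

```python
def diff_lineage(before, after):
    """Compare two lineage snapshots given as sets of (source, target) edges."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical snapshots taken before and after a system migration.
snapshot_before = {
    ("legacy_billing", "claims_mart"),
    ("ehr_admissions", "claims_mart"),
}
snapshot_after = {
    ("cloud_billing", "claims_mart"),
    ("ehr_admissions", "claims_mart"),
}

changes = diff_lineage(snapshot_before, snapshot_after)
print("added:", changes["added"])
print("removed:", changes["removed"])
```

Here the diff shows that the claims mart stopped reading from the legacy billing system and started reading from its replacement, while the EHR feed is unchanged.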

Healthcare data will continue to grow in volume and complexity as organizations shift away from legacy systems, introduce new integrations and environments, and undergo mergers and acquisitions. Regulatory requirements will keep evolving, and cyber attackers won’t relinquish their efforts to breach healthcare organizations anytime soon. 

Today’s data ecosystem is a minefield of cloud-based and on-premises infrastructure, where data travels through databases and data lakes and from data warehouses and ETLs to reporting systems. Knowing how data is flowing through the enterprise helps healthcare systems improve business performance, manage regulatory compliance and risk, more effectively handle evolving data sources, and shrink IT cost and risk.

Ernie Ostic

Ernie Ostic is Chief Evangelist at MANTA, focusing on solutions for lineage and metadata integration and providing guidance on information governance and custom lineage solution architectures. He brings 40+ years of experience in data integration, including more than 20 years at IBM where he worked in a variety of roles including product management and technical sales support. Ostic is a graduate of Boston College, where he earned his bachelor’s degree in computer science.