Efficient Data Acquisition in Life Sciences: Maximizing Value, Minimizing Waste

Updated on December 16, 2024

The ability to unlock meaningful insights from data is critical to driving research and innovation in the life sciences industry. Companies are increasingly seeking real-world data for insights into the effectiveness and safety of medical products, to help inform drug development decisions, and to improve patient outcomes. Yet while more data is available than ever before, much of the data purchased is underutilized, and not all of it will serve your end goals.

Analyzing and fully understanding data quality, completeness, coverage, and suitability for your use case can be very difficult before making a data purchase. Often, companies can get stuck in a ‘buy now and figure out what’s in it later’ cycle, which results in buying as much data as they can afford. But more data doesn’t necessarily mean better outcomes; issues with data quality, usability, and accessibility limit data’s utility and can result in spending too much on data that lacks valuable insights.

As you plan your data acquisitions for the coming year, it’s important to explore the common pitfalls that lead to overspending and the strategies that ensure maximum value from these investments. 

Unclear Data Strategy

Before purchasing any dataset, consider the specific questions it needs to address. In the absence of clear data objectives, companies may acquire data without a well-defined plan for how it will be utilized. Can you identify the types of data required, the reliability of data sources, and known gaps in your data? Can you measure the data’s impact on your objectives, such as drug development timelines or market share? Answers to these questions help to minimize data overspend and ensure the data you do purchase will contribute to your desired outcome.

In life sciences companies, data is often purchased at the brand, therapeutic area, and enterprise levels. Whether the data is being used for market assessments, key opinion leader (KOL) identification, or healthcare provider (HCP) outreach, it is likely to include an unknown amount of data overlap. Without a clear data strategy, this accumulation of data can lead to wasted resources.

For instance, consider a pharmaceutical company’s obesity and diabetes therapeutic area using real-world data to assess the market potential for a new treatment. Or consider the analysis of real-world data to identify key opinion leaders and healthcare providers to support the launch and adoption of a new therapy. Too frequently, similar but independently acquired datasets (brand data, obesity data, diabetes data) are purchased for these analyses, even though they contain much of the same information.

If you think buying more data is the answer, first assess what overlaps there might be in the datasets. Likely, there will be more overlap than you think. If you understand the information each dataset is providing, you may be able to avoid buying additional sources. 
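A quick way to check is to compare the de-identified patient populations each dataset covers before committing to a purchase. The sketch below is illustrative only: the dataset names and tokens are hypothetical, and it assumes each candidate dataset can be summarized as a set of patient tokens.

```python
import itertools

# Hypothetical patient-token sets for three independently acquired datasets.
datasets = {
    "brand":    {"tok01", "tok02", "tok03", "tok04"},
    "obesity":  {"tok02", "tok03", "tok04", "tok05"},
    "diabetes": {"tok03", "tok04", "tok06"},
}

def overlap_pct(a: set, b: set) -> float:
    """Share of the smaller dataset already covered by the other."""
    return len(a & b) / min(len(a), len(b)) * 100

# Pairwise overlap across all candidate datasets.
for (name_a, a), (name_b, b) in itertools.combinations(datasets.items(), 2):
    print(f"{name_a} vs {name_b}: {overlap_pct(a, b):.0f}% overlap")
```

In practice, vendors can often supply token-level overlap reports against data you already own, so this comparison can happen before any money changes hands.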

Poor Data Quality and Completeness

Healthcare data is highly complex, and as a result, the quality is often inconsistent, requiring significant investments in data cleaning and preparation. Even quality validation can be inconsistent or inaccurate without correcting for missing or incomplete data. 

Another challenge is working with de-identified patient data. De-identifying datasets with personal health information (PHI) is often done by replacing the identifiable elements of a patient’s data (name, birth date, etc.) with a pseudo-random string of characters, often referred to as a token. This “tokenization” of patient information allows for both the safe sharing of health data and the ability to link de-identified data from multiple sources to build more complete longitudinal views. Issues with these processes, like false positives (two patients associated with one token) and false negatives (one patient associated with multiple tokens), can lead to a variety of problems, including inaccurate and duplicated information. 

Compensating for bad data by purchasing more data is not the solution. Consider how you will measure the quality of the data you are buying. Can your vendor provide quality metrics that allow you to assess the accuracy and coverage of your data? Can your platform validate data quality and integrity?
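Even without vendor-supplied metrics, basic completeness checks on a sample extract can surface problems early. The sketch below assumes records arrive as rows with possibly missing fields; the field names and values are hypothetical.

```python
# Hypothetical sample rows from a purchased claims extract.
records = [
    {"token": "tok01", "diagnosis": "E11.9", "service_date": "2024-01-05"},
    {"token": "tok02", "diagnosis": None,    "service_date": "2024-02-11"},
    {"token": "tok03", "diagnosis": "E66.9", "service_date": None},
]

def field_completeness(rows: list, field: str) -> float:
    """Fraction of rows with a non-missing value for `field`."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)

# Completeness per field; low values flag columns needing remediation.
for field in ("token", "diagnosis", "service_date"):
    print(f"{field}: {field_completeness(records, field):.0%} complete")
```

Running the same checks on each refresh also makes quality drift visible over the life of a data contract, not just at purchase time.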

Insufficient Data Technology and Governance 

Due to the sensitive nature of regulatory compliance requirements and the potential impact of data-driven decisions on patient health, data governance is imperative. Often, large companies are burdened with legacy systems that silo data and make it difficult to access and share. Proprietary data formats also make it challenging to integrate data from different sources.

Poor data governance can lead to underutilizing existing assets and unnecessary data purchases. In addition, outdated governance tools obscure what data is already available and limit control over the entire data ecosystem. Modern data intelligence tools are more efficient and cost-effective and can adapt to new data types and methods, like AI and machine learning. Their ability to scale and innovate will be essential as data needs evolve.

Before purchasing new data, consider your data governance and technology. Evaluate if adequate access controls and permissions are in place to prevent unauthorized access and facilitate data sharing and consider what type of analytic tools and capabilities must be included in your platform.

Poor Usability and Unclear ROI Metrics

Studies suggest that a significant portion of data used by life sciences companies remains unused or underutilized, often due to integration, standardization, and analysis challenges. For example, Forrester reports that between 60 and 73 percent of all data within an enterprise goes unused for analytics. 

Unusable data leads to errors, inefficiencies, and missed opportunities. Before purchasing data, consider how easily it can be accessed and understood. Data presented in a clear, consistent format, free from ambiguity, leads to better outcomes; data that arrives cleaned, standardized, and enriched reduces downstream preparation costs.

Next, consider whether the data you purchase can be integrated with other data sources, as preparing data for analysis is time-consuming and labor-intensive. When reviewing the purchase price of data, look at the costs associated with data cleaning, integration, storage, and maintenance. By carefully evaluating these factors, you can decide which data sources and vendors best align with your company’s needs and strategic goals.

The strategic acquisition of data is crucial for companies to drive innovation, enhance commercial performance, and improve patient outcomes. However, excessive data purchasing can lead to wasted resources and limited returns. To maximize the value of your data investments, it is essential to:

  • Define your data objectives, identify the data types required, and assess the reliability of data sources.
  • Implement robust data governance practices and leverage modern data intelligence tools to ensure data quality, security, and accessibility.
  • Validate data quality, address missingness and inconsistencies, and consider the impact of de-identified patient data on analysis.
  • Ensure data is easily accessible, understandable, and integrated with other data sources.

By carefully considering these factors, you can avoid the data spending trap and make smarter, more strategic purchasing decisions that not only maximize your data investments but also drive more significant, lasting outcomes.

Ryan Leurck
Chief Analytics Officer at Kythera Labs

Ryan leads the Analytics and Products teams at Kythera Labs. He is an engineer and data scientist with over 13 years of experience in operations research, system-of-systems design, and research and development portfolio valuation and analysis. Ryan began his career on the research faculty at the Georgia Institute of Technology Aerospace System Design Lab, where he led researchers in the application of machine learning and big data technologies.