Data: The Fuel That Powers Clinical Research


By Jeff Evernham, VP Product Strategy at Sinequa

Data has become one of the most valuable assets for companies, governments, and societies around the globe. Every day in 2020, we created 2.5 quintillion bytes of data, and the latest estimates show that we’ll have over 200 zettabytes (that’s 200,000 million million million bytes) in the cloud by 2025. The pandemic has fast-tracked our journey to becoming data-driven, with governments and hospitals using data analytics to determine the best course of action for the public and patients and to get a better understanding of the virus.

But you already knew that data holds immense value; health and life sciences organizations have been collecting and harnessing data for decades due to the nature of scientific research, which requires expert use of data to drive evidence-based decision-making. Data guides researchers throughout the drug development process, determines a drug’s suitability for the market, and monitors ongoing successes. Data is the fuel that powers the engine of a well-functioning health and life sciences organization, and yet, despite all this experience, we’re still struggling to manage it well.

According to the M-Files Intelligent Information Management Report 2019, which surveyed 1,500 businesses globally, there are three major pain points related to data management:

  1. Information mazes: Almost half of all survey respondents said it was challenging to find the right information
  2. Version gambles: Over two-thirds said it was difficult to find the right version of a document
  3. Document duplication: More than 8 in 10 respondents said they had to recreate a document that already existed because they couldn’t find it on their corporate network

We can only address these challenges when we know what is causing them. There are a number of contributing factors. 

The data problem

  • Information overload

The biggest challenge to overcome is the sheer volume of information that must be managed, mined, and stored – and it’s growing. Every year, more than 3 million research papers are published in over 33,000 journals. To put that into perspective, it would take the average person 85 years just to read their abstracts! COVID-19 has complicated matters further, causing a rapid influx of clinical trial data for vaccines, country-specific hypotheses, scientific predictions of transmission, records of the consequential impacts on mental health; the list goes on. To date, over half a million papers have been published regarding the virus. 

Even with the advances in intelligent automation technologies, the need to collect and store so much information is overwhelming the ability to access, find, and use it, turning data lakes into data swamps.

  • Much of the data is unstructured

Gartner stated that unstructured text makes up nearly 80 percent of all global content. For health and life sciences, crucial data such as doctors’ notes and research observations are unstructured text, which is ignored by most BI tools and data systems and is much more difficult to harness for insights. Unstructured content holds enormous potential, but most of its insights remain untapped because no one has the time or the energy to mine it.

Although some life sciences businesses have an internal search engine or data management tool, most of those are limited to keyword matching technology, leaving them with a narrow scope to manage such a vast range of content.
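To make the limitation of keyword matching concrete, here is a minimal sketch in Python. It is not any vendor's search engine: the function name `keyword_search` and the three sample notes are hypothetical, invented purely for illustration. It shows how a search that only matches exact words misses a note that describes the same condition in clinical terms.

```python
import re


def keyword_search(query, documents):
    """Return the documents that contain every query term as an exact word."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for doc in documents:
        words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if terms <= words:  # every query term must appear verbatim
            hits.append(doc)
    return hits


# Hypothetical notes, made up for this example only.
notes = [
    "Patient reports a heart attack two years ago.",
    "History of myocardial infarction; on beta blockers.",
    "No cardiac events recorded.",
]

# Only the first note is returned. The second note describes the same
# condition in clinical language, but shares no words with the query.
print(keyword_search("heart attack", notes))
```

A researcher searching for "heart attack" would never see the second note, even though it is just as relevant. Closing that gap is what separates simple keyword matching from search that understands meaning.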

  • Disparate data silos

Managing such large quantities of data is an incredibly difficult task on its own, but it is further complicated by the number of formats and systems that house it. Across large organizations, content is stored both on-premises and in the cloud in disparate formats, ranging from lab reports and safety summaries to clinical study results and presentations. Depending on the type of file, each document is stored in its own source system, making it incredibly time-consuming for employees to find the information they’re looking for, and they may not even find it at all.

  • Inadequate information management

With data coming in all shapes and sizes, as well as being stored in so many different locations, information management is one of the biggest challenges facing health and life sciences organizations today. How can all this content be curated, labeled, and cataloged? Often, it can’t. Centralizing this data in one place would be the best solution, but even if that’s possible (think: data lake), the problem then becomes retrieval: how to surface the important details that are needed to use that information effectively.

As such, the ability to locate information relies heavily on how a company stores it. Unfortunately, content is often not classified systematically: it is tagged incorrectly, inconsistently, or not at all, making things harder to find and costing employees more time. And these problems don’t just impact employees. Organizations suffer when they can’t provide the necessary information to fulfill requests from regulatory agencies, and intellectual capital is often lost forever when knowledgeable employees leave or retire.

  • Lost productivity

The individuals working within the health and life sciences industry are incredibly talented. They have trained for years to be able to carry out elaborate tasks and form a vital part of a streamlined healthcare organization. According to research from Sinequa, those workers spend approximately 35 minutes each day looking for information – that’s about 3 hours over a five-day week! This time could be much better spent doing the job they were hired to do, but with information scattered, the simple task of locating it can waste an enormous amount of time. In cases where employees can’t find the content they’re searching for, they may have to redo work that has already been done, taking even more time away from high-value work.
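The arithmetic behind that figure is easy to check, and it scales quickly. The short Python sketch below uses the 35 minutes quoted above. The five-day week and the 48 working weeks per year are assumptions added here for illustration and are not part of the Sinequa research.

```python
MINUTES_PER_DAY = 35          # figure quoted in the article
WORKDAYS_PER_WEEK = 5         # assumption: a standard five-day week
WORKING_WEEKS_PER_YEAR = 48   # assumption: rough allowance for leave


def hours_lost(minutes_per_day, days):
    """Convert a daily search time in minutes into total hours over `days`."""
    return minutes_per_day * days / 60


weekly_hours = hours_lost(MINUTES_PER_DAY, WORKDAYS_PER_WEEK)
yearly_hours = hours_lost(
    MINUTES_PER_DAY, WORKDAYS_PER_WEEK * WORKING_WEEKS_PER_YEAR
)

print(f"Per week: {weekly_hours:.1f} hours")
print(f"Per year: {yearly_hours:.0f} hours")
```

Under these assumptions, each employee loses roughly 140 hours a year to searching, which is several working weeks per person.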

Final thoughts

It’s clear that the management of information and data is a critical issue for the health and life sciences industry. Having untapped, unstructured data that could provide valuable insights for research is a barrier to faster innovation, dilutes decision-making, creates risk with regulatory authorities, and hurts productivity. In an industry where the rapid delivery of safe medicines can offer life-changing treatments, access to the right data at the right time is essential. Getting drugs to market quickly and economically is not only beneficial for the company, but it can help to save or dramatically improve lives. 

Fortunately, there are solutions that can help, such as intelligent search, which I will explore in my next article.