Next-Generation Tokenization of Real-World Data Creates More Complete Patient and Population Insight

Updated on April 26, 2023

Big health data holds substantial promise for improving insight into individual patients and specific populations, making clinical trial research more inclusive, and giving healthcare leaders the kind of data that can make a meaningful difference in many people’s day-to-day lives.

“The entire time I’ve been covering the health tech space everybody’s always talking about the promise of healthcare big data,” said Jessica DaMassa, executive producer and host of WTF Health, or What’s the Future, Health?

“It seems like we might be closer to that than ever before,” said DaMassa, who moderated the webinar Expanding Real-world Datasets: Uncovering Insights to Improve Patient Outcomes.

Next-generation tokenization of real-world data (RWD) is moving big data in healthcare closer to actionable insights, experts said. The technology overcomes some previous obstacles and allows stakeholders to link disparate datasets – layering in context such as social determinants of health (SDoH), medical claims data, electronic medical records, and mortality information – all while maximizing health data privacy.

“What’s really great about tokenization is that it prioritizes the securing and safeguarding of patient data,” Camille Cook, MPH, senior director for healthcare strategy and real-world data at LexisNexis® Risk Solutions, said during the webinar.

“Tokenization has the potential to unlock a wealth of insights from multiple data sets without compromising patient privacy or data security,” said Matt Veatch, a webinar panelist, RWD consultant, and founder and managing director of Revesight Consulting.

“Next-generation tokenization relies on referential linking,” Cook explained. “We can effectively match John Doe with their specified social determinants of health to John Doe’s medical records within their electronic medical record from a third party. We’re bringing all three of those data components into a singular secure database that is allowing for this additional context – a more complete view of a patient both in and out of healthcare settings.”

“That triangulation of data just affords so many more insights than we’ve had accessible to us in the past,” Veatch added.
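For a concrete (if greatly simplified) picture of what that triangulation looks like, the sketch below joins three hypothetical de-identified tables – SDoH, claims, and mortality – on a shared patient-level token using Python and pandas. The tokens, column names, and values are invented for illustration and are not drawn from any real data asset or vendor schema.

```python
# A minimal, illustrative sketch of token-based linkage across de-identified
# datasets. All tokens, columns, and values below are hypothetical.
import pandas as pd

# Three de-identified data assets, each keyed by the same patient token.
sdoh = pd.DataFrame({
    "token": ["tkn_001", "tkn_002"],
    "housing_instability": [True, False],
    "transportation_access": ["limited", "adequate"],
})
claims = pd.DataFrame({
    "token": ["tkn_001", "tkn_002"],
    "primary_dx": ["E11.9", "I10"],   # ICD-10 diagnosis codes
    "total_paid": [1250.00, 310.50],
})
mortality = pd.DataFrame({
    "token": ["tkn_002"],
    "deceased": [True],
})

# "Triangulate" the three sources into a single patient-level view without
# ever handling names, dates of birth, or other direct identifiers.
linked = (
    sdoh.merge(claims, on="token", how="left")
        .merge(mortality, on="token", how="left")
)
linked["deceased"] = linked["deceased"].fillna(False)
print(linked)
```

Because every contributing dataset carries the same token for the same person, the join is deterministic and the underlying identifiers never need to leave their source systems.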

Webinar attendees were asked a series of poll questions. The first gauged the current state of tokenization strategies by asking, “Do you currently have a healthcare tokenization strategy in place?” The results:

  • 18% – Yes, it’s fully up and running.
  • 27% – Yes, it’s in progress.
  • 9% – Yes, but it needs work and is a current challenge.
  • 45% – No, we’ve thought about it but haven’t started yet.

Healthcare Data: I Need More Context

The greater insight into individuals gained through healthcare data tokenization allows a more contextual, whole-person picture to emerge. And the picture is not static: the technology can follow people over time to generate even more useful, longitudinal data.

At the same time, in aggregate, the granular insights on individuals can add up to meaningful population-level data, which in turn can improve clinical care and create more inclusive clinical trials, Cook added. Veatch agreed: “What we’re trying to do with the harmonization and the aggregation of data is to complete that picture — and really understand the impact of a disease or a treatment on a patient population.”

“The broadest insights are going to come from integrated real-world data,” Veatch said. “Fortunately, some of the cumbersome process of trying to match data across hundreds or thousands or tens of thousands of patients is made easier and much faster through tokenization of real-world data.”

The aim of referential tokenization is to put de-identified data into context. It links data on well-being, health status, social and community determinants, and more into a single space. And it can expand to reach otherwise underserved populations.

“We look at social determinants of health as capturing more context for populations that we already semi-understand,” Cook said. “Sometimes we forget about social determinants of health for those that we may not be targeting, for those that we may not be supporting, for those that may not be accessing the healthcare ecosystem.”

“Next-generation tokenization technology with LexisNexis® Gravitas™ can capture a good portion of this population left out of the picture before,” Cook added. “The goal is to improve the individual experience, community healthcare, and future research.”

Privacy Remains Paramount With Healthcare Data

“Privacy is a key consideration and one of the main reasons to use a referential token,” Veatch said. At the same time, “the real value of tokenization” is protecting this privacy while also linking across disparate data sets, possibly across different health systems and different types of data.

For decades, RWD was trapped in paper files and was accessible only through tedious manual curation. Then electronic health record (EHR) systems came along and changed the landscape, but the technology still has limitations. “They’re faster, they’re efficient, but they’re often isolated or siloed in different institutions or even within the same institution,” Veatch said.

In addition, EHRs are not designed to generate insights. “They were built to store data,” Cook said. “They weren’t built to analyze data.”

The resulting data fragmentation occurs across all facets of the patient record – clinical care, pharmacy, billing, claims and more.

“We’re now figuring out how to better interpret that data,” Cook said. “We’re breaking down a lot of these data silos as we move into this world of tokenization.”

From Inference to True Insight

Before the advent of next-generation healthcare tokenization, healthcare stakeholders were left with little choice but to make inferences about the best interventions for different patient populations. “We tend to look at that as a kind of probabilistic matching. We’re making an inference, which is what we do all the time with healthcare data every day,” Cook said. And inferences are simply not precise enough when it comes to health.

It’s about being more precise now and moving forward, Cook said. “We do not want probabilistic matching in a clinical space because we’re making inferences consistently in the healthcare space right now. We really need to accelerate that next step of precision medicine — which all comes down to precision linking and one-to-one correlations of attributes associated with an individual.” This can be accomplished “while keeping in mind the security and safeguarding of patients’ sensitive and potentially identifiable information.”

As an example, if a researcher wants to export data from an EHR system for publication and tie the information to one or two other data sources, that also involves probabilistic matching, Cook said. “We’re maybe 65% to 72% sure on average that we have a true linkage between an outside data source and our internal data source.”

The matching and linking of data require a more robust approach, Veatch said, “which is where the referential token is so incredibly valuable.” The technology is another step on the road to precision medicine.
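To make that contrast concrete, the sketch below compares the two ideas in a purely illustrative way: a probabilistic match that scores similarity between records and accepts anything above a threshold, versus a referential join in which both records were already resolved to the same token upstream. The example records, threshold, and similarity measure are assumptions for this sketch and do not reflect how any particular vendor’s matching engine works.

```python
# Illustrative contrast between probabilistic and referential (token-based)
# matching. Records, threshold, and similarity measure are hypothetical.
from difflib import SequenceMatcher

ehr_record      = {"name": "Jon Doe",  "dob": "1980-03-14"}
external_record = {"name": "John Doe", "dob": "1980-03-14"}

# Probabilistic matching: score name similarity and accept above a threshold.
# The result is an inference; it can be wrong in either direction.
similarity = SequenceMatcher(None, ehr_record["name"], external_record["name"]).ratio()
probable_match = similarity > 0.85 and ehr_record["dob"] == external_record["dob"]
print(f"probabilistic: similarity={similarity:.2f}, match={probable_match}")

# Referential matching: both records were resolved to the same token upstream,
# so linkage reduces to a one-to-one equality check.
ehr_token, external_token = "tkn_001", "tkn_001"
print("referential match:", ehr_token == external_token)
```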

No Turning Back

Next-generation data tokenization is also about striking a delicate balance between protecting privacy and gaining a new world of insight. It represents an advance over Safe Harbor data de-identification, for example, because it relies on expert determination instead.

“Safe Harbor is like taking a black Sharpie and going over someone’s name and date of birth,” Cook explained. At the same time, Safe Harbor strips out 18 unique identifiers that are extremely important when it comes to context and evaluating social determinants of health, social and community context, and demographic information.

With tokenization, data is still de-identified, but the technology generates a token or unique identifier. That identifier becomes part of a process with “so many obstacles to link back where you can no longer re-identify an individual within that data asset,” Cook said.
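As a rough illustration of the general idea (and not the specific Gravitas method), a token can be produced as a keyed, one-way hash of normalized identifying fields: the same person resolves to the same token across datasets, while the token itself cannot be reversed back into a name or date of birth. The field choices, normalization, and key handling below are assumptions made for this sketch.

```python
# A minimal sketch of deriving a de-identification token as a keyed, one-way
# hash (HMAC-SHA256) of normalized identifiers. Field choices and key handling
# are illustrative assumptions, not any vendor's actual method.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # in practice, kept in a KMS/HSM

def generate_token(first_name: str, last_name: str, dob: str) -> str:
    """Derive a stable, non-reversible token from normalized identifiers."""
    normalized = f"{first_name.strip().lower()}|{last_name.strip().lower()}|{dob}"
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# The same person yields the same token regardless of formatting quirks,
# so downstream datasets can be linked without exchanging the identifiers.
print(generate_token("John", "Doe", "1980-03-14"))
print(generate_token(" john ", "DOE", "1980-03-14"))  # same token as above
```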

Another poll asked attendees to select all the data sources they currently use, and there was a wide range of responses:

  • 83% – Medical Claims
  • 57% – SDoH
  • 33% – Mortality
  • 70% – Demographics
  • 70% – Clinical (EMR and EHR)
  • 30% – Imaging
  • 37% – Genomics
  • 43% – Laboratory

“I love seeing imaging on there — because I always feel like that’s been the one that’s been one of the trickier types of data to integrate into all the other types of data,” DaMassa said.

“This is great,” Cook added. “The last three indicators – imaging, genomics, and laboratory data — think about how challenging it is to get that information.”

More Inclusive Clinical Trials

Much as with a jury of one’s peers, people stand to benefit when clinical trials enroll participants who reflect them in terms of demographics, socioeconomic status, specific health risks, and other factors.

“What’s really impactful about referential next-generation tokenization is … you’re able to aggregate datasets and make true inferences related to inclusive research,” Cook said. “There’s a big push right now in life sciences not only for diversifying clinical trials but ensuring that your research is truly inclusive.”

The goal is to identify interventions for specific populations that improve the likelihood of better health and wellbeing outcomes, instead of treating clinical research results as one-size-fits-all. Greater inclusion could also benefit researchers – potentially expanding recruitment numbers. Cook pointed out that 80 percent of clinical trials are halted because of recruitment challenges.

Breast Cancer Research: Leave No Health Data Behind

Breast cancer can serve as a prime example of how data can change clinical detection and interpretation for the better, said Chirag R. Parghi, MD, a board-certified radiologist and chief medical officer at Solis Mammography.

“Breast cancer, we know, is a very common disease and the most common cancer that affects women in this country,” he said. When it is caught early, the prognosis for women diagnosed with breast cancer is very favorable. “There’s nearly a hundred percent survival rate when we catch it early.” Women who begin annual screening at the age of 40 and return every year for a mammogram increase their chances of early detection.

The prognosis is not as favorable when breast cancer is diagnosed at a later stage. With conflicting guidelines about when to start and how frequently to get screened, not every woman who begins an annual regimen remains compliant. For those who do not get mammograms, access is a critical factor – and the COVID pandemic likely worsened the situation, Dr. Parghi said. “My advice to every woman over 40 is, please don’t skip a year, whatever you do.”

Data can help here as well, Dr. Parghi said. “Solis Mammography is really identifying those women that have higher chances of being non-compliant for a variety of reasons. The risk factors that would cause women to potentially skip mammograms are the same social determinants and risk factors that actually translate to higher rates of breast cancer as well.”

Breast cancer “is a visual diagnosis,” which can be a limitation, Dr. Parghi added. “Let’s be frank — there is too much subjectivity. Interpretation is based on our own recognition of disease, our own expertise, and what we know to be normal.”

Therefore, the right data could have a “tremendous impact,” helping improve detection at earlier stages, when the cancer is easier to treat. Dr. Parghi shared an example of a breast imaging case he would have read as normal, asking the woman to return one year later. “Luckily we just received brand new, state-of-the-art computer detection software that helps us look at 3D images to look for tiny abnormalities.” The technology flagged some areas for him to examine more closely.

One small lesion of potential interest was an 86% match compared to the technology’s known cancer library. It turned out to be a small, four-millimeter invasive ductal cancer. “That’s an aggressive cancer that we caught while tiny,” he said.

This kind of imaging database could also improve detection of other cancers, including ovarian and pancreatic cancers, which are challenging to diagnose, Dr. Parghi said. In addition, imaging sometimes reveals calcification of arteries in the breast, which might correlate with similar calcification in the heart and signal an increased risk for cardiovascular disease.

Technology now allows radiologists to call up actual images, not just radiology reports about breast cancer. In terms of clinical care and research, there is “a tremendous potential. It’s so exciting to see,” Dr. Parghi said.

Healthcare Is Playing Catch-Up

Data tokenization is not new, but its impact on healthcare is still in its infancy, Veatch said. “And we want to be out of the infancy as soon as possible.” Tokenization “has been used in the financial services industry for decades,” he said. “In healthcare, the use cases are really in their infancy, and they represent a lot of potential for research and clinical care.”

Dr. Parghi agreed that healthcare certainly has some catching up to do regarding data tokenization. “It’s reasonable to assume healthcare systems are about 40 years behind the financial industry when it comes to embracing technological changes, so I feel like that’s about right.”

Hundreds of Determinants

Asked how many social determinants of health LexisNexis Risk Solutions can provide, Cook said the company can supply up to 448 individual attributes. “Those would be 448 attributes that can provide individualized insight.” Factors include economic stability, social and community context, access to transportation, education, and quality of healthcare. When combined with tokenization and expert determination, the attributes can be de-identified in accordance with HIPAA and remain anonymous.

“In addition to that, within our LexisNexis Gravitas Network we have claims data and mortality data, so the goal is really to create large, blended datasets where we’re bringing in not only our attributes but those from our network partners into a tokenization landscape,” Cook added.

These attributes are important to highlight, she said, particularly “when we’re looking at prospective studies following people over time.”

Diana Zuskov
Diana Zuskov is AVP of Healthcare Strategy at LexisNexis Risk Solutions.