AI is showing up in more and more payer strategies, whether as a solution for needs like risk stratification and predictive modeling or as part of bigger ambitions like building a clinical data mart. But most efforts hit the same wall: the data just isn’t there yet.
More than 70% of healthcare organizations are either piloting or deploying AI, according to a recent survey from McKinsey & Company. But a 2025 research study found that only 19% say they’re seeing strong results in improving clinical diagnoses. Risk stratification, which is arguably one of the most important capabilities in value-based care, isn’t faring much better. The same study found that just 38% of healthcare organizations are reporting success with it. And the problem isn’t ambition or effort. It’s inputs.
About 60 to 80% of the health data that matters—physician notes, CCDs, discharge summaries, phone call logs—is unstructured. It doesn’t live in tidy fields or easily searchable systems. That’s a big problem, because AI models need clean, labeled data to be effective. Without structure and context, what comes out of a model is rarely any clearer than what went in.
What Happens When Foundational Data Is Weak
Payers have a wealth of analytics tools, but building smarter models won’t help if the clinical data those models rely on is incomplete or inconsistent. Most systems still treat structured fields as fact, even when the source is a partial copy-paste from five encounters ago. Social and behavioral factors stay buried in notes and never get coded. And documents that technically pass validation often lack the nuance needed to guide any real decision. This lack of clean, contextual data undermines a payer’s ability to accurately stratify risk, a cornerstone of effective value-based care and population health management.
It adds up to missed flags, failed audits, and inflated downstream costs. Care teams waste time chasing the wrong leads. Outreach happens too late. Quality scores slip. And interventions show up when the damage is already done.
It sounds obvious, but modeling is where things often go wrong. Everyone wants to jump ahead to implementation. But the first question should be: can we actually trust the data we’re using?
Getting to yes doesn’t require a rebuild. It starts with knowing what you have and making it usable. That means building a clinical data mart that’s not just a warehouse, but a source of truth. A place where structured fields are validated, unstructured data has rich context, and the whole thing is set up for reuse.
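To make that tangible, here is one minimal sketch in Python of what such a record might look like. The schema, field names, and types are hypothetical assumptions for illustration, not a prescribed design or any vendor’s data model: validated structured fields sit next to findings extracted from narrative documents, each with its provenance, so both can be reused downstream.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExtractedFinding:
    """A signal pulled from unstructured text, kept with its provenance."""
    concept: str          # e.g. "dizziness" or "medication non-adherence" (illustrative)
    source_document: str  # which note, CCD, or call log it came from
    context: str          # the sentence or snippet that supports it
    extracted_on: str     # ISO date the extraction ran

@dataclass
class MemberRecord:
    """One validated, reusable view of a member for downstream models."""
    member_id: str
    diagnoses: List[str]                       # structured codes from claims/EHR
    medications: List[str]
    structured_fields_validated: bool = False  # has anyone checked these fields against source?
    findings: List[ExtractedFinding] = field(default_factory=list)
    last_refreshed: Optional[str] = None       # governance: when was this record rebuilt?
```

The point is less the specific fields than the pairing: structure you can trust, plus narrative context you can trace back to its source.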
What This Looks Like in Practice
Picture a member with heart failure. The claim file shows the diagnosis, and maybe a few missed medications. But buried in the unstructured data? Dizziness. Diuretics skipped because of side effects. Social isolation flagged during a routine follow-up. All key contextual factors that influence risk stratification and care pathways.
If that data isn’t captured, you miss the signal, and the result is greater disease complexity, higher hospital admission and readmission rates, and higher costs. But when it is captured, when the system ties structured data together with enriched, meaningful narrative, you can actually act on it. Care managers get alerted in time. A pharmacist adjusts the timing of meds. Support services kick in before the member ends up back in the ED.
That’s not just clinical value. That’s operational ROI.
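For readers who think in code, a toy rule (building on the hypothetical MemberRecord sketch above) shows how a structured diagnosis and note-derived signals might combine into an outreach flag. The logic, signal names, and code prefix are illustrative assumptions, not a real risk model:

```python
def should_flag_for_outreach(record: MemberRecord) -> bool:
    """Toy rule: a heart failure diagnosis plus risk signals from notes triggers outreach."""
    has_heart_failure = any(code.startswith("I50") for code in record.diagnoses)  # ICD-10 heart failure
    note_signals = {finding.concept for finding in record.findings}
    risk_signals = {"dizziness", "medication non-adherence", "social isolation"}  # illustrative only
    return has_heart_failure and bool(note_signals & risk_signals)
```

A production model would be far richer, but the mechanism is the same: the narrative signals are what turn a static diagnosis into something a care team can act on.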
How to Build a Smarter Pipeline
If the goal is to scale AI and support value-based care performance, you don’t need to overhaul your current platform. You just need a strategy that respects the reality of your data. A proactive approach to data quality is essential for measuring and improving care quality and outcomes, and it takes just a few steps, sketched in rough code below:
- Run a data scan. Know what you’re collecting, what’s missing, and where it’s coming from.
- Stop the noise. Not all documentation needs to be kept or processed. Get rid of junk data early.
- Use NLP where it counts. Extract key variables from data before storage, not after.
- Standardize upstream. Fix formatting and terminology concerns before they become a downstream problem.
- Govern intentionally. Set expectations around accuracy, refresh cadence, and auditability, and then meet those expectations.
These aren’t exotic tactics. They’re basic hygiene. But skipping them means you’re building on sand.
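For teams that want to picture those steps as a pipeline, here is a minimal Python sketch. Every function, field name, and keyword list is a hypothetical placeholder assuming simple dictionary-shaped documents; a real implementation would swap in a clinical NLP engine and proper terminology services.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

def scan_sources(documents: Iterable[Dict]) -> Counter:
    """Step 1: know what you're collecting and where it comes from."""
    return Counter(doc.get("source", "unknown") for doc in documents)

def drop_noise(documents: Iterable[Dict]) -> List[Dict]:
    """Step 2: discard junk (empty notes, exact duplicates) before processing."""
    seen, kept = set(), []
    for doc in documents:
        text = (doc.get("text") or "").strip()
        if text and text not in seen:
            seen.add(text)
            kept.append(doc)
    return kept

def extract_key_variables(doc: Dict) -> Dict:
    """Step 3: pull key variables out of the narrative before storage.
    A real pipeline would use a clinical NLP engine; this keyword match is a stand-in."""
    text = doc["text"].lower()
    doc["flags"] = [term for term in ("dizziness", "missed dose", "lives alone") if term in text]
    return doc

def standardize(doc: Dict) -> Dict:
    """Step 4: fix formatting and terminology upstream, before it becomes a downstream problem."""
    doc["member_id"] = re.sub(r"\D", "", str(doc.get("member_id", "")))  # keep digits only
    return doc

def govern(doc: Dict, refreshed_on: str) -> Dict:
    """Step 5: record refresh dates and provenance so accuracy stays auditable."""
    doc["last_refreshed"] = refreshed_on
    return doc
```

Each function maps directly onto one of the bullets above, and the order matters: extraction and standardization are far cheaper before storage than after.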
Why It Matters for Payers
Every decision tied to risk—contracting, quality scores, outreach programs—depends on the right data, at the right time, in the right context. Without it, models misfire, care coordination lags, and dollars get wasted.
Getting this right means better risk scoring. Fewer readmissions. Stronger HEDIS and STARS results. And faster access to the insights that actually improve care and lower cost.
Final Word: Don’t Skip the Hard Part
Everyone wants scalable AI. But that starts with data infrastructure that can handle the real mess of healthcare. Payers who take the time to clean before they compute will move faster, spend less, and build models that actually deliver.
Because at the end of the day, AI can’t fix broken data. But clean data can make AI worth using.

Kim Perry
Kim Perry is the Chief Growth Officer of emtelligent, a clinical-grade AI solution. In addition to her full-time professional roles, Perry is president of Women’s Health Leadership TRUST, an organization with 500 members that elevates and supports women at all phases of their healthcare career, and an advisory board member at Broadpath, a privately held call center company.