On February 17, 2026, the FDA’s December 2025 real-world evidence guidance became operational. Much of the early commentary has focused on what the guidance asks of the data itself: completeness, accuracy, and the requirement that clinical facts reflect what happened to the patient. Less discussed is what the same framework requires of the process that produced the data: the lineage of every clinical fact, the versions of every model and rule that touched it, the audit trail of every human decision, and the ability to re-run the same pipeline a year later and produce an identical result. Regulatory-grade real-world evidence has to include the how. And the how is an engineering problem, not a documentation one.
The unit of governance is the fact, not the dataset
Most secondary-use platforms govern at the level of the file or the table. Access controls, versioned snapshots, dataset-level data dictionaries. That granularity was sufficient when the regulatory question was about a study population. It is not sufficient when the question is about a specific value for a specific patient.
When a reviewer asks “where did this come from?” about a single comorbidity, date, or medication dose, the answer has to be a click, not a forensic project. That requires a much finer unit. Each clinical assertion has to carry, with it, the source document and the exact line from which it was extracted, the model and prompt version used, the confidence assigned, the normalization decisions that followed, the conflicts detected with other sources, and the rule or human reviewer that resolved them. The unit of governance is the individual fact.
What that requires, architecturally
A platform that meets that standard is not a warehouse with an audit log bolted on. It has to be designed for fact-level provenance from the first parse. In practice, that comes down to a small number of properties that have to hold across every layer.
End-to-end lineage. Every value traces back, through every transformation, to the source document and the line where it appeared.
Confidence propagation. The extraction confidence travels with the fact into derived measures, cohorts, downstream queries and AI agent answers, and any other place where the value is consumed, so uncertainty stays visible.
Immutable versioning. Models, prompts, terminology mappings, business rules, and reference data are pinned, so any past result can be re-derived from its exact inputs.
Deterministic reproducibility. The same query against the same versioned dataset returns the same patients and the same numbers every time.
Full audit trails for data, metadata, and models. Every data access, model run, override, and human decision is logged with identity, timestamp, and context: the kind of record 21 CFR Part 11 already expects of any electronic records system feeding FDA submissions.
Role-based access control. Reviewers, abstractors, agents, and the tools they delegate to see only authorized data, and the access decisions themselves are logged.
Human-in-the-loop validation as a pipeline stage. Low-confidence and conflicting fields are routed to expert review with source evidence shown side by side; the decision is recorded and fed back as training signal.
Visualization of missing and uncertain data, because reviewers need to see what is not there as clearly as what is.
These are not features that can be patched in after the fact. They are constraints that have to shape the architecture from the first layer of ingestion. Retrofitting source links onto a warehouse that was built without them is far harder than carrying them through from the parse.
The tiered architecture that makes it possible
The pattern that holds up under audit is a tiered one. Raw parsed inputs sit in a layer that preserves every byte of every source. No extraction, no normalization, no loss. Above it sits an extraction layer where clinical facts are pulled from text, mapped to standard terminologies, and tagged with their source coordinates and a confidence score. Above that sits a curated layer where duplicates are merged, conflicts across documents are reconciled, and the asset is shaped for analytic consumption. Each layer can be re-run independently. Each layer can be validated independently. And from any value in the top layer, an auditor can walk all the way down to the raw bytes.
The reason that matters for the FDA is not aesthetics. The agency’s expectation of reproducibility, first codified in 21 CFR Part 11 for electronic records and now extended into a per-fact requirement, only holds when every layer is versioned and the dependencies between them are explicit. A monolithic pipeline that mixes parsing, extraction, and reasoning makes point-in-time reproduction effectively impossible. A tiered one makes it routine.
What provenance does not solve
A platform that meets these properties does something specific. It makes the process of generating real-world evidence defensible. It does not, on its own, make the data correct. A confidently-wrong note becomes a confidently-traced record. Garbage in, governed garbage out.
That limitation matters because it is the right one. The agency does not expect submissions to be free of error. It expects them to be transparent about how each value was produced, and reproducible enough that an error, once identified, can be traced back to its source and corrected. Provenance does not eliminate mistakes. It makes them findable and fixable. That is the bar.
Regulatory-grade is an architecture, not a label
Closing the governance gap this guidance creates is an architecture problem. It is not solvable by tightening the SOP around an existing warehouse or adding a dashboard on top of it. It requires choices made at the platform layer, before the first record is loaded.
The sponsors who will move smoothly under the new guidance are the ones who have already made those choices, or who recognize, now, that they cannot be made later. The ones who treat regulatory-grade as a label to be earned by paperwork will find that the paperwork no longer suffices.

David Talby
David Talby is CEO for John Snow Labs.





