Healthcare’s Most Powerful Innovation Isn’t a New Algorithm — It’s Access

Updated on December 19, 2025

Artificial intelligence (AI) has reached a pivotal moment. The internet has largely been scraped, and for many use cases, there’s simply not enough high-quality, real-world data left online or publicly available. In specialized industries such as healthcare, new and more reliable data sources are crucial. Still, the vast majority of relevant data is trapped within private systems, beyond the reach of current AI models.

The next evolution of healthcare is hidden within the data that healthcare providers, hospital systems, healthcare companies, and research institutions already hold. Decades of clinical experience, imaging, and patient outcomes are sitting in silos, waiting to be connected in ways that can transform how care is delivered. When this data is responsibly unlocked and used to train and evaluate AI models and build AI applications for healthcare, it can give doctors deeper insights, speed up diagnoses, and help personalize treatment for every patient. That said, patient data is sensitive, powerful, and inherently personal, and any effort to connect it must start from a place of preserving privacy. By connecting and protecting this data responsibly, we can build AI tools that truly support providers, strengthen trust in technology, and improve outcomes at every level of care.

The Data Dilemma: Patient Privacy and Compliance

Healthcare has no shortage of ambition when it comes to AI. In fact, a Fierce Healthcare survey found that 76% of physicians already use large language models (LLMs) for tasks like checking drug interactions, diagnosing conditions, planning treatments, and performing literature reviews.

But at the same time, ambition runs up against reality. Access to multimodal training data — the kind that spans imaging, lab results, physician notes, and other key modalities while maintaining longitudinal patient journeys and outcomes — remains a significant hurdle. Public datasets, while helpful, are limited in scale, narrow in focus, or geographically constrained. Add to this a complex regulatory environment, in which AI builders must navigate privacy and HIPAA constraints, data diversity challenges, and legitimate concerns about bias or misuse, and progress becomes difficult. Healthcare data is intentionally hard to access because the cost of mishandling it is measured in patient harm and lost trust, not in model performance. Many clinicians remain cautious, citing concerns about diagnostic errors, privacy risks, and job displacement.

The bottom line? Despite its unprecedented technological potential, healthcare AI still lacks the comprehensive and representative data it needs to reach its full promise.

The Opportunity: AI as a Force Multiplier for Providers

With access to diverse, real-world data, AI could become a true force multiplier for providers and researchers. Models could identify subtle population-wide patterns, uncover previously unseen correlations, and help clinicians deliver deeply personalized care. For example, when digitized pathology slides are combined with genomic and clinical data, AI systems can detect cellular changes that can be overlooked by human review — enabling earlier diagnosis, continuous monitoring, and more personalized treatment decisions.

The potential is immense. According to the Deloitte Center for Health Solutions, 92% of healthcare leaders see promise in Generative AI for improving efficiencies, and 65% believe it can accelerate decision-making. With robust, ethically sourced data, AI can move healthcare from a reactive system to one built on proactive, precisely targeted medicine.

Data is the bridge between today’s capabilities and tomorrow’s breakthroughs in the healthcare field. The tools are already powerful, and the more relevant data they consume, the more impactful they’ll become.

Knowledge Sharing as the Next Step Forward

To date, many healthcare AI models have relied on manufactured or synthetic datasets designed to mimic real cases, a reflection of how little real healthcare data has been publicly available to model developers. To reach the next stage of usefulness, however, models need real-world data created organically over time in actual clinical settings, because only that data fully captures the diversity and unpredictability of genuine patient care.

For example, some training data has consisted of simulated patient questions and answers written by doctors. These simulations capture only a fraction of the complexity that actually presents itself in doctors’ offices and triage rooms. True real-world datasets encompass variation, randomness, and diversity that cannot be fully replicated through simulation alone.

Specialized healthcare AI requires more than textbook-perfect patient journeys — it needs real ones. To make AI clinically valid, models must reflect actual patient outcomes and the nuances of real care delivery. That’s where responsible data sharing becomes vital. 

Ethical collaboration means enabling access to real-world data in a way that preserves privacy and serves outcomes the entire industry can stand behind. This requires designing systems where privacy isn’t an afterthought but the foundation: data contributors must know that their information cannot be misused, re-identified, or exposed as it moves across institutions and applications. A future where healthcare data is shared safely and transparently will unlock a new era of discovery, diagnostics, and collective understanding of human health.

The Road Ahead: Collaboration Over Isolation

AI model builders are actively seeking healthcare data to advance diagnostics, treatment planning, and predictive care. Yet today, that data remains fragmented, scattered across EHRs, imaging providers, pathology labs, hospital systems, and other data providers. No single organization has a complete, longitudinal view of the patient journey, let alone one that represents the entire population these models are designed to serve; each dataset is incomplete on its own.

The next breakthrough will come from connection, not isolation. By uniting disjointed data sources into a comprehensive, longitudinal picture of patient health, we can train AI models that are both powerful and representative of the populations they aim to serve.

When healthcare providers, innovators, and data stewards collaborate — with privacy and integrity at the forefront — AI can evolve from a promising tool into a trusted partner. One that helps doctors, care teams, and researchers diagnose, treat, and heal with greater precision and confidence than ever before.

Bobby Samuels
CEO at Protege

Bobby leads Protege’s strategy and execution across product, go-to-market, and capital formation. He co-founded Protege in 2024 and has served as CEO since inception. Under his leadership, Protege has raised $35M in funding and scaled to $30M in GMV in its first full year of business. Previously, Bobby was General Manager of Privacy Hub at Datavant, where he helped drive the company’s growth leading up to its $7.0B merger with Ciox Health to create the largest neutral health data ecosystem in the U.S. Earlier, he led partnerships at LiveRamp, where he developed expertise in building neutral data networks. Bobby holds an M.B.A. from the Stanford Graduate School of Business and an A.B. from Harvard College, where he was President of The Harvard Crimson. He brings deep expertise in regulated data exchange and translating complex infrastructure into trusted AI enablement for enterprise partners.