A Machine Learning Technique that Can Share Healthcare Insights Without Compromising Data Privacy


By Dattaraj Rao

COVID-19 has brought about a shift in the perception of data privacy. In the early months of the pandemic, tech companies scrambled to develop contact-tracing and notification apps that harvest smartphone location data to warn people of their proximity to infected individuals. The US government even announced it would stop enforcing HIPAA penalties, enabling freer sharing of patient health information to assist in dealing with the pandemic. Some experts fear that sacrificing data privacy during the pandemic could have long-term privacy implications.

Though most of these shifts should be short term, they highlight the importance of sharing vital healthcare data, or the insights gleaned from it, while maintaining robust protections on patient privacy, especially as we move to a new normal. One potential solution is a recent approach to ML that allows us to benefit from the insights of a variety of healthcare data sources without compromising, or even sharing, that very data. It’s called federated machine learning.

A Brief Overview of Federated Machine Learning

In federated machine learning, training data resides on local machines and is never shared. The only things that flow across the network are updates to the ML model, which are aggregated into a central model that combines insights from the various datasets. This hub-and-spoke configuration keeps local training data isolated and safe while still allowing the insights from models trained on different data sources to be averaged into the best possible central model. The aggregation step can also be secured with cryptography so that individual updates are never revealed.
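The averaging step at the hub can be illustrated with a minimal sketch of federated averaging (FedAvg-style). This is a simplified illustration, not a production implementation: the function name and the toy weight vectors are assumptions for the example.

```python
import numpy as np

def federated_average(client_updates, client_sizes):
    """Aggregate local model weights into a central model.

    client_updates: list of weight arrays, one per local site.
    client_sizes:   number of training samples at each site,
                    used to weight the average.
    """
    total = sum(client_sizes)
    # Weighted average: sites with larger datasets contribute
    # proportionally more to the central model.
    return sum(w * (n / total)
               for w, n in zip(client_updates, client_sizes))

# Three sites train locally; only their weights leave each machine.
updates = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])]
sizes = [100, 300, 100]
global_weights = federated_average(updates, sizes)
# → array([0.34, 0.86])
```

In a real deployment each site would send a weight delta after a round of local training, and the central server would fold the averaged delta back into the shared model before the next round.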

How Can This Help in a Healthcare Scenario?

Healthcare providers handle extremely sensitive patient data that needs to be kept private. Regulations like HIPAA and HITECH ensure that personally identifiable patient records are not misused, even under the relaxed COVID-19 enforcement policies. At the same time, this patient data is extremely valuable: it contains insights that can help care providers diagnose and prevent critical illnesses, particularly in crisis situations like the COVID-19 outbreak.

Each hospital holds COVID-19 patient data such as medical records, X-ray scans, and clinical notes on treatments, procedures, diagnoses, and outcomes. ML models built on these valuable datasets can provide predictive and prescriptive analytics for the disease; however, each hospital has only a fraction of the complete dataset. The model each hospital develops will attempt to learn the patterns it observes, but it will be limited by both the size and the quality of the fraction available. These fractions cannot be centralized due to their sensitive nature. Federated learning, however, can uncover the valuable insights and patterns by aggregating the models trained on them.

Federated learning enables hospitals to maintain patient data privacy while sharing insights obtained from their dataset in the form of model updates with other hospitals. This is a win-win situation for all parties. The hospital with COVID-19 cases does not share any data, but it does share insights that get federated securely to improve the prediction performance of all models. Each hospital sharing its knowledge will also eventually benefit from having access to contextually different information in a “wisdom of the crowd” setting. This concept is shown in the figure below.
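The secure aggregation mentioned earlier, where individual hospital updates stay hidden but their sum is recoverable, can be sketched with a simplified pairwise-masking scheme. This is a toy illustration only: in a real secure-aggregation protocol each pair of sites derives its shared mask from a key exchange, rather than having one party generate all masks as here.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_updates(updates):
    """Pairwise additive masking: each pair of sites shares a random
    mask that one site adds and the other subtracts. Every masked
    update looks random on its own, but the masks cancel in the sum,
    so the server learns only the aggregate."""
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.normal(size=updates[0].shape)  # shared mask r_ij
            masked[i] += r
            masked[j] -= r
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
masked = masked_updates(updates)
# Individual updates are hidden, but the aggregate is unchanged:
assert np.allclose(sum(masked), sum(updates))
```

The central server can therefore compute the federated average without ever seeing any single hospital's contribution in the clear.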

Maturity Model for Adopting Federated Learning

Federated learning is still an emerging technology with huge potential benefits. Organizations need a strategy for transforming their current AI development to adopt this privacy-aware approach to ML. We see a maturity model emerging for organizations on the path towards AI transformation, as shown in the figure below.

The first step is often a traditional ML deployment, where models are trained offline by replicating valuable customer data. The next step is customized models that train on customer machines and never expose data externally; however, these models cannot be updated to new versions, since information flows only one way. The best-of-both-worlds approach is to move to a federated architecture that combines the two.

The Power of Privacy

This novel technique presents new opportunities for healthcare organizations and businesses to work with private and sensitive datasets without moving them. Though this is particularly important in healthcare, we are working with our banking and industrial customers as well to define a maturity model for privacy-preserving AI. Use cases range from customizing available models for private datasets to enhancing model performance by facilitating collaboration. Furthermore, we believe we can layer advanced techniques from the encryption domain onto federated learning to strengthen its deployment in enterprise solutions.

Paired with privacy-preserving techniques such as encryption and differential privacy, federated learning presents a promising new way for advancing machine learning solutions. With the role of data privacy in healthcare in flux, federated ML could not only save lives, but it could also save privacy.
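One way this pairing with differential privacy is commonly realized is by clipping and noising each update before it leaves a site, in the style of DP-SGD. The sketch below is a simplified illustration; the function and parameter names (`clip_norm`, `noise_scale`) are assumptions for the example, and real deployments calibrate the noise to a formal privacy budget.

```python
import numpy as np

rng = np.random.default_rng(42)

def privatize_update(update, clip_norm=1.0, noise_scale=0.1):
    """Bound an update's L2 norm, then add Gaussian noise before
    the update leaves the hospital, so no single patient record
    can dominate or be reconstructed from what is shared."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(scale=noise_scale * clip_norm, size=update.shape)
    return clipped + noise

# Clipping alone (noise disabled) rescales a large update to norm 1:
print(privatize_update(np.array([3.0, 4.0]), noise_scale=0.0))
# → [0.6 0.8]

# With noise enabled, the shared update is perturbed:
private = privatize_update(np.array([3.0, 4.0]))
```

Only the clipped, noised update is federated; the raw gradient, like the raw data, never leaves the site.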

Dattaraj Rao, Innovation and R&D Architect at Persistent Systems, is the author of the book “Keras to Kubernetes: The Journey of a Machine Learning Model to Production.” At Persistent Systems, Dattaraj leads the AI Research Lab, which explores state-of-the-art algorithms in Computer Vision, Natural Language Understanding, Probabilistic Programming, Reinforcement Learning, Explainable AI, etc., and demonstrates their applicability in the Healthcare, Banking, and Industrial domains. Dattaraj has 11 patents in Machine Learning and Computer Vision.

