By Dr. Mrinal Puranik
Two global events are well underway that may not, on the surface, seem to have much in common: the digital transformation and COVID-19. Yet this pandemic differs from earlier pandemics not only in its worldwide spread but also in the technological disruptions it is accelerating, some of which are being used to fight it. One such technology is machine learning.
Right now, we’re dealing with a bit of a knowledge explosion, and doctors and researchers alike are struggling to keep up with our rapidly changing understanding of COVID-19. There are thousands of research papers on viral pandemics, and, during this pandemic, hundreds more are coming out every day. The COVID-19 Open Research Dataset (CORD-19), for example, integrates papers from multiple sources to host over 200,000 freely available scholarly articles.
With data on COVID-19 pouring in every day from all over the world, machine learning and recent advances in natural language processing (NLP) offer novel ways to cut through the noise and answer some of the most important questions surrounding COVID-19, including the where, who, what, and when.
Where Did the Virus Come From?
Machine learning is being applied to identify the origin of the virus by studying its genome. One study that used an in-house machine learning method claims that the virus is most closely related to two bat SARS-like coronaviruses found in Chinese horseshoe bats (Rhinolophus sinicus). Using supervised learning and digital signal processing, the authors identified relationships between genome clusters and, in a remarkably short period of time, traced them back to specific bat species.
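To make the idea of pairing digital signal processing with supervised learning concrete, here is a toy sketch. It is not the study's pipeline: the purine/pyrimidine encoding, the Pearson-based distance, the nearest-neighbour rule, and the short made-up sequences and labels are all assumptions chosen for illustration. It turns each sequence into a numeric signal, takes the magnitude of its discrete Fourier transform, and assigns a query to the labelled reference with the most similar spectrum.

```python
import cmath
import math

# Assumed encoding for illustration: purines (A, G) -> +1, pyrimidines (C, T) -> -1.
ENCODING = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def magnitude_spectrum(sequence):
    """Encode a DNA string as numbers and return its DFT magnitude spectrum."""
    signal = [ENCODING[base] for base in sequence]
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff))
    return spectrum


def pearson_distance(u, v):
    """Return (1 - Pearson correlation) / 2, roughly in the range [0, 1]."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.5  # correlation is undefined for a flat spectrum
    return (1 - cov / (norm_u * norm_v)) / 2


def nearest_label(query, references):
    """Return the label of the reference whose spectrum is closest to the query's."""
    query_spectrum = magnitude_spectrum(query)
    closest = min(
        references,
        key=lambda item: pearson_distance(query_spectrum, magnitude_spectrum(item[0])),
    )
    return closest[1]


# Made-up, equal-length toy sequences with hypothetical labels.
REFERENCES = [("ACACACAC", "pattern_a"), ("AACCAACC", "pattern_b")]

print(nearest_label("CACACACA", REFERENCES))
```

The query is a shifted copy of the first reference, and because the magnitude spectrum ignores shifts, it lands on `pattern_a`. Real genomes are tens of thousands of bases long and vary in length, so a practical pipeline would need a fast Fourier transform and some form of length normalization.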
Who Is Vulnerable to COVID-19?
So far, research has shown that the risk of severe complications from COVID-19 is higher for people who are elderly, frail, or have multiple chronic conditions. The risk of death increases if one has cancer, high blood pressure, chronic respiratory disease, diabetes, or heart disease. However, identifying those most vulnerable is not straightforward. More than 55% of patients meet at least one of these risk criteria, and people with the same chronic condition don’t have the same risk. A one-size-fits-all approach doesn’t apply here.
This is a classic problem for machine learning, which can be used to create a vulnerability index that weighs risk. DeCaprio et al. used machine learning to predict a person’s CV19 index, a measure of their near-term risk of severe complications from respiratory infections. The team implemented three different machine learning approaches to predict the vulnerability index.
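As a rough illustration of what a vulnerability index can look like, the sketch below fits a small logistic regression by gradient descent and uses its output probability as a risk score. This is not a reconstruction of the CV19 index: the three binary risk factors, the synthetic training rows, and the function names are assumptions invented for this example.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            predicted = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = predicted - y
            weights = [w - learning_rate * error * v for w, v in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def risk_score(x, weights, bias):
    """Return the modelled probability of a severe outcome for one person."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Synthetic toy data, NOT real patients.
# Hypothetical columns: [age over 65, diabetes, heart disease]
TOY_ROWS = [
    [0, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
]
# Toy outcome: 1 when at least two risk factors are present.
TOY_LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

WEIGHTS, BIAS = train_logistic(TOY_ROWS, TOY_LABELS)

for person in ([0, 0, 0], [0, 1, 0], [1, 1, 1]):
    print(person, round(risk_score(person, WEIGHTS, BIAS), 3))
```

Because the score is a continuous probability rather than a yes/no flag, two people who share one chronic condition can still receive different scores depending on their other risk factors.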
What’s the Correct Clinical Course?
The vulnerability index above is a bit of a pseudo-index: it doesn’t draw on patients who have actually had COVID-19 but is based on their vulnerability to similar kinds of viruses. While the importance of predicting risk cannot be overstated, it’s even more important to know what clinical action to take once a patient tests positive.
Jiang et al. present an AI-based tool applied to real patient data to provide rapid clinical decision-making support. The authors used decision trees, random forests, and support vector machines for predictive analysis. Interestingly, the features that best predicted acute respiratory distress syndrome (ARDS) were not the indicators a clinician would normally select, nor were their values grossly abnormal clinically. Overall, the models achieved 70% to 80% predictive accuracy. Although the study used a very small data set, it is an important one that can be built upon in other cases.
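One way to see how a tree-based model can surface an unexpected predictor is to look at its basic building block, a single split. The sketch below scores every feature and threshold by Gini impurity and returns the cleanest split. It is a minimal illustration, not the authors' tool: the two unnamed features and the six synthetic rows are assumptions made up for this example.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    positive_rate = sum(labels) / len(labels)
    return 2 * positive_rate * (1 - positive_rate)


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best single split."""
    best = None
    total = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / total
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Synthetic toy data, NOT real patients. Columns are two hypothetical lab values.
TOY_ROWS = [[5.0, 30], [7.0, 32], [6.0, 45], [5.5, 50], [6.5, 28], [7.5, 48]]
TOY_LABELS = [0, 0, 1, 1, 0, 1]  # 1 = severe outcome in this toy example

feature_index, threshold, impurity = best_split(TOY_ROWS, TOY_LABELS)
print(feature_index, threshold, impurity)
```

Here the second column separates the toy outcomes perfectly at a threshold of 32, while the first column does not. A decision tree repeats this search recursively, and a random forest averages many such trees built on resampled data, which is how these models can rank features that a clinician might not think to check.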
When Will Drugs Be Developed or Repurposed?
It’s been over half a year since the first case of infection, and the world is racing to find the right vaccine or drug to treat it. Because new drugs and the clinical trials they require are still a ways off, repurposing (also called repositioning) existing drugs for treatment is a major focus in the pharmaceutical industry right now.
Knowledge graphs and machine learning offer an interesting way to explore repurposing existing drugs for COVID-19 treatment. In one example, seven different knowledge graphs were constructed from around 20 million PubMed abstracts to capture interactions among humans, the virus, drugs, and proteins. These graphs were passed to a graph convolution algorithm to extract virus-related feature information and accurately predict potential drug candidates. After a thorough study, the authors identified a drug for repositioning as a COVID-19 treatment, which is now undergoing a clinical trial; this could accelerate our timeline for better treatments.
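The core of graph convolution is simple enough to sketch: each node repeatedly blends its own features with those of its neighbours, so information about the virus can travel along the graph to the drugs connected to it. The example below is a simplified propagation step under stated assumptions, not the study's algorithm. It omits the learned weights and nonlinearity that a real graph convolutional network has, and the four-node graph and its node roles are hypothetical.

```python
def graph_convolution(adjacency, features):
    """One propagation step: each node averages its own and its neighbours' features."""
    node_count = len(adjacency)
    updated = []
    for i in range(node_count):
        # Include the node itself (a self-loop) plus every directly linked node.
        neighbours = [j for j in range(node_count) if adjacency[i][j] or j == i]
        width = len(features[i])
        updated.append([
            sum(features[j][d] for j in neighbours) / len(neighbours)
            for d in range(width)
        ])
    return updated


# Hypothetical toy graph: 0 = virus, 1 = host protein, 2 = drug A, 3 = drug B.
# Edges: virus-protein and protein-drug A. Drug B has no links.
ADJACENCY = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# A single feature marking "virus-related" signal, initially set on the virus only.
FEATURES = [[1.0], [0.0], [0.0], [0.0]]

one_hop = graph_convolution(ADJACENCY, FEATURES)
two_hops = graph_convolution(ADJACENCY, one_hop)
print(two_hops)
```

After two steps, drug A carries some virus-related signal because it targets a protein linked to the virus, while the unconnected drug B carries none. A downstream model could rank drug candidates using features like these.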
The Future of Machine Learning and Pandemics
The last pandemic before COVID-19 was the H1N1 influenza pandemic of 2009. COVID-19 is the first pandemic to arise in the fully digital era. With abundant information being made publicly available, advancements in machine learning are accelerating our ability to answer the most pressing questions about the virus. These are just a few of the applications of machine learning aimed at solving the riddle presented by COVID-19; many more have already been developed or are still in development. As our methods of training machine learning algorithms become increasingly sophisticated, we will be better able to understand and treat not only COVID-19 but also future pandemics.
Dr. Mrinal Puranik is a Domain Expert in the Life Sciences Domain at Persistent Systems, where she has researched image processing, microarrays, and Next Generation Sequencing data analysis over the last 12 years. Dr. Puranik received an M.Sc. in Electronics and an M.Phil. in Image Processing and Pattern Recognition at the University of Pune, as well as a Ph.D. in Electronic Science in VLSI, Neural Network, and Image Processing.