Healthcare NLP: The Top Factors Impacting the Industry


By David Talby, CTO of John Snow Labs

Natural Language Processing (NLP) has taken the healthcare industry by storm, and for good reason. The technology can provide significant value in building patient cohorts and identifying clinical risks by understanding the complexities of clinical terminology. One only needs to look at the past year: NLP applications were used in everything from COVID-19 vaccine development to detecting new strains of the virus and tracking its spread, providing important knowledge to researchers, health workers, and government agencies.

This is particularly impressive when you consider the regulatory constraints the healthcare industry operates under. While regulations such as HIPAA can be seen as slowing adoption, in practice they simply require that Responsible AI best practices be implemented upfront, and healthcare is paving the way. Other highly regulated industries, such as finance, have also found a balance between security and automation when it comes to implementing NLP.

So, what’s driving the uptick in NLP investments, and what can we learn from industries, like healthcare, that are driving this momentum? First, we can try to understand how NLP has evolved over the last year and take note of a few key factors contributing to its growth. The newly released 2021 NLP Industry Survey aims to explore this, laying the foundation for the future of NLP.

The main use cases and data sources fueling NLP projects are a good starting point. While cutting-edge projects and new applications of the technology get a majority of the attention, more practical, ‘everyday’ uses are at work behind the scenes increasing operational efficiency and improving patient care. Not surprisingly, the main data sources for NLP include text fields in databases, files (PDFs, docx, etc.), and online content — something most organizations can easily get started with. 

The aforementioned data sources are critical building blocks as users start putting NLP to use by way of Named Entity Recognition (NER) and Document Classification — the most popular applications of the technology. NER, cited as the top use case for more than half of tech leaders (54%), refers to locating key entities (company names, products, location, etc.) within text. 

For example, John Snow Labs offers an NER model to healthcare users that works to identify drug entities in a given text, helping practitioners uncover and prevent costly and dangerous adverse drug events in patients. NER models can be configured to track other categories, such as radiology findings or other clinical areas an organization may want to monitor. While more advanced use cases, such as question answering and natural language generation, will come to fruition, NER and Document Classification will likely remain the top use cases among both NLP novices and experts.
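The idea behind NER can be illustrated with a minimal, dictionary-based sketch in plain Python. This is a toy stand-in for the trained models a library like Spark NLP provides, not its actual API; the entity lists and labels here are purely illustrative:

```python
import re

# Toy entity dictionaries. A real NER model learns these patterns from
# annotated clinical text rather than relying on fixed word lists.
DRUG_TERMS = {"aspirin", "metformin", "warfarin"}
CONDITION_TERMS = {"diabetes", "hypertension"}

def recognize_entities(text):
    """Return (surface form, entity label, character span) for each match."""
    entities = []
    for token in re.finditer(r"[A-Za-z]+", text):
        word = token.group().lower()
        if word in DRUG_TERMS:
            entities.append((token.group(), "DRUG", token.span()))
        elif word in CONDITION_TERMS:
            entities.append((token.group(), "CONDITION", token.span()))
    return entities

note = "Patient with diabetes was prescribed metformin and aspirin."
print(recognize_entities(note))
```

A trained model generalizes far beyond such lookups, handling misspellings, abbreviations, and context, but the output shape is the same: entities located in text, each with a label and a span.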

Whether the use cases are practical or advanced, and regardless of industry, level of expertise, company size, or location, one factor was cited as both the top priority and the top challenge for NLP users: accuracy. 44% of general respondents and 40% of technical leaders agreed that this was the single most important consideration. When you weigh the potential consequences of an inaccurate AI model in a medical setting, you can understand why.

As the technology matures and more data becomes available, accuracy will improve, but NLP will never be a ‘set it and forget it’ technology that can be perfected once and then left to its own devices. NLP models need to be constantly monitored and tweaked as production environments change over time, often with input from both a technician and a domain expert, especially in fields like healthcare, known for its unique and complex nature.

Accuracy feeds into another survey finding that is influencing how NLP projects are carried out. While cloud solutions have long provided accessible entry points for tech adoption, NLP use is a bit more complicated. Despite 83% of survey respondents indicating they use one of the big four cloud providers for NLP, difficulty tuning models and cost were two main challenges cited. Difficulty tuning is especially worrisome when you consider how regularly NLP models need to be tweaked, and how NLP projects involve pipelines, where the results from a previous task are used to inform future tasks.
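The pipeline point is worth making concrete: because each stage consumes the previous stage's output, retuning one component can shift the inputs of every stage after it. A minimal sketch of such a chain, with stage names and logic that are purely illustrative rather than any specific library's API:

```python
# Each stage consumes the previous stage's output, so changing an
# upstream component alters the input of every downstream stage.
def split_sentences(text):
    """Naive sentence splitter on periods (toy upstream stage)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def tokenize(sentences):
    """Whitespace tokenizer over each sentence."""
    return [s.split() for s in sentences]

def flag_negation(token_lists):
    """Toy downstream task: flag sentences containing a negation cue."""
    return [("no" in [t.lower() for t in toks], toks) for toks in token_lists]

def run_pipeline(text):
    return flag_negation(tokenize(split_sentences(text)))

report = "Patient denies chest pain. No fever reported."
print(run_pipeline(report))
```

Swapping in a different sentence splitter would change what the tokenizer and negation stage see, which is why tuning any one model in a pipeline usually means re-validating everything after it.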

That’s likely why most respondents use a combination of cloud services and NLP libraries, mixing and matching tools to fit their organization’s needs. Across all users, Spark NLP was cited as the most-used NLP library. This is not surprising, as a majority of respondents hailed from the healthcare industry, an area in which Spark NLP specializes. In fact, 54% of healthcare organizations use Spark NLP, a sign of how quickly the technology has been adopted over the last several years.

While many IT-related projects took a back seat during the global pandemic in 2020, investments in NLP grew — a trend that’s continued into 2021. In fact, a majority of healthcare industry respondents reported budget increases to the tune of 10-30%, reflective of the larger NLP market. It’s still early days for NLP, but with growing budgets, more available tools, and more common uses of the technology, the market is poised for even more significant growth leading into 2022.