How Health Systems Should Approach Generative AI

Updated on April 28, 2023

Since ChatGPT burst onto the scene last year, interest in easy-to-use, compelling generative AI programs has exploded. At the same time, the viral technology has provoked much debate about when, how, and where it is appropriate to use. 

For its part, OpenAI recently announced publicly accessible APIs that allow any business to integrate ChatGPT into its products and services. Meanwhile, Microsoft and Google seem eager to introduce generative AI into more and more of their consumer-facing businesses. And OpenAI’s latest model, GPT-4, has been shown to pass challenging tests like the bar exam and even the US Medical Licensing Exam.
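
For readers curious what that kind of integration looks like in practice, below is a minimal sketch of a call to the chat completions API using OpenAI’s Python package as it existed at the time of writing. The model choice, prompt, and temperature setting are illustrative placeholders, not a recommendation for clinical use.

```python
# Minimal sketch: calling OpenAI's chat completions API (openai-python < 1.0).
# The prompt and settings below are illustrative placeholders only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]  # never hard-code secrets

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the ChatGPT model exposed through the public API
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the key steps of a patient intake call."},
    ],
    temperature=0.2,  # lower temperature makes output more deterministic
)

print(response["choices"][0]["message"]["content"])
```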

So, are ChatGPT and related technologies ready for primetime use in healthcare? Many health system executives rightfully want to understand how the game-changing technology can be put to work. It’s not hard to imagine how these systems might help tackle challenges like documenting patient encounters, reducing administrative burdens, and even aiding clinical decision-making. 

But before making any quick decisions, health leaders need to understand the limitations of large language models like ChatGPT, along with the ethical, regulatory, and consumer safety considerations that come with them. After all, health systems are not just any business. They routinely manage sensitive information about patients, who trust the information that health systems provide. As a result, their use of technology must be held to a higher standard, and implementing generative AI and related technologies demands greater caution.

Convincingly Incorrect Answers & Hallucinations 

AI models like ChatGPT are exceptionally good at completing sentences to form compelling, reasonable responses. AI-generated responses can look like they were written by an expert with knowledge of the topic. However, even with well-designed prompts, large language models are prone to providing plausible-sounding but incorrect or nonsensical answers, otherwise known as “hallucinations.”

When Google publicly demonstrated its new AI chatbot, for example, a factual error generated by the model famously wiped more than $100 billion off the company’s market cap. Likewise, Microsoft recently acknowledged the hallucination problem but suggested that the wrong answers its tools provide are “usefully wrong.”

Besides their tendency to hallucinate, ChatGPT-like AI models face a number of additional limitations. They are only as good as the data on which they are trained, and they are sensitive to how inputs are phrased, which can lead to variations in responses.

There are also concerns over how data provided to these systems is used. Organizations including JPMorgan Chase, Verizon, and Northrop Grumman have banned the tool to prevent the unauthorized spread of confidential information. Likewise, Italy has temporarily banned ChatGPT over privacy concerns while regulators in the European Union look to shore up data collection and retention policies, given concerns that protected information could be used to train AI models. It’s worth noting that legal opinions vary on whether ChatGPT is HIPAA compliant in general use, though Microsoft recently made it available through its Azure OpenAI Service, which allows healthcare organizations to enter into a HIPAA Business Associate Agreement.
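
As a rough illustration of what the Azure route involves, the sketch below points the same openai Python package at an Azure OpenAI resource instead of the public endpoint. The resource name, deployment name, and API version are placeholders that depend on your own Azure configuration, and routing traffic through Azure by itself does not make an application HIPAA compliant; that still requires a signed BAA and appropriate safeguards.

```python
# Sketch: routing requests through an Azure OpenAI deployment (openai-python < 1.0).
# Resource name, deployment name, and api_version are placeholders for your own setup.
import os
import openai

openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"  # placeholder resource URL
openai.api_version = "2023-03-15-preview"                      # version current in spring 2023
openai.api_key = os.environ["AZURE_OPENAI_KEY"]

response = openai.ChatCompletion.create(
    engine="<your-gpt-35-deployment>",  # Azure uses a deployment name, not a model name
    messages=[{"role": "user", "content": "Draft a reminder for an upcoming appointment."}],
)

print(response["choices"][0]["message"]["content"])
```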

A Cautious Approach is Key in Healthcare

Patient trust is fundamental in healthcare. Consumers must be able to trust the information their healthcare organizations provide. They must also be able to trust that their data is stored confidentially and given every protection.

Built to address these concerns, AI solutions designed for healthcare take a different approach than general-purpose large language models. They tend to be task-specific rather than open-ended, they draw on more limited, curated databases, and when a chatbot response is wrong, the error is more readily apparent.

To illustrate, let’s imagine a consumer needs to find a cardiologist in their area. A system based on a large language model may be able to recognize the request, but it may not understand the differences between types of doctors and the services they provide. Additionally, the information the system draws from may be outdated or incomplete. As a result, despite confident wording, the system might generate a response containing factually incorrect information (such as offering results that include a physician who has long since moved away or a provider with a different specialty).

By comparison, when a purpose-built chatbot for healthcare is connected to a health system’s database of providers and services, it can provide more accurate responses to a request like finding a provider. Additionally, when integrated with a health system’s EHR, the chatbot can perform additional tasks like scheduling appointments or collecting intake information before a visit. And when faced with a request it doesn’t understand, an AI platform built for healthcare is more likely to acknowledge its limitations rather than generate a plausible-sounding but ultimately incorrect answer.
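
To make the contrast concrete, here is a hypothetical sketch of that task-specific pattern: a “find a provider” handler that answers only from a structured provider directory and explicitly declines when it has no match, rather than generating a free-form guess. Every name here (find_provider, Provider, the sample directory) is illustrative and does not describe any particular vendor’s implementation.

```python
# Hypothetical sketch of a task-specific "find a provider" handler.
# The directory, fields, and fallback message are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    specialty: str
    city: str
    accepting_new_patients: bool


# In practice this would be the health system's own provider directory, kept current.
PROVIDER_DIRECTORY = [
    Provider("Dr. A. Rivera", "cardiology", "Springfield", True),
    Provider("Dr. B. Chen", "dermatology", "Springfield", True),
]


def find_provider(specialty: str, city: str) -> str:
    """Answer only from structured data; decline rather than guess."""
    matches = [
        p for p in PROVIDER_DIRECTORY
        if p.specialty.lower() == specialty.lower()
        and p.city.lower() == city.lower()
        and p.accepting_new_patients
    ]
    if not matches:
        # Acknowledge the limitation instead of producing a plausible-sounding guess.
        return "I couldn't find a matching provider. Would you like help another way?"
    return "You could schedule with: " + ", ".join(p.name for p in matches)


print(find_provider("Cardiology", "Springfield"))
```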

At the end of the day, ChatGPT impresses thanks to its user-friendly interface and the fluency of its grammar and syntax. The sheer size of the model, the number of parameters it includes, and the volume of data it was trained on represent a feat of deep learning engineering. It’s not hard to see how generative AI technology could significantly benefit healthcare organizations in the future.

At this early stage, though, health systems would do well to be cautious when adopting tools like ChatGPT. Health systems implementing generative AI must strike the right balance between utility and safety, carefully considering compliance, accuracy, and reliability. 

Matt Cohen

Matt Cohen, Director of AI at Loyal, is passionate about improving the healthcare experience through intelligent software. Before Matt joined Loyal, he spent several years performing research in areas including machine learning, speech, and audio signal processing at MIT Lincoln Laboratory and the University of Maryland, College Park. He was initially hired as a Software Engineer, Applied Machine Learning, at Loyal. As the Director of AI, Matt oversees the company’s machine-learning strategy and the AI team.