Hastings Center News

Considering the Potential and Pitfalls of “Dr. GPT-3” in a Clinic Near You

Artificial intelligence natural language computer applications are becoming increasingly sophisticated, raising the possibility that they could assume a greater role in health care, including interacting with patients. But before these applications enter the clinic, their potential and pitfalls need thoughtful exploration, states a new article in NPJ Digital Medicine.

The authors are Diane M. Korngiebel, a Hastings Center research scholar, and Sean D. Mooney, chief research information officer at University of Washington Medicine.

“There is compelling promise and serious hype in AI applications that generate natural language,” Korngiebel and Mooney write, referring to OpenAI’s Generative Pre-trained Transformer 3 (GPT-3) and similar technologies. The article breaks down potential health care applications into three categories: unrealistic, realistic and feasible, and realistic but challenging.

Unrealistic Applications

Natural language AI applications will not replace doctors, nurses, and other health care personnel in conversation with patients anytime soon. “Interactions with GPT-3 that look (or sound) like interactions with a living, breathing—and empathetic or sympathetic—human being are not,” the authors write. In a recent test of GPT-3 for mental health counseling, for example, the application supported a simulated patient’s expressed thoughts of suicide. In addition, natural language AI applications currently reflect human biases involving gender, race, and religion.

Realistic and Feasible Applications

Natural language applications could relieve health care providers of some routine tedious tasks, such as navigating complex electronic health records. And, given that they are capable of fairly natural-sounding question and answer exchanges, the applications could improve customer service online chat support and help patients with noncritical tasks such as setting up equipment in preparation for a telehealth visit. But there must be “serious guardrails” for all health care interactions, including training the applications to eliminate “harmful, prejudicial, or inappropriate vocabulary.”

Realistic but Challenging Applications

GPT-3 could be used to assist with triaging noncritical patients who come to emergency departments. However, developers of the technology and people implementing it would need to be mindful of harms. For example, natural language applications that do not “speak” a patient’s language might triage that patient inappropriately. “Implementation should include another means of triaging those patients who cannot, or do not wish to, use the conversational agent, which may also be too linguistically homogenous to offer culturally mindful language use,” the authors write, adding that it is important to maintain a “human in the loop.” A staff member would also need to review all triage forms.

The article concludes with recommendations for making sure that natural language applications are equitable. A broad range of stakeholders should be involved from the earliest stage of development through deployment and evaluation. And there should be transparency, including in the datasets used and limitations of the applications.

“We should have cautious optimism for the potential applications of sophisticated natural language processing applications to improve patient care,” the authors write. “The future is coming. Rather than fear it, we should prepare for it—and prepare to benefit humanity using these applications.”

Read the full text of the article.