
From Bioethics Briefings

Generative AI in Healthcare

Highlights
  • The use of medical AI and large language models has become part of our healthcare system, raising the prospect of benefits, such as increased access to care, as well as ethical concerns.
  • Ethical concerns include risks to patients’ privacy from data collected by AI, a lack of understanding by patients about the use of AI in their care, and bias against underrepresented groups.
  • There are safety concerns when AI diagnoses or advice are inaccurate, as well as gaps in accountability for who is responsible when errors occur. Research is needed not only on the accuracy and efficiency of AI tools in healthcare, but also on how these tools reshape the patient-clinician relationship and affect health outcomes across diverse populations.
  • Regulatory frameworks are needed; they should be sophisticated enough to govern rapidly changing technologies while remaining flexible enough to permit beneficial innovation.

Framing the Issue

Generative AI tools in healthcare are increasingly taking over functions that were once squarely human, from drafting clinical notes to answering patients’ questions and even assisting with diagnoses. These new tools can introduce ethical risks that are common to AI systems: bias against underrepresented groups, safety risks when outputs are inaccurate or overconfident, and accountability gaps when automation blurs who is responsible for errors. Some of these risks may diminish over time—datasets may become more representative, safeguards may mature, and model performance may improve. But even with technical progress, there are some areas of care that should remain human, either because providers need to preserve their relational or diagnostic skills or because patients need things only humans can provide. Understanding these human contributions is vital to developing AI that improves healthcare rather than merely making it more efficient. The responsible use of AI in healthcare requires a positive vision of its future—one that defines the aims and limits of automation, rather than leaving them to be set by efficiency and opportunity alone.

Ambient AI for Clinical Notetaking

Among the most prominent emerging uses of AI in healthcare is ambient clinical documentation—AI systems that passively record and transcribe provider-patient conversations to automate medical notetaking. Health systems are adopting these tools at scale, with promises of reducing clinician burden, improving efficiency, and strengthening patient interactions. Yet the speed and scope of deployment raise profound and complex ethical considerations that healthcare organizations should navigate carefully. These considerations include privacy, consent, and data governance; accuracy and clinical accountability; bias and health equity; and professional autonomy and de-skilling.

Privacy, consent, and data governance. The most fundamental ethical concern for ambient AI involves patient privacy and informed consent. Ambient AI systems continuously record and process conversations between patients and providers, capturing not only medical information but also personal details, family dynamics, and intimate health concerns. Research on current consent and disclosure practices suggests that patients may not receive adequate information about ambient AI before it is used, including details about data storage or speech analysis, limiting their ability to give truly informed consent. Additionally, ambient AI systems generate vast amounts of healthcare data that may be valuable for research, quality improvement, or commercial purposes. However, the ethical use of this data requires robust local governance frameworks, established by individual healthcare systems, that protect patient interests while enabling beneficial secondary uses. Patients must understand not only how their immediate care documentation is being generated, but also how their data might be used for broader purposes beyond their individual treatment. If patients do not fully understand the scope of data capture and the secondary uses of their conversations, they cannot give meaningful informed consent.

Accuracy and accountability. Inaccuracies in AI-generated documentation pose risks to patient safety and could expose healthcare providers to liability. While these systems can reduce administrative burden, they may misinterpret context, miss critical nuances, or generate plausible-sounding but incorrect clinical narratives. This creates a challenging dynamic in which providers must remain vigilant about AI-generated content and avoid overreliance on automated systems. The question of accountability becomes murky when errors occur: it is unclear whether the provider is responsible for failing to catch AI mistakes or whether the healthcare system or vendor bears responsibility for flaws in the technology.

Bias and health equity. AI systems inevitably inherit existing biases present in their training data. Ambient AI may systematically misinterpret speech patterns, cultural expressions, or communication styles that differ from the dominant populations in training data, such as non-American English accents or speech impediments. This could lead to disparate documentation quality across different patient demographics, potentially perpetuating or amplifying existing healthcare disparities. Patients from minoritized communities may find their concerns inadequately captured or mischaracterized in ways that affect their ongoing care, which may also exacerbate mistrust in the healthcare system.

Professional autonomy and de-skilling. The healthcare system’s integration of ambient AI raises concerns about the erosion of clinical skills and professional autonomy. As providers become accustomed to AI-generated documentation, they may lose proficiency in clinical observation and active listening with patients—skills that play a key role in quality patient care. In addition, these AI systems may eventually influence clinical thinking by suggesting certain diagnostic pathways or treatment approaches within the AI-generated documentation process. While most evaluations of ambient documentation focus on metrics of efficiency, healthcare systems should also measure the impacts of ambient AI on patient outcomes. 

AI-Drafted Patient Messaging and Medical Chatbots

AI-drafted patient messaging systems use large language models (LLMs) to generate responses to patient inquiries through electronic health portals like Epic. Increasingly, patients may also consult LLM chatbots directly—whether through health system-mediated tools (e.g., triage chatbots embedded in patient portals) or widely available consumer systems, such as ChatGPT. Both technologies introduce tensions between timeless principles of medical communication—honesty, empathy, and trust—and the new capabilities of digital health technologies. Unlike messaging in health portals, where clinicians still function as intermediaries and bear ultimate responsibility, direct chatbot interactions may bypass clinicians altogether, raising new questions about accountability, patient safety, and the boundaries of medical practice. Some other ethical considerations include empathy, authenticity, and trust in the patient-clinician relationship; clinical responsibility and oversight; quality of care and patient safety; and regulatory and professional standards.

Empathy, authenticity, and trust in the patient-clinician relationship. Effective healthcare rests on trust between patients and clinicians, which may be challenged when AI systems generate patient communications without clear disclosure or sufficient clinician oversight. Patients expect messages from their healthcare team to represent genuine human judgment. When AI drafts responses that providers then send under their own names, this could be a form of deception that risks undermining the therapeutic relationship. At the same time, patients may accept or even prefer AI-drafted messages, especially if they lead to timely and efficient communication with their care teams. Direct use of chatbots may also lead patients to attribute humanlike qualities to chatbot responses, developing a false sense of relational trust.

The question of empathy in AI-generated messages presents a particularly complex ethical challenge. While AI systems can be programmed to use empathetic language patterns and respond to emotional cues in patient communications, this raises fundamental questions about the nature of empathy itself.

True empathy involves not just appropriate language but also genuine understanding, shared emotional experience, and authentic concern for another’s well-being. AI systems may produce messages that appear more consistently empathetic than those from overworked clinicians, using carefully crafted language that acknowledges patient concerns and validates their experiences. However, this presents a potential empathy paradox: AI-generated messages may feel more empathetic to patients while being fundamentally devoid of genuine emotional understanding. If patients feel cared for, does authenticity matter less than affect? Or does simulating compassion risk empathy washing—using technology to create the appearance of concern while displacing real human emotional labor? These questions force healthcare systems to grapple with whether artificial empathy should be treated as a pragmatic solution to clinician burnout or as a distortion of what makes the patient-clinician relationship meaningful. Moreover, if patients increasingly turn to chatbots for reassurance before contacting a clinician, this may subtly shift expectations of what counts as a “trusted” medical voice.

Clinical responsibility and oversight. AI-drafted messages complicate traditional accountability because clinicians remain legally responsible for all communications under their name, yet automation bias may encourage overreliance on AI outputs. In contrast, consumers’ direct interactions with health AI chatbots may bypass clinician oversight entirely, creating unclear lines of responsibility if consumers act on faulty or incomplete advice. For instance, if someone receives unsafe reassurance from a consumer health AI chatbot and delays urgent care, who is accountable—the technology vendor or the patient?

These scenarios demand new models of oversight, including guardrails for approved chatbot use, clear disclaimers about limitations, and built-in pathways that ensure that high-risk symptoms are flagged for human review. Without such safeguards, AI systems risk introducing new harms into our health information and communication channels.

Quality of care and patient safety. Perhaps most critically, those who create AI-drafted messaging systems need to ensure that new efficiencies do not compromise the quality or safety of patient care. Consumer chatbots add a layer of risk by removing expert content mediation by clinicians. AI systems may generate responses that appear helpful but contain subtle medical inaccuracies, inappropriate reassurances about concerning symptoms, or delays in recommending necessary follow-up care. Patients may become accustomed to the convenience of quick, automated responses and decide not to seek appropriate medical attention.

Regulatory and professional standards. Finally, both AI-drafted messaging and medical chatbots sit within regulatory frameworks that are not designed for these technologies. Professional organizations and health systems must establish clear guidelines for ethical AI use, including disclosure obligations, requirements for human review, and boundaries around clinical decision-making. With consumer-facing chatbots, additional standards are needed to define when these tools cross into the practice of medicine, what liability vendors bear, and how to ensure consistent quality. These standards should emphasize that AI assistance complements, but does not replace, careful clinician judgment.

Chatbots for Mental Health

Teenagers and adults are increasingly using LLM chatbots for psychological support—ranging from advice and companionship from recreational chatbots to platforms dedicated to therapy. Dedicated therapy platforms present themselves as evidence-based mental health tools built on cognitive behavioral therapy, while recreational platforms are often used as quasi-therapeutic companions. Chatbots for mental health share many of the same ethical concerns as ambient AI and medical chatbots: privacy, consent, accountability, and bias. But they raise additional questions: Is increasing access to mental healthcare with chatbots beneficial? Are they safe and effective?

Access to care. The great hope for mental health chatbots is that they will expand access to care. Surveys suggest that half of Americans with mental health issues do not receive care and that the most significant barrier is affordability. Chatbots could increase access to care in a variety of ways: They are low-cost, available at any time, and can be accessed in many languages. In underserved areas, they might even be the only source of care available. Chatbots may also reduce psychological barriers to accessing mental health care; users may feel less judgment and stigma in revealing sensitive issues to a chatbot than to a human therapist.

Safety and effectiveness. Despite this promise, there are too few rigorous studies of chatbots’ safety and effectiveness in mental health support to know whether increased access will ultimately be beneficial. And even fairly recent studies are hard to evaluate given the rapid development of the technology. However, many of these studies, including a recent randomized controlled trial of a dedicated therapy chatbot, have found that both recreational and dedicated therapy chatbots reduced symptoms of anxiety, depression, and loneliness. Whether chatbots reduce these symptoms or exacerbate them, however, may depend on how often they’re used. For some people, chatbot therapy may be too accessible.

Perhaps a more significant concern is how chatbots handle crises such as suicidal thoughts or threats of violence. While there is some evidence that chatbots can reduce suicidal ideation, there are many examples of chatbots failing to recognize the threat or even actively promoting it. Chatbot developers are starting to add guardrails that avoid harmful suggestions and recommend helpful resources, but even a small error rate in this area may be unacceptably harmful.

More generally, there is reason to worry that chatbots cannot produce the right bond between therapist and patient—the therapeutic alliance. Some people may find it difficult to develop this bond of trust and shared understanding with a chatbot, given that it can only simulate empathy and vulnerability. Others may develop the wrong kind of bond for therapy. LLMs have a tendency toward sycophancy, so when they are designed to better engage patients, patients may form a relationship that only affirms their delusional thinking. Or the bond may be a result of anthropomorphizing the chatbot, which would be a relationship based on deception.

The present and future. More research and regulation are needed. We need more randomized controlled trials on the safety and effectiveness of the latest recreational and dedicated therapy chatbots. The testing should include red-teaming for safety risks and probing the limits of appropriate use, which would inform system guardrails, regulation, and auditing. Until we have a better understanding of LLMs’ strengths and limitations as therapists, many argue, we should use them only as a supplement to human therapy and only for less vulnerable patients. Given the already widespread use of chatbots as informal therapists, however, it will be difficult to restrict their use without more research into their potential harmful effects.

In the future, chatbots may not replace human therapy but instead provide a distinct kind of therapy. As LLMs are increasingly networked into other sources of data, especially from wearables or implants, they may have access to users’ actions, facial cues, biometrics, or neural activity. These signals may give therapy chatbots a more “objective” understanding of our patterns, as well as the ability to monitor the effects of interventions. Aggregating all this data on our responses and behavior will, of course, be a significant threat to privacy. But it also presents a deeper concern. Does such comprehensive self-surveillance represent a genuine form of self-knowledge? Or will it instead threaten the kinds of narratives needed for healthy self-formation?

LLMs as a Diagnostic Tool

Recent studies have made news by demonstrating that LLMs are as accurate as human clinicians in diagnosing patients from case reports—and maybe better. One doctor compared this advance to the moment when Deep Blue beat Kasparov at chess. But it is not clear that we are at the point where AI will outperform humans at medical diagnosis. For one thing, LLMs still make characteristic errors. They underperform on rare-disease diagnosis compared with non-AI decision-support tools. LLM outputs are also prompt-sensitive; the same facts can yield different diagnoses when the instructions change, so consistency depends on how a case is framed. And when their predictions are uncertain, they do not typically reveal their level of uncertainty and are fairly inaccurate when asked to do so.

Training human doctors to use these AI tools well may yield better results than either doctors or LLMs alone. That is, in fact, already happening: 40% of U.S. physicians currently use an LLM called Open Evidence for literature reviews and diagnostic support. Open Evidence was fine-tuned on reputable medical journals, producing much more accurate diagnoses than general-purpose LLMs.

There are, however, several concerns with using AI as a diagnostic assistant. People often defer to AI-generated conclusions, a tendency called automation bias, to which medical trainees may be especially susceptible. And as diagnoses increasingly become a product of AI rather than human judgment, the data that train future AI systems will reproduce the characteristic errors of LLMs. Doctors may also be less able to catch these errors if they start to lose their diagnostic skill because of an overreliance on AI—a de-skilling process that can begin soon after the introduction of AI tools.

If diagnostic LLMs continue to increase their accuracy over time, however, some have wondered whether we should even worry about de-skilling. As Dhruv Khullar points out, “In the past, doctors were probably better at listening to heart murmurs, or at feeling the liver. Now we have echocardiograms and CT scans. We’re less good at those old skills, and I don’t think people feel like that’s a huge loss.” However, as Khullar also notes, LLMs are more accurate at diagnoses only when analyzing information that has already been collected and organized by doctors. This skill of identifying, prioritizing, and presenting the relevant information—the skill at intake and prompting—remains the province of humans and requires many of the same skills as diagnosis. So, any loss of human diagnostic skill is likely to bring AI down with it. There are also psychosocial contributions to illness that are not fully captured by health records and, therefore, may require a human understanding. These concerns reflect the current state of the technology, not necessarily its limits. Data collected through wearable devices may one day capture aspects of psychosocial context, and future LLM systems may become less dependent on the framing and organization of human prompts.

For now, however, as doctors increasingly rely on AI, they will have to find ways to retain their diagnostic skill. Part of that responsibility falls on doctors to use AI as a co-reasoner—rather than as a recommendation system—to find aspects of a case that they may have overlooked. This includes using it to review the latest literature, broaden the list of possible causes, identify alternative lines of reasoning, and find holes in their own reasoning. However, given that AI tools provide recommendations anyway, it will be tempting for doctors to skip straight to the answer. So another part of the responsibility lies with medical education and AI design. Medical education should emphasize the use of AI tools as a second opinion, while AI tools should be designed to walk doctors through the reasoning process before they arrive at a conclusion together.

Looking Ahead

As generative AI becomes embedded across healthcare domains—from documentation and diagnostics to communication and therapy—it challenges longstanding boundaries between human and machine roles in care. As these technologies evolve from experimental tools to standard practice, healthcare systems face a critical window in which to establish ethical frameworks that can keep pace with innovation. This requires moving beyond reactive risk management toward proactive governance structures that balance the genuine benefits of AI with the fundamental values that define good healthcare: trust, empathy, safety, and equity. The questions raised in this chapter are not merely technical challenges to be solved; they also reflect deeper tensions about what we want healthcare to be in the age of artificial intelligence.

Looking ahead, the central task for bioethics will be to develop frameworks that preserve moral and relational dimensions of care amid increasing automation. This will require collaboration among clinicians, patients, ethicists, policymakers, and technology developers. We call for rigorous research, not only on the accuracy and efficiency of these tools, but also on how they reshape the patient-clinician relationship and affect health outcomes across diverse populations. We need regulatory frameworks sophisticated enough to govern rapidly changing technologies while remaining flexible enough to permit beneficial innovation. And we need public dialogue about which aspects of healthcare should remain fundamentally human, even when machines can perform certain tasks more efficiently. The decisions we make now about appropriate use cases and accountability structures will shape healthcare for generations to come. Getting these decisions right requires both urgency and humility: urgency because deployment is already happening at scale and humility because we are still learning how we want to live alongside artificial intelligence.

Athmeya Jayaram, PhD, is an Assistant Professor of Philosophy at John Jay College, City University of New York.  

Kellie Owens, PhD, is an Assistant Professor in Medical Ethics at New York University Grossman School of Medicine.