When Judith Miller received the results of a medical imaging study last year, the 77-year-old Wisconsin resident did what many patients nowadays do: she asked AI to explain them. Claude, a large language model (LLM) developed by the company Anthropic, obligingly laid out possible interpretations. With the chatbot’s analysis in hand, Miller went into her follow-up appointment feeling prepared for a productive conversation with her doctor. As she puts it, Claude’s responses “enabled me to better understand my health and engage more fully in shared decision-making.”
This scene has become commonplace in clinics around the country. Two recent polls each found that roughly a third of American adults have turned to LLMs for health information: to make sense of lab results, diagnose symptoms, research treatment options or inquire about prescription drugs. “The use of tools like these has doubled in the past year,” says Robert Wachter, a physician at the University of California, San Francisco. “I suspect they’ll double again next year.”
But these chatbots can also provide misleading or inaccurate advice, so experts urge caution when using them. Anthropic, for its part, agrees. “Claude is not designed or marketed for making clinical diagnoses,” according to a spokesperson for the company. Its proper use is “helping people prepare for conversations with their doctors, not replacing them.”
For many patients, AI is a welcome solution to the problems posed by the glut of personal health data supplied by the 21st Century Cures Act, which mandates immediate online access to medical records, such as test results and clinical notes. “If you’ve ever looked at that stuff,” says Dave deBronkart, a health care blogger and activist, “you know it leaves you with the gigantic question: What does all this mean?” Just a few years ago the meaning lay hidden behind a wall of medical jargon that only doctors could comprehend. And because patients can now view results online before speaking with a doctor, they’re often left anxiously wondering what to make of it all. Today, however, general-purpose chatbots and a host of specialized health models can translate the jargon into plain language within seconds, potentially allaying unfounded fears.
Yet they may also heighten anxiety unnecessarily—or worse. LLMs remain prone to mistakes. They can present falsehoods as facts and sycophantically reinforce users’ prior (and sometimes misguided) beliefs. Though these character flaws may lessen as the models grow more powerful, many experts express concern about the potential risks of using today’s AI models in this way. “There aren’t a lot of guardrails around breaking them, pushing them to tell you actual misinformation,” says Cait DesRoches, executive director of OpenNotes, a nonprofit that promotes patients’ access to medical records. She adds that there is little research on what happens when people treat an LLM as a health authority: “I don’t think we have any idea how well it works for average patients.”
Worst-case scenarios have already surfaced. In December a 75-year-old Seattle man died of a treatable type of leukemia; he reportedly refused treatment on the basis of an AI-generated analysis that incorrectly suggested he had a rare complication. Some of the preliminary research on how people use AI for medical diagnosis is sobering. In a Nature Medicine study published in February, researchers asked participants to diagnose a hypothetical condition with the help of various LLMs; the participants reached the right conclusion only about a third of the time.
Still, most experts agree that chatbots can be helpful to people seeking medical information, if used cautiously. “I don’t think people should avoid using them,” DesRoches says, “but I do think people should use them with their eyes open.” Adam Rodman, a general internist at Beth Israel Deaconess Medical Center, goes even further: “I would argue that LLMs, if used appropriately—that’s a big caveat—are the best tool for patient empowerment ever invented.”
Hoping to harness this technology without compromising safety, researchers have developed a suite of strategies to counteract AI’s shortcomings. For example, they suggest telling chatbots to take on the persona of a doctor. This may “prompt the model to collect data in a physicianlike manner,” Rodman says. Other tactics include asking an LLM to rigorously reevaluate its own reasoning and seeking a “second opinion” from a different model. Rodman stresses the importance of removing personal information, such as your name and Social Security number, from any chatbot input to protect your privacy.
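For patients (or their caregivers) who are comfortable with a bit of scripting, those tactics can be bundled into a reusable routine. The short Python sketch below is one hypothetical way to do so, not a tool described by any of the experts quoted here: the prompt wording and redaction patterns are invented for illustration, and the regular expressions catch only the most obvious identifiers (names still have to be removed by hand).

```python
import re

# Crude redaction patterns for obvious identifiers. These are illustrative
# assumptions, not a real de-identification tool; names and other details in
# free text still have to be removed manually.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),          # Social Security numbers
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),  # U.S. phone numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def redact(text: str) -> str:
    """Strip obvious personal identifiers before pasting text into a chatbot."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

def opening_prompt(question: str) -> str:
    """Wrap a health question in a physician persona, per Rodman's suggestion."""
    return (
        "Act as a careful physician taking a patient history. Ask clarifying "
        "questions before suggesting possibilities, and say when you are "
        f"uncertain.\n\nMy question: {redact(question)}"
    )

# Follow-up to send in the same conversation: ask the model to re-examine itself.
SELF_CHECK = (
    "Rigorously re-examine your reasoning above. What might you have gotten "
    "wrong, and what evidence would change your assessment?"
)

def second_opinion(first_answer: str) -> str:
    """Prompt to paste into a *different* model, along with the first answer."""
    return (
        "Another AI model produced the assessment below. Critique it: point "
        f"out errors, missing possibilities and overconfidence.\n\n{first_answer}"
    )

if __name__ == "__main__":
    question = (
        "My SSN is 123-45-6789. My MRI report mentions a small T2 "
        "hyperintensity. What could that mean?"
    )
    print(opening_prompt(question))
    print("\n--- follow-up, same chat ---\n" + SELF_CHECK)
    print("\n--- paste into a different model ---\n"
          + second_opinion("<first model's answer here>"))
```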
Ideally, after all that digital dialogue, patients would wind up with better-informed questions for their doctors. Wachter describes this trend as “generally healthy,” though he sometimes loses valuable time debunking Dr. Chatbot’s faulty advice. “I’ve got 15 minutes for this appointment,” he says, “and I’m going to have to spend the first 10 minutes talking the patient down from what GPT told them to do.”
In many cases LLMs are likely replacing real-life clinical advice altogether, particularly for those who are uninsured or face long wait times to get an appointment. “The access issue is at crisis level,” says Laura Adams, a senior adviser to the National Academy of Medicine on AI matters. Despite the technology’s limitations, she argues we must compare it not to perfection but to reality, in which the alternative may be no care at all. “It’s better than nothing,” she says.
With AI and medical advice, Adams notes that “the horse is way out of the barn.” As more people lean on chatbots to manage their health, researchers and patient advocates say this moment demands a new form of AI literacy. “The remedy is not to keep people ignorant,” deBronkart says. “It’s to teach them how to do it better” by educating children and adults alike. On top of that, newer LLMs will likely get better at medical tasks; Wachter suggests that some models might eventually undergo board certification, as actual physicians do.
For now, people like Miller are already approaching AI just as DesRoches recommends: with eyes open, aware of its tendency to hallucinate and to confirm user biases. Sophisticated as chatbots’ responses may be, they are stitched together from statistical patterns in large datasets, an impressive trick but one that still falls short of the breadth and reliability of human-level clinical reasoning. “It’s just following up words that were probable,” Miller says. “I’m not looking at it as a source of absolute truth.”
