Adam Hart has been a nurse at St. Rose Dominican Hospital in Henderson, Nev., for 14 years. A few years ago, while assigned to help out in the emergency department, he was listening to the ambulance report on a patient who’d just arrived—an elderly woman with dangerously low blood pressure—when a sepsis flag flashed in the hospital’s electronic system.
Sepsis, a life-threatening response to infection, is a major cause of death in U.S. hospitals, and early treatment is critical. The flag prompted the charge nurse to instruct Hart to room the patient immediately, take her vitals and begin intravenous (IV) fluids. It was protocol; in an emergency room, that often means speed.
But when Hart examined the woman, he saw that she had a dialysis catheter below her collarbone. Her kidneys weren’t keeping up. A routine flood of IV fluids, he warned, could overwhelm her system and end up in her lungs. The charge nurse told him to do it anyway because of the sepsis alert generated by the hospital’s artificial-intelligence system. Hart refused.
A physician overheard the escalating conversation and stepped in. Instead of fluids, the doctor ordered dopamine to raise the patient’s blood pressure without adding volume—averting what Hart believed could have led to a life-threatening complication.
What stayed with Hart was the choreography that the AI-generated alert produced. A screen prompted urgency, which a protocol turned into an order; a bedside objection grounded in clinical reasoning landed, at least in the moment, as defiance. No one was acting in bad faith. Still, the tool pushed them to comply when the evidence right in front of them—the patient and her compromised kidneys—demanded the exact opposite. (A hospital spokesperson said that they could not comment on a specific case but that the hospital views AI as “one of the many tools that supports, not supersedes, the expertise and judgment of our care teams.”)
That dynamic is becoming familiar in U.S. health care. Over the past several years hospitals have woven algorithmic models into routine practice. Clinical care often relies on matching a patient’s symptoms against rigid protocols—an environment ideal for automation. For an exhausted workforce, the appeal of handing off routine tasks such as documentation to AI is undeniable.
The technologies already implemented span a spectrum from predictive models that calculate simple risk scores to agentic AI that promises autonomous decision-making—enabling systems to titrate a patient’s oxygen flow or reprioritize an ER triage queue with little human input. A pilot project launched in Utah a few months ago uses chatbot technology with agentic capabilities to renew prescriptions, a move proponents say gives providers more time, although physician associations have opposed the removal of human oversight. Across the country, health systems are using similar tools to flag risks, ambiently listen to visits with patients, generate clinical notes, monitor patients via wearable devices, match participants to clinical trials, and even manage the logistics of operating rooms and intensive care unit transfers.
The industry is chasing a vision of truly continuous care: a decision-making infrastructure that keeps tabs on patients between appointments by combining what’s in the medical record—laboratory test results, imaging, notes, meds—with population data and with the data people generate on their own by using, for instance, wearables and food logs. It watches for meaningful changes, sends guidance or prompts, and flags cases that need human input. Proponents argue this kind of data-intensive, always-on monitoring is beyond the cognitive scope of any human provider.
Others say clinicians must stay in the loop, using AI not as autopilot but as a tool to help them make sense of vast troves of data. Last year Stanford Medicine rolled out ChatEHR, a tool that allows clinicians to “chat” with a patient’s medical records. One physician shared that the tool found critical information buried in the records of a cancer patient, which helped a team including six pathologists reach a definitive diagnosis. “If that doesn’t prove the value of EHR, I don’t know what does,” the physician reported.
At the same time, on many hospital floors these digital promises often fracture, according to Anaeze Offodile, chief strategy officer at Memorial Sloan Kettering Cancer Center in New York City. He notes that faulty algorithms, poor implementation and low return on investment have caused some projects to stall. On the ground, nurses, who are tasked with caring for patients, are increasingly wary of unvalidated tools. This friction has moved from the ward into the streets. In the past two years nurses in California and New York City have staged demonstrations to draw attention to unregulated algorithmic tools entering the health-care system, arguing that while hospitals invest in AI the bedside remains dangerously short-staffed.
Sepsis prediction has become a cautionary case. Hospitals across the U.S. widely adopted a sepsis-prediction algorithm from the health information technology company Epic. Later evaluations found it substantially less accurate than marketed. Epic says that studies in clinical settings have found its sepsis model improved outcomes and that it has since released a second version it claims performs better. Still, nurses saw how an imperfect product could become policy—and then become their problem.
Burnout, staffing shortages and rising workplace violence are already thinning the nursing workforce, according to a 2024 nursing survey. Those pressures spilled onto the steps of New York City Hall last November, when members of the New York State Nurses Association rallied and then testified before the City Council’s hospitals committee. They argued that some of the city’s biggest private systems are pouring money into executives and AI projects while hospital units remain understaffed and nurses face escalating safety risks. As this story was going to press in mid-January, 15,000 nurses at hospital systems in New York City were on strike, demanding safer staffing levels and workplace protections.
New AI-enabled monitoring models often arrive in hospitals with the same kind of hype that has accompanied AI in other industries. In 2023 UC Davis Health rolled out BioButton in its oncology bone marrow transplant unit, calling it “transformational.” The device, a small, hexagonal silicone sensor worn on a patient’s chest, continuously tracked vital signs such as heart rate, temperature and breathing patterns.
On the floor it frequently generated alerts that were difficult for nurses to interpret. For Melissa Beebe, a registered nurse who has worked at UC Davis Health for 17 years, the pings offered little actionable data. “This is where it became really problematic,” she says. “It was vague.” The notifications flagged changes in vital signs without specifics.
Beebe says she often followed alarms that led nowhere. “I have my own internal alerts—‘something’s wrong with this patient, I want to keep an eye on them’—and then the BioButton would have its own thing going on. It was overdoing it but not really giving great information.”
As a union representative for the California Nurses Association at UC Davis Health, Beebe requested a formal discussion with hospital leadership before the devices were rolled out, as allowed by the union’s contract. “It’s just really hyped: ‘Oh, my gosh, this is going to be so transformative, and aren’t you so lucky to be able to do it?’” she says. She felt that when she and other nurses raised questions, they were seen as resistant to technology. “I’m a WHY nurse. To understand something, I have to know why. Why am I doing it?”
Among the nurses’ concerns were how the device would work on different body types and how quickly they were expected to respond to alerts. Beebe says leadership had few clear answers. Instead nurses were told the device could help with early detection of hemorrhagic strokes, which patients were particularly at risk for on her floor. “But the problem is that heart rate, temperature and respiratory rate, for a stroke, would be some pretty late signs of an issue,” she says. “You’d be kind of dying at that point.” Earlier signs of a hemorrhagic stroke may be difficulty rousing the patient, slurred speech or balance problems. “None of those things are BioButton parameters.”
In the end, UC Davis Health stopped using the BioButtons after piloting the technology for about a year, Beebe says. “What they were finding was that in the patients who were really sick and would benefit from that kind of alert, the nurses were catching it much faster,” she explains. (UC Davis Health said in a statement that it piloted BioButton alongside existing monitors and ultimately chose not to adopt it because its alerts did not offer a clear advantage over current monitoring.)
Beebe argues that clinical judgment, shaped by years of training and experience and informed by subtle sensory cues and signals from technical equipment, cannot be automated. “I can’t tell you how many times I have that feeling, I don’t feel right about this patient. It could be just the way their skin looks or feels to me.” Elven Mitchell, an intensive care nurse of 13 years now at Kaiser Permanente Hospital in Modesto, Calif., echoes that view. “Sometimes you can see a patient and, just looking at them, [know they’re] not doing well. It doesn’t show in the labs, and it doesn’t show on the monitor,” he says. “We have five senses, and computers only get input.”
Algorithms can augment clinical judgment, experts say, but they cannot replace it. “The models will never have access to all of the data that the provider has,” says Ziad Obermeyer, Blue Cross of California Distinguished Associate Professor of Health Policy and Management at the University of California, Berkeley, School of Public Health. The models are mostly analyzing electronic medical records, but not everything is in the digital file. “And that turns out to be a bunch of really important stuff like, How are they answering questions? How are they walking? All these subtle things that physicians and nurses see and understand about patients.”
Mitchell, who also serves on his hospital’s rapid-response team, says his colleagues have trouble trusting the alerts. He estimates that roughly half of the alerts generated by a centralized monitoring team are false positives, yet hospital policy requires bedside staff to evaluate each one, pulling nurses away from patients already flagged as high risk. (Kaiser Permanente said in a statement that its AI monitoring tools are meant to support clinicians, with decisions remaining with care teams, and that the systems are rigorously tested and continuously monitored.)
“Maybe in 50 years it will be more beneficial, but as it stands, it is a trying-to-make-it-work system,” Mitchell says. He wishes there were more regulation in the space because health-care decisions can, in extreme cases, be about life or death.
Across interviews for this article, nurses consistently emphasized that they are not opposed to technology in the hospital. Many said they welcome tools that are carefully validated and demonstrably improve care. What has made them wary, they argue, is the rapid rollout of heavily marketed AI models whose performance in real-world settings falls short of promises. Rolling out unvalidated tools can have lasting consequences. “You are creating mistrust in a generation of clinicians and providers,” warns one expert, who requested anonymity out of concern about professional repercussions.
Concerns extend beyond private vendors. Hospitals themselves are sometimes bypassing safeguards that once governed the introduction of new medical technologies, says Nancy Hagans, nurse and president of the New York State Nurses Association.
The risks are not merely theoretical. Obermeyer, the professor at Berkeley’s School of Public Health, found that some algorithms used in patient care were racially biased. “They’re being used to screen about 100 million to 150 million people every year for these kinds of decisions, so it’s very widespread,” he says. “It does bring up the question of why we don’t have a system for catching those things before they are deployed and start affecting all these important decisions,” he adds, comparing the introduction of AI tools in health care to medical drug development. Unlike with drugs, there is no single gatekeeper for AI; hospitals are often left to validate tools on their own.
At the bedside, opacity has consequences: If the alert is hard to explain, the aftermath still belongs to the clinician. If a device performs differently across patients—missing some, overflagging others—the clinician inherits that, too.
Hype surrounding AI has further complicated matters. Over the past couple of years AI-based listening tools that record doctor-patient interactions and generate a clinical note to document the visit have spread quickly through health care. Institutions bought them hoping they would save clinicians time. Many providers appreciate being freed from taking notes while talking to patients, but emerging evidence suggests the efficiency gains may be modest, with studies reporting time savings ranging from negligible to about 22 minutes per day. “Everybody rushed in saying these things are magical; they’re gonna save us hours. Those savings did not materialize,” says Nigam Shah, a professor of medicine at Stanford University and chief data scientist for Stanford Health Care. “What’s the return on investment of saving six minutes per day?”
Similar experiences have made some elite institutions wary of relying only on outside companies for algorithmic tools. A few years back Stanford Health Care, Mount Sinai Health System in New York City, and others brought AI development in-house so they could develop their own tools, test tools from vendors, tune them and defend them to clinicians. “It’s a strategic redefinition of health-care AI as an institutional capability rather than a commodity technology we purchase,” Shah says. At Mount Sinai, that shift has meant focusing less on algorithms themselves and more on adoption and trust—trying to create trust with health-care workers and fitting new tools into the workflow.
AI tools also need to say why they’re recommending something and identify the specific signals that triggered the alert, not just present a score. Hospitals need to pay attention to human-machine interactions, says Suchi Saria, John C. Malone Associate Professor of Computer Science at Johns Hopkins University and director of the school’s Machine Learning and Healthcare Lab. AI models, she argues, should function more like well-trained team members. “It’s not gonna work if this new team member is disruptive. People aren’t gonna use it,” Saria says. “If this new member is unintelligible, people aren’t gonna use it.”
Yet many institutions do not consult or co-create with their nurses and other frontline staff when considering or building new AI tools that will be used in patient care. “Happens all the time,” says Stanford’s Shah. He recalls initially staffing his data-science team with doctors, not nurses, until his institution’s chief nursing officer pushed back. He now believes nurses’ perspectives are indispensable. “Ask nurses first, doctors second, and if the doctor and nurse disagree, believe the nurse, because they know what’s really happening,” he says.
To include more staff members in the process of developing AI tools, some institutions have implemented a bottom-up approach in addition to a top-down one. “Many of the best ideas come from people closest to the work, so we created a process where anyone in the company can submit an idea,” says Robbie Freeman, a former bedside nurse and now chief digital transformation officer at Mount Sinai. One wound-care nurse proposed building an AI tool to predict which patients are likely to develop bedsores. The program has a high adoption rate, Freeman says, partly because that nurse is enthusiastically training her peers.
Freeman says the goal is not to replace clinical judgment but to build tools clinicians will use—tools that can explain themselves. In the version nurses want, the alert is an invitation to look closer, not an untrustworthy digital manager.
The next frontier arrived at Mount Sinai’s cardiac-catheterization lab last year with a new agentic AI system called Sofiya. Instead of nurses calling patients ahead of a stenting procedure to provide instructions and answer questions, Sofiya now gives them a ring. The AI agent, designed with a “soft-spoken, calming” voice and depicted as a female model in scrubs on life-size promotional cutouts, saved Mount Sinai more than 200 nursing hours in five months, according to Annapoorna Kini, director of the cath lab. But some nurses aren’t onboard with Sofiya. Last November, at a New York City Council meeting, Denash Forbes, a nurse at Mount Sinai for 37 years, testified that Sofiya’s work must still be checked by nurses to ensure accuracy.
Even Freeman admits there is a “ways to go” before this agentic AI provides an integrated and seamless experience. Or maybe it will join the ranks of failed AI pilots. As the industry chases the efficiency of autonomous agents, the need for infrastructure to test these algorithms grows more urgent. For now the safety of the patient remains anchored in the very thing AI cannot replicate: the intuition of the human clinician. As in the case of Adam Hart, who rejected a digital verdict to protect a patient’s lungs, the ultimate value of the nurse in the age of AI may be not their ability to follow the prompt but their willingness to override it.
