A UCSF study has found that the GPT-4 large language model can prioritize Emergency Department patients for treatment with 89% accuracy
Healthcare organizations looking for help prioritizing patients in the Emergency Department could benefit from an AI tool developed by researchers at the University of California San Francisco (UCSF).
Researchers tested the GPT-4 large language model (LLM) on 10,000 pairs of patients seen at the UCSF ED between 2012 and 2023, and found that the tool correctly identified the higher-acuity patient in 89% of the pairs. In a subset of 500 pairs that were also reviewed by a physician, the model slightly outperformed the clinician, 88% to 86%.
The study, published this week in JAMA Network Open, could give health systems a valuable tool for triaging ED patients, particularly during times of heavy traffic or staff shortages. By assessing severity more quickly, such a tool could help hospitals direct ED staff to the patients most in need of emergency care and speed time to treatment, ultimately improving clinical outcomes.
“Imagine two patients who need to be transported to the hospital but there is only one ambulance, or a physician is on call and there are three people paging her at the same time, and she has to determine who to respond to first,” Christopher Williams, MB BChir, a UCSF postdoctoral scholar at the Bakar Computational Health Sciences Institute and lead author of the study, said in a UCSF press release.
Drawing on data from more than 250,000 ED visits, Williams and his colleagues had the AI model extract information from clinical notes and assess the severity of each patient’s condition. They then compared that assessment to the patient’s score on the Emergency Severity Index (ESI), which rates patients on a scale of 1 to 5, with 1 being the most acute, and is used by ED nurses to prioritize care delivery.
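For illustration only, and not the researchers’ code: because a lower ESI score means higher acuity, the ground-truth label for any pair of patients can be derived in a few lines of Python. The function name and the handling of tied scores below are assumptions made for this sketch.

```python
# Minimal sketch (not the study's code): deriving a pairwise ground-truth
# label from ESI scores. On the ESI scale, 1 is the most acute and 5 the
# least, so the patient with the LOWER score has the higher acuity.

def ground_truth_label(esi_a: int, esi_b: int) -> str | None:
    """Return which patient has higher acuity, or None on a tie."""
    if esi_a == esi_b:
        return None  # equal scores: no usable ground truth for this pair
    return "A" if esi_a < esi_b else "B"

# Example: a patient at ESI 2 (emergent) vs. one at ESI 4 (less urgent)
print(ground_truth_label(2, 4))  # -> "A"
```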
The ESI “uses an algorithm to categorize patients arriving at the ED, estimating the severity of their condition and anticipated future resource use,” Williams and his colleagues said in the study. “The ESI is assigned based on a combination of initial vital sign assessments, the patient’s presenting symptoms, and the clinical judgment of the triage clinician, who is often a trained registered nurse. By capturing clinical acuity at triage, the ESI can be used as a surrogate marker to evaluate, at scale, whether LLMs can correctly assess the severity of a patient’s condition on presentation to the ED. This can be achieved by providing the LLM with patient clinical histories documented in ED physician notes, prompting the model to compare histories to determine which patient has the higher acuity, and evaluating the model output against the ground truth as determined by ESI score.”
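The study’s own prompts and pipeline are not reproduced here, but the evaluation loop the authors describe can be sketched as follows. In this sketch, `ask_llm` is a hypothetical stand-in for whatever LLM client is wired in (for example, a thin wrapper around a GPT-4 API call), and the prompt wording and data layout are illustrative assumptions, not the authors’ actual setup.

```python
from typing import Callable, Iterable, Tuple

def evaluate_pairs(
    pairs: Iterable[Tuple[str, str, str]],
    ask_llm: Callable[[str], str],
) -> float:
    """Score an LLM on pairwise acuity comparison.

    pairs: (note_a, note_b, truth) tuples, where truth is "A" or "B"
           per the ESI-derived ground truth (lower ESI = sicker).
    ask_llm: any function that sends a prompt to an LLM and returns
           its text reply; a hypothetical placeholder, not a real API.
    """
    correct = total = 0
    for note_a, note_b, truth in pairs:
        # Illustrative prompt only; the study's actual prompt differs.
        prompt = (
            "Two de-identified ED physician notes follow. Which patient "
            "has the higher clinical acuity? Answer exactly 'A' or 'B'.\n\n"
            f"Patient A: {note_a}\n\nPatient B: {note_b}"
        )
        answer = ask_llm(prompt).strip().upper()
        correct += answer == truth
        total += 1
    return correct / total  # fraction agreeing with ESI ground truth
```

Run over roughly 10,000 such pairs, a loop of this shape is what produces the 89% accuracy figure reported in the study.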
While the study demonstrates the potential value of the AI tool, Williams cautioned that the technology shouldn’t be introduced into an ED just yet. An incorrect assessment could delay treatment, harming the patient or even leading to death. In addition, AI tools can reflect biases in the data used to train the model, further widening care gaps for underserved populations.
“It’s great to show that AI can do cool stuff, but it’s most important to consider who is being helped and who is being hindered by this technology,” Williams said in the press release, while calling for more clinical trials and research. “Is just being able to do something the bar for using AI, or is it being able to do something well, for all types of patients?”
Eric Wicklund is the associate content manager and senior editor for Innovation at HealthLeaders.
KEY TAKEAWAYS
The UCSF study evaluated 10,000 pairs of ED visits spanning a decade and found that an AI tool correctly identified which of the two patients should be treated first 89% of the time.
The technology could help health systems and hospitals triage ED patients during high-volume periods or staff shortages.
Researchers cautioned that the technology needs more testing and validation, as an incorrect or biased assessment could cause patient harm or even death.