Journal Club

111 Minutes Earlier: What an Emergency-Admission Model Actually Buys You

A Dutch model flags who will be admitted from the emergency department a median 111 minutes before the clinician commits. The number holds. What it measures — decision time, on records, never a patient — is narrower than it sounds.

Dr. Sven Jungmann

CEO

Every ten minutes, while a patient sits in a curtained bay, a model recalculates the odds that this person will be admitted. The moment those odds cross fifty percent, it commits, a median of twenty minutes after triage. The clinician responsible for the same patient takes a median of 131 minutes to reach the same conclusion. The 111-minute gap between the two is the headline of a retrospective study from St. Antonius Hospital in the Netherlands, published in JMIR AI in January 2026. It is a real number, drawn from a large and carefully assembled dataset. It also means less than it first appears.
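The commit rule described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the ten-minute cadence and the fifty-percent threshold come from the text, while the `Visit` record, the helper names and the probability values are hypothetical, and counting a score of exactly 0.5 as a crossing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 0.5    # commit once the predicted admission probability reaches 50 %
INTERVAL_MIN = 10  # re-scoring cadence in minutes after triage


@dataclass
class Visit:
    probs: list[float]             # hypothetical model outputs at 0, 10, 20, ... min after triage
    clinician_decision_min: float  # minutes after triage when the clinician documented admission


def model_commit_time(probs: list[float]) -> Optional[float]:
    """Minutes after triage of the first score >= THRESHOLD, or None if it never crosses."""
    for step, p in enumerate(probs):
        if p >= THRESHOLD:
            return step * INTERVAL_MIN
    return None


def lead_time(visit: Visit) -> Optional[float]:
    """Minutes by which the model's commit precedes the clinician's decision.

    None if the model never commits; zero or negative if it is not actually earlier.
    """
    t = model_commit_time(visit.probs)
    if t is None:
        return None
    return visit.clinician_decision_min - t


example = Visit(probs=[0.31, 0.44, 0.58, 0.71], clinician_decision_min=131)
print(model_commit_time(example.probs), lead_time(example))  # → 20 111
```

Lead time here is a per-visit difference; a headline figure built from it is a summary of many such differences, not a subtraction of two group-level medians.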

Start with the problem the number is meant to solve, because it is not the one most headlines imply. In a saturated emergency department the hard part is rarely deciding to admit. It is the wait — for the bloods, the imaging, the specialist opinion that makes the admission defensible — while the bay stays occupied and the next ambulance idles outside. An early, reliable signal that a given patient is ward-bound could in principle launch the bed search before the corridor backs up. That is a logistics question, not a diagnostic one, and the distinction governs everything that follows.

The model, and what it was scored on

The classifier is an Extreme Gradient Boosting (XGBoost) model, a standard, study-built tool rather than a commercial product. It was developed and tested on 131,250 visits from January 2018 to May 2022, then evaluated on a further 23,097 visits through September 2023, for 154,347 visits in total across nearly six years. On the held-out test set it reached an accuracy of 0.81, precision of 0.78, recall of 0.73, an F1-score of 0.75, and an area under the receiver operating characteristic curve (AUROC, a measure of how well a model separates the two outcomes across all thresholds) of 0.89. Its strongest predictors were laboratory orders (inflammation markers, kidney function, blood count, cultures): the first-hour bloods any emergency physician requests as a matter of course.
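The reported figures hang together, which is easy to check: F1 is the harmonic mean of precision and recall, and 2 × 0.78 × 0.73 / (0.78 + 0.73) ≈ 0.754. The sketch below uses hypothetical helper names to show how the threshold-based metrics fall out of confusion-matrix counts, and how AUROC differs: it is computed from ranked scores (here via the pairwise Mann-Whitney formulation) rather than from one cut-off.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def threshold_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts at one threshold."""
    precision = tp / (tp + fp)  # share of admission flags that were right
    recall = tp / (tp + fn)     # share of actual admissions that were flagged
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Threshold-free AUROC: the probability that a randomly chosen admitted visit
    scores higher than a randomly chosen discharged one (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Consistency check on the published summary figures.
print(round(f1_score(0.78, 0.73), 2))  # → 0.75
```

The pairwise loop is quadratic and only meant to make the definition visible; on 23,000 visits a rank-based or library implementation would be the practical choice.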

Two design choices earn the authors credit. The dataset is large and longitudinal, not a boutique sample. And rather than resting on a single discrimination figure, they asked what the prediction would be worth in time, and then disaggregated that worth by age, by specialty and by triage category. That breakdown is where the careful reading lives — and where the headline starts to fray.

Where the early signal is genuinely early

Among true-positive predictions, the model's call landed a median of 111 minutes (interquartile range 59 to 169) ahead of the documented clinical decision. The gain was largest exactly where one would hope. In the oldest patients the model was both early and accurate — recall around 0.90 to 0.91 with precision near 0.75 to 0.78 in the 78-to-87 and over-88 groups, the very patients for whom a ward bed secured two hours sooner is most likely to matter. In the single highest-urgency triage band, where admission is all but certain, precision reached 0.97 and recall 0.99. Where the outcome leans heavily toward admission and the early bloods are informative, the model is genuinely good at saying out loud, sooner, what the department was going to conclude anyway.
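A median with an interquartile range describes the spread of per-patient leads rather than a single gap. The sketch below uses Python's standard `statistics` module; the lead times are invented for illustration, and the `inclusive` quantile method is an assumption, since different quantile conventions give slightly different quartiles on small samples.

```python
import statistics


def summarise_lead_times(lead_minutes: list[float]) -> tuple[float, float, float]:
    """Return (median, Q1, Q3) of per-visit lead times in minutes.

    Pass only true-positive visits (flagged by the model and actually admitted);
    each value is clinician decision time minus model commit time.
    """
    q1, med, q3 = statistics.quantiles(lead_minutes, n=4, method="inclusive")
    return med, q1, q3


# Invented lead times: the three zeros pull the lower quartile down to 0.
print(summarise_lead_times([0, 0, 0, 90, 111, 130, 170]))  # → (90.0, 0.0, 120.5)
```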

Where it is neither early nor accurate

The same table that flatters the model in geriatrics undercuts it elsewhere. In the 18-to-27 age group, precision of 0.51 and recall of 0.46 leave it barely distinguishable from a coin toss. In neurology, precision was 0.52, so nearly one in two admission flags was wrong. More tellingly, the time saved was zero: the model's prediction arrived at the same moment as the clinician's, not before it. Cardiology showed the same pattern. A tool that is excellent for the very old and worthless on timing for whole specialties is not one tool but several, and only some are ready. The aggregate 111 minutes is a median over a distribution that includes plenty of zeros.
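Disaggregation of this kind is a group-by over the same per-visit records. The sketch below is hypothetical in its record layout, function name and data; for each subgroup it reports precision, recall, the share of flags that were wrong (simply one minus precision, so a precision of 0.52 means 48 percent of flags wrong) and the median lead among true positives.

```python
from collections import defaultdict
from statistics import median
from typing import Optional

NAN = float("nan")


def by_subgroup(
    records: list[tuple[str, bool, bool, Optional[float]]],
) -> dict[str, dict[str, float]]:
    """Per-subgroup precision, recall, wrong-flag share and median lead time.

    Each record is a hypothetical (subgroup, flagged, admitted, lead_minutes) tuple;
    lead_minutes is read only for true positives (flagged and admitted).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "leads": []})
    for group, flagged, admitted, lead in records:
        c = counts[group]
        if flagged and admitted:
            c["tp"] += 1
            c["leads"].append(lead)
        elif flagged:
            c["fp"] += 1  # flagged but not admitted: a wrong flag
        elif admitted:
            c["fn"] += 1  # admitted but never flagged: a missed admission
        # true negatives enter neither precision nor recall, so they are not counted
    summary = {}
    for group, c in counts.items():
        flags = c["tp"] + c["fp"]
        admissions = c["tp"] + c["fn"]
        summary[group] = {
            "precision": c["tp"] / flags if flags else NAN,
            "recall": c["tp"] / admissions if admissions else NAN,
            "wrong_flag_share": c["fp"] / flags if flags else NAN,  # equals 1 - precision
            "median_lead_min": median(c["leads"]) if c["leads"] else NAN,
        }
    return summary


records = [
    ("geriatrics", True, True, 120.0),
    ("geriatrics", True, True, 95.0),
    ("geriatrics", True, False, None),
    ("neurology", True, True, 0.0),
    ("neurology", True, False, None),
    ("neurology", False, True, None),
]
for group, stats in by_subgroup(records).items():
    print(group, stats)
```

A pooled median over both groups would hide exactly the zero-lead subgroup that the per-group view exposes.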

The study measured how much sooner a decision could be made — not whether making it sooner helped a single patient.

What no one measured

The primary outcome was, by the authors' own definition, the difference in decision time between model and clinician. It is not a patient outcome. Nobody measured whether starting the bed search 111 minutes earlier shortened a single length of stay, freed a bay sooner, or changed what happened to anyone. An earlier prediction is plausibly useful; it is not the same as a better-treated patient, and a retrospective study that never touched a live workflow cannot tell the two apart. The authors say so directly — real-world time savings, they note, are likely lower than these figures once the model meets an occupied department, staff who are not free to act on a flag, and the imaging, blood-gas and free-text data this model never saw.

There is a second limit that travels badly to Germany. The model learned on the Dutch U0-to-U5 triage scale; many German departments run the Manchester Triage System with its five colour-coded priorities, and the categories are not interchangeable. A classifier tuned to one urgency vocabulary cannot be assumed to hold on another without local revalidation, and a system that informs an admission decision would sit squarely within the scope of the Medizinprodukteverordnung (Medical Device Regulation, MDR). The transferable lesson is the modest one: a model like this knows nothing the clinician does not. It integrates the same early information faster, without the cognitive load of running four other patients at once. That is worth having in the groups where it works, provided it is validated locally and measured against an outcome that reaches the patient rather than the clock.

Source: van der Haas Y, Roskamp W, Chang-Willems LEM, et al. Evaluating an AI Decision Support System for the Emergency Department: Retrospective Study. JMIR AI 2026;5:e80448. A single-centre retrospective study whose primary endpoint was decision time, not any patient outcome; the authors note real-world performance is likely lower than the reported figures.

Tags: Journal Club, Clinical AI, Emergency Medicine, Evidence-Based Medicine, Clinical Decision Support

This analysis comes from the people behind Visite.
