Journal Club · 22 May 2026 · 5 min read

Asking the Ward Before the AI Arrives

Most hospitals evaluate a clinical AI after they switch it on. This qualitative study did the rarer thing: it sat down with the 14 people who would use the system and asked what they expected — and the worries were as telling as the hopes.

Dr. Sven Jungmann

CEO

Fourteen people, three conversations, and 189 fragments of speech that the authors later distilled into 61 distinct ideas. That is the whole evidentiary mass of this study — and it was gathered before the software in question had touched a single patient. The fragments are not outcomes. They are what an experienced ward expects, and fears, when it is told an algorithm is coming to do a job it has always done by feel.

The job is ordering blood for surgery. A surgeon in the study described the habit anyone who has worked a theatre will recognise: requesting three units of packed red cells as a matter of course, then transfusing far fewer, or none. The reflex is safe and wasteful at the same time. The tool, a machine-learning system called pMSBOS-TS, is meant to replace the standing rule of thumb with a per-patient prediction of how much blood a given thoracic-surgery case will actually need. Rather than measure whether it works, the authors — writing in the Journal of Medical Internet Research in January 2026 — did something most implementations skip: they asked the staff first.

The shape of the evidence

This is qualitative work, and the design deserves to be stated precisely, because it bounds what the study can and cannot say. Fourteen professionals took part (five physicians, six nurses and three blood-bank staff, with a mean of roughly thirteen years in transfusion work) across three focus-group sessions held between late 2023 and mid-2024, the first of them a pilot to sharpen the questions. Transcripts were coded with SEIPS 101, a simplified version of the Systems Engineering Initiative for Patient Safety model, which reads a clinical setting through its people, environment, tools and tasks rather than treating the software in isolation. The coding funnel ran from 189 semantic units down to 61 core ideas, 18 subdomains and 7 domains. Findings were returned to participants by email and reviewed by three outside clinicians, a guard against the analysts hearing only what they expected. There is no control group and no patient outcome anywhere in the paper. The unit of evidence is anticipation.
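
The arithmetic of that funnel is easy to check. What follows is a minimal Python sketch that uses only the counts reported above; the names `participants`, `funnel` and `funnel_ratios` are illustrative and do not come from the paper.

```python
# Counts as reported in the study; all names here are illustrative assumptions.
participants = {"physicians": 5, "nurses": 6, "blood-bank staff": 3}

# Coding funnel, from raw transcript fragments to top-level domains.
funnel = [
    ("semantic units", 189),
    ("core ideas", 61),
    ("subdomains", 18),
    ("domains", 7),
]


def funnel_ratios(stages):
    """Return (from_label, to_label, ratio) for each pair of adjacent stages.

    ratio is how many items of the finer level map, on average,
    onto one item of the next coarser level.
    """
    return [
        (a_label, b_label, a_count / b_count)
        for (a_label, a_count), (b_label, b_count) in zip(stages, stages[1:])
    ]


total = sum(participants.values())
print(f"participants: {total}")
for src, dst, ratio in funnel_ratios(funnel):
    print(f"{src} -> {dst}: about {ratio:.1f} to 1")
```

Each level compresses the one below it by roughly a factor of three, so each of the seven domains stands in for 27 original fragments on average. That is arithmetic on the reported counts, not an analysis the authors present.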

Where the staff agreed with the premise

They were not hostile to the tool; they saw its logic. Blood that used to be ordered on educated guesswork would instead be ordered on what the model had learned from data, and the gain they named was less unwarranted variation — fewer reflexive over-orders, steadier demand for the blood bank to plan against, less waste. That endorsement came with conditions: the predictions had to be reliable, the response fast, and the interface had to live inside the electronic record they already used rather than as one more separate screen to consult.

Three worries worth naming

The concerns clustered, and the clustering is the useful part. The first and most frequent was verification burden. A fixed standing order asks nothing of the person executing it; a recommendation that shifts with every patient — eight units here, five there — turns each departure from habit into a check and a justification. As one nurse anticipated it, having to confirm a different number with the prescribing physician each time was a recipe for confusion on the floor.

The second was the edge case. A model trained on history may not represent the patient who falls outside it: alloimmunisation, rare blood-group combinations, allergic reactions, the sudden intraoperative bleed. One surgeon pressed a specific version of this — whether prior abdominal surgery, which raises tissue adhesion and with it expected blood loss, was weighted heavily enough. The third was opacity. Even after the team demonstrated the system and walked through explainability tools, several participants still described the model as a black box, a phrase the authors note recurred. Trust did not follow automatically from a good demonstration.

What it cannot tell you

It is tempting to read all this as proof that asking the ward in advance makes an implementation safer. The study cannot support that claim, and to its credit does not try to. It shows that a structured pre-deployment inquiry surfaces specific, plausible friction points — not that surfacing them changes how the rollout actually goes, how much blood is ultimately wasted, or whether any patient is better off. Anticipated concerns are a hypothesis about the future, not a measurement of it. Whether the verification burden the nurses feared materialises, and whether it is offset by reduced waste, are questions only a prospective evaluation can settle.

The authors are candid about the rest. The work sits in a single 2,764-bed tertiary hospital in Seoul that performed 70,892 operations in 2023, a setting with transfusion infrastructure that smaller institutions lack, so the findings may not travel. The sampled physicians were earlier in their careers than the other groups; focus groups carry recall bias and incomplete reporting; the interview design did not separate short-term from long-term effects; and the SEIPS framework itself, they note, has been criticised as cumbersome. The study was funded by a Korean national health-technology programme and the authors declare no competing interests. Read for what it is (a careful, anticipatory snapshot), it is a good piece of work. Read as a verdict on whether the tool helps, it is silent by design.

“Workflows do not simply absorb a new tool. The tool and the workflow reshape each other — which is why it is worth asking the people inside the workflow first.”

Why it matters here

The structural lesson survives translation. As hospitals across Europe wire machine-learning decision support into their record systems, the failure point is rarely the model's accuracy; it is the distance between a tool that performs on a test set and a ward that has to absorb it on a Tuesday afternoon. The cheapest information about that distance is available before anything is switched on, and the people who will live with the system tend to name the real obstacles (the verification load, the edge cases, even the elevator the blood has to come up in) more precisely than any post-hoc dashboard will. The study does not prove that asking first works. It shows what asking first surfaces.

Source: Park YE, Ock M, Lee JH, et al. Assessing Health Care Professionals' Perceptions of a New System in Clinical Workflows. Journal of Medical Internet Research 2026;28:e86166. A single-centre qualitative study of staff perceptions before deployment — it measures what clinicians anticipated, not patient outcomes or the effect of the tool itself.

#Journal Club · #Clinical Decision Support · #Implementation Science · #Health Workforce · #Evidence-Based Medicine
