One Week Earlier: What an AI Wound Index Actually Beats
An AI healing index flags a stalling wound a week before the standard ruler does. The lead time is real and modest — and the study was written by the company that sells the index.

Dr. Sven Jungmann
CEO

173,816 wounds, 85,599 patients, an average age of 76, and the question they all hang on is small: can you tell a week sooner that a wound is going to stall? In a retrospective study in BMJ Digital Health & AI, the answer is a guarded yes — and the guarding is the interesting part. For a pressure injury on a frail patient in a nursing home, a week of earlier escalation is not trivial; it can sit between a dressing change and a debridement. So the finding is worth taking seriously, and worth reading slowly.
The incumbent and its limits
Chronic wounds are a quietly enormous cost — over 126 billion dollars a year to United States medical care providers, by the authors' figure, more than 28 billion of it to Medicare alone. The bedside measure clinicians lean on is percent area reduction (PAR): if a wound has not shrunk by roughly a fifth to a third over four weeks, it is treated as a slow or non-healer. PAR is reproducible, familiar, and accepted by the US Food and Drug Administration as the proxy endpoint in clinical trials of wound products. It is also a single number — area over time — and it ignores most of what a wound nurse actually sees: the tissue in the bed, the exudate, the wound edge, where on the body the wound sits.
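The arithmetic behind PAR is simple enough to sketch in a few lines of Python. This is an illustration, not the study's code: the function names are hypothetical, and the 30 percent cut-off is an assumed value picked from inside the "fifth to a third" range quoted above.

```python
def percent_area_reduction(baseline_area_cm2: float, current_area_cm2: float) -> float:
    """Percent reduction in wound area relative to the baseline measurement."""
    if baseline_area_cm2 <= 0:
        raise ValueError("baseline area must be positive")
    return 100.0 * (baseline_area_cm2 - current_area_cm2) / baseline_area_cm2


def is_slow_healer(par_at_week_4: float, threshold_pct: float = 30.0) -> bool:
    """Flag a wound whose four-week PAR falls short of the threshold.

    threshold_pct is an assumption for illustration; the article only says
    the cut-off lies roughly between a fifth and a third.
    """
    return par_at_week_4 < threshold_pct
```

A wound that shrinks from 10 cm² to 8 cm² over four weeks has a PAR of 20 percent and would be flagged under this assumed threshold; one that shrinks to 6 cm² (40 percent) would not. Nothing else about the wound enters the calculation, which is the limitation described above.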
What was measured
This is an accuracy comparison on historical data, not a trial. From a digital wound-care platform, the authors pulled 173,816 wounds across 2,316 skilled-nursing facilities and 132 home-health agencies, all from post-acute care. The mix is what you would expect there: pressure injuries dominate at 70.8 percent, then venous (13.5), diabetic foot (9.6) and arterial ulcers (6.3). Against PAR they set the Healing Index — specifically HI Model 5, which fits seven interpretable, mostly image-derived variables (area, tissue composition, exudate, wound edge, anatomical location, care setting) to a time-varying survival model. The label being predicted is whether a wound healed within twelve weeks, as recorded in that same dataset.
The real result
By week three the Healing Index reaches a balanced accuracy of 0.658 (95% CI 0.650–0.665); PAR sits at 0.601 and does not reach the 0.65 mark until week four. That is the headline, stated honestly: one assessment cycle of lead time. Balanced accuracy is the right yardstick here because the data are lopsided: delayed healers outnumber timely healers by about four to one, and the metric stops a model from looking clever simply by always betting on the common outcome. The improvement held across all four wound types, and a repeated-measures analysis of variance found a significant effect of model choice (F=35.32, p<0.001), with HI Model 5 the strongest of six variants. None of this is inflated in the writing.
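Why balanced accuracy rather than plain accuracy can be shown with a toy example. The sketch below is not the study's evaluation code. The label encoding (1 for a delayed healer, 0 for a timely one) and the 80/20 split are assumptions chosen to mimic the roughly four-to-one imbalance described above.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # hit rate on delayed healers
    specificity = tn / (tn + fp)  # hit rate on timely healers
    return (sensitivity + specificity) / 2


# Assumed toy data: 80 delayed healers (1) and 20 timely healers (0).
labels = [1] * 80 + [0] * 20
# A "model" that always bets on the common outcome.
always_delayed = [1] * 100
```

On this toy data, `accuracy(labels, always_delayed)` is 0.8 while `balanced_accuracy(labels, always_delayed)` is 0.5, no better than a coin toss. Against that floor of 0.5, the reported 0.658 and 0.601 are both real but modest gains.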
Where the claim runs ahead of the data
Begin with the number. Balanced accuracy averages the hit rate on each outcome class, so a value near 0.66 means that, averaged over delayed and timely healers, the model misclassifies about a third of wounds at the very point it earns its advantage. This is an earlier signal, not a confident one: it should send a clinician to look, not to decide. And it predicts a recorded label, nothing more: no one was followed forward to show that acting on the week-three flag produced fewer amputations, fewer admissions, or faster closure. The authors are admirably plain about this; they state outright that the study did not evaluate clinical outcomes, utilisation or cost, and that earlier risk stratification improving those endpoints is a hypothesis for future prospective work. Earlier knowledge becomes better care only if the extra week is used, and a retrospective accuracy analysis cannot show that it was.
Then the fact that colours everything else. Five of the eight authors are current employees of Swift Medical, the company that develops the Healing Index; a sixth is a former employee; and the entire dataset comes from Swift's own platform. The paper is candid about this — the competing-interests statement is explicit — but a favourable comparison of your own product against the incumbent, on data you collected and scored against a label your own segmentation tools helped define, carries its weight only once someone with no stake reproduces it on data they did not curate. The careful reader neither dismisses vendor research nor banks it; they hold it to the standard of independent replication before calling it settled.
“A week of earlier warning is worth having. It is not the same as a week of better outcomes — and only an independent dataset can tell the two apart.”
What it means for a German ward
The population that drives the result — older, frail, predominantly pressure injuries in post-acute care — maps closely onto German geriatric rehabilitation and Pflege, where pressure-injury prevention is a tracked quality indicator. A tool that surfaces a stalling wound a cycle earlier is the right idea for that setting. But the data are North American, the platform is commercial, the model was trained only on post-acute settings (the authors flag that its thresholds would likely need recalibration elsewhere), and there is no European validation. Before such an index touched a ward here it would need evaluation on local data, the regulatory treatment any clinical software demands under the Medizinprodukteverordnung (MDR), and an answer to the question this study deliberately leaves open: does acting on the earlier flag change what happens to the wound?
Source: Goldstone L, Mohammed HT, Gupta R, et al. Predicting wound healing outcomes: a comparative accuracy analysis of AI-driven indices and percent area reduction. BMJ Digital Health & AI 2026;2(1):e000069. A retrospective accuracy study on a single commercial dataset, with five of eight authors employed by the developer of the index under test; it reports earlier prediction, not improved patient outcomes.


