Journal Club · 4 min read

What 4,764 Wound Photos Reveal About Who Can Read Their Own Wound

A Taipei team let patients flag their own wound infections through a chatbot. Those who had watched a chronic wound for months agreed with the surgeon almost every time; those days out of surgery flagged only about half the wounds the surgeon wanted to see again. Experience, it turns out, is a variable.

Dr. Sven Jungmann

CEO

Start with two numbers from the methods section, because they carry the whole story. Patients with acute surgical wounds in this study submitted, on average, 6.4 photographs each. Patients with chronic wounds submitted 59.2. The chronic group had simply looked at their own wounds, day after day, nine times as often — and that difference in exposure, not anything about the technology, turns out to predict who could tell the surgeon something useful and who could not.

The study is Su, Lin and Huang, published in JMIR Formative Research on 8 January 2026: a retrospective observational analysis at a single tertiary centre in Taipei. Adults with wounds joined a chatbot running on the Line messaging app — the regional equivalent of WhatsApp — and used their own phones to send daily photographs together with a one-line self-report: did they think the wound was infected? A senior plastic surgeon then reviewed each submission independently and decided one thing only — does this patient need a callback, yes or no? Across 4,764 photographs from 159 patients (88 with acute wounds, 71 with chronic), the question was how often the patient's verdict lined up with the surgeon's.
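As a rough sanity check on the exposure gap, the group sizes and per-patient averages quoted above can be multiplied out. This is a minimal sketch that uses only figures from the article; the variable names are illustrative, and the small gap between the implied and reported totals is presumably rounding in the published averages.

```python
# Figures quoted in the article.
ACUTE_PATIENTS, CHRONIC_PATIENTS = 88, 71
ACUTE_MEAN_PHOTOS, CHRONIC_MEAN_PHOTOS = 6.4, 59.2
REPORTED_TOTAL_PHOTOS = 4764

# How many times more photographs the average chronic patient submitted.
exposure_ratio = CHRONIC_MEAN_PHOTOS / ACUTE_MEAN_PHOTOS

# Photograph counts implied by mean x group size (means are rounded,
# so these will not match the reported total exactly).
implied_acute_photos = ACUTE_PATIENTS * ACUTE_MEAN_PHOTOS
implied_chronic_photos = CHRONIC_PATIENTS * CHRONIC_MEAN_PHOTOS
implied_total_photos = implied_acute_photos + implied_chronic_photos

# Share of all photographs that came from the chronic group.
chronic_share = implied_chronic_photos / implied_total_photos

print(f"exposure ratio: {exposure_ratio:.2f}")
print(f"implied total: {implied_total_photos:.0f} (reported {REPORTED_TOTAL_PHOTOS})")
print(f"chronic share of photographs: {chronic_share:.1%}")
```

The ratio comes out at about 9.25, and the implied total lands within a few photographs of the 4,764 reported. It also shows that close to nine in ten photographs in the dataset came from the chronic group, which is worth keeping in mind when reading any pooled figure.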

Two caveats that come before any statistic

First, this is Formative Research — early-stage, feasibility-flavoured, retrospective, single-centre, with patients recruited by convenience. It is a careful look at a real signal, not a trial. Second, and more load-bearing: the reference standard was not a confirmed infection. It was one experienced surgeon's decision to call a patient back. Every performance figure below therefore measures agreement between patient and reviewer, not accuracy against a microbiological or clinical truth. The patient and the surgeon can agree and both be wrong, and the authors name the single-reviewer design as a limitation themselves.

The split

Read with that ceiling in mind, the contrast is hard to miss. For chronic wounds, the self-report tracked the surgeon closely: an area under the receiver operating characteristic curve — AUROC, a single summary of agreement across all decision thresholds — of 0.907, with sensitivity of 94.9 percent and specificity of 86.4 percent. When someone who had lived with an ulcer for months said 'infected', the surgeon almost always agreed enough to call them in. For acute wounds, the same self-report fell apart: AUROC 0.702 and sensitivity of just 52.6 percent (95% CI 31.7–72.7). Roughly half the photographs the surgeon judged callback-worthy went unflagged by the very patient who had taken them.
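One way to see how these figures hang together, assuming the self-report is scored as a plain yes or no (the article does not spell out the scoring): a yes/no test has a single operating point on the ROC curve, and the area under that curve equals the mean of sensitivity and specificity. The sketch below checks that identity against the chronic figures and computes a Wilson score interval for a proportion. The counts of 10 out of 19 are hypothetical, chosen only because they happen to reproduce the reported interval; the article does not state the underlying counts or the interval method.

```python
import math


def binary_auroc(sensitivity: float, specificity: float) -> float:
    """AUROC of a yes/no classifier: area under the ROC polyline
    (0,0) -> (1 - specificity, sensitivity) -> (1,1)."""
    return (sensitivity + specificity) / 2


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (default 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Chronic group: reported sensitivity 94.9%, specificity 86.4%, AUROC 0.907.
chronic_auroc = binary_auroc(0.949, 0.864)

# Acute group: specificity is not quoted in the article. Under the same
# yes/no ASSUMPTION it would be implied by AUROC 0.702 and sensitivity 0.526.
implied_acute_specificity = 2 * 0.702 - 0.526

# HYPOTHETICAL counts: 10 of 19 callback-worthy photographs flagged.
low, high = wilson_interval(10, 19)

print(f"chronic AUROC from sens/spec: {chronic_auroc:.4f}")
print(f"implied acute specificity:    {implied_acute_specificity:.3f}")
print(f"Wilson 95% CI for 10/19:      {low:.3f} to {high:.3f}")
```

If that reading is right, the wide interval around the acute sensitivity reflects a small number of callback-worthy acute photographs, which is one more reason to treat the acute estimate as a signal to test rather than a settled figure.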

The secondary analysis explains the gap rather than restating it. Among acute patients, only one feature was associated with reporting infection: redness, at a modest odds ratio of 3.94 (95% CI 1.97–7.90). The catch is that redness in the first days after surgery is usually ordinary healing, not infection: the acute group was anchoring on the one sign it could name, and it was the wrong one. Among chronic patients the associations were far stronger: redness carried an odds ratio of 86.35 (95% CI 57.11–130.56) and skin darkening 358.55 (95% CI 244.79–525.16). Those are the fingerprints of people applying a learned, internally consistent rule: they have watched what their wound does when it turns, and they report it the same way every time. Nine times the exposure is the most plausible source of that rule, though a retrospective comparison cannot prove it.
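For readers who want to compare how precisely each odds ratio was estimated, a minimal sketch follows. It assumes the intervals are the usual Wald type, symmetric on the log scale (the article does not name the method), in which case the geometric mean of the bounds should recover the point estimate and the width gives the standard error of the log odds ratio. The dictionary and function names are illustrative.

```python
import math

# (odds ratio, lower 95% bound, upper 95% bound) as quoted in the article.
REPORTED_ODDS_RATIOS = {
    "acute: redness": (3.94, 1.97, 7.90),
    "chronic: redness": (86.35, 57.11, 130.56),
    "chronic: skin darkening": (358.55, 244.79, 525.16),
}


def log_scale_summary(lower: float, upper: float, z: float = 1.96):
    """Return (geometric midpoint, standard error of log OR), ASSUMING a
    Wald-type interval: exp(log(OR) +/- z * SE)."""
    geometric_mid = math.sqrt(lower * upper)
    log_se = (math.log(upper) - math.log(lower)) / (2 * z)
    return geometric_mid, log_se


summaries = {
    name: log_scale_summary(lower, upper)
    for name, (_, lower, upper) in REPORTED_ODDS_RATIOS.items()
}

for name, (odds_ratio, _, _) in REPORTED_ODDS_RATIOS.items():
    mid, se = summaries[name]
    print(f"{name}: OR {odds_ratio}, geometric midpoint {mid:.2f}, log-scale SE {se:.3f}")
```

In all three cases the geometric midpoint sits close to the quoted odds ratio, which is consistent with the log-symmetric assumption. The acute estimate carries the largest log-scale standard error of the three, so it is both the weakest and the least precisely estimated association.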

The useful finding is not that patients can read wounds. It is that experience is a diagnostic variable — and a system that asks everyone the same question is quietly assuming it away.

Where the claim has to stop

It does not follow that chronic-wound patients detect infection well in any absolute sense — only that they agree with one surgeon's callback decisions, and that surgeon's judgement is the yardstick, not ground truth. Nor does anything here show the chatbot improves care: no one was followed forward to learn whether it caught infections earlier, prevented admissions, or merely generated more callbacks. Retrospective, single-site, one language, one health system — the precise figures are a reason to test prospectively, not performance you could attach to a deployed product.

Why it travels beyond wounds

The lesson generalises to almost any patient-facing tool that asks people to report symptoms: a post-discharge app, a triage chatbot, a remote-monitoring questionnaire. Each assumes the patient can recognise what it is asking about — and this study shows that assumption fails in a predictable direction, with reliability rising the longer someone has lived with the condition. A self-report from a patient three days out of theatre is not the same instrument as one from a patient fourteen months into an ulcer, and a system that scores them identically will miss exactly the inexperienced patients who most need a human to look. For European post-operative monitoring, where these tools are spreading fastest, that asymmetry is the thing worth designing around.
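What designing around that asymmetry might look like can be sketched in a few lines. Everything here is hypothetical: the study did not propose or test a routing rule, the photo-count cut-off is an invented placeholder and not a validated threshold, and using submitted photographs as a proxy for experience is an assumption made for illustration only.

```python
from dataclasses import dataclass

# HYPOTHETICAL cut-off. The article reports group averages (6.4 vs 59.2
# photographs per patient), not a point at which self-report becomes reliable.
MIN_PHOTOS_FOR_TRUSTED_NEGATIVE = 30


@dataclass
class Submission:
    photos_submitted_so_far: int   # proxy for the patient's experience (assumption)
    patient_says_infected: bool    # the one-line self-report


def route(submission: Submission) -> str:
    """Pick a review priority; every photograph is still reviewed by a clinician."""
    if submission.patient_says_infected:
        # A positive self-report is escalated regardless of experience.
        return "priority_review"
    if submission.photos_submitted_so_far < MIN_PHOTOS_FOR_TRUSTED_NEGATIVE:
        # "Not infected" from an inexperienced patient is weak evidence,
        # so it does not lower the priority.
        return "priority_review"
    return "routine_review"


print(route(Submission(photos_submitted_so_far=4, patient_says_infected=False)))
print(route(Submission(photos_submitted_so_far=60, patient_says_infected=False)))
```

The point of the sketch is the shape of the rule, not its numbers: a negative self-report only lowers priority once the patient has enough experience for that negative to mean something. Any real cut-off would have to come from prospective data.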

Source: Su Y-C, Lin Y-H, Huang M-Y. Linking Patient-Reported and Clinician-Assessed Wound Status via Chatbot-Based Digital Surveillance for Wound Infection: Retrospective Observational Study. JMIR Formative Research 2026;10:e77685. An early-stage, single-centre retrospective study with no external funding and no declared conflicts; its reference standard was one surgeon's callback decision, not confirmed infection — a feasibility signal, not a trial.
