What 4,764 Wound Photos Reveal About Who Can Read Their Own Wound
A Taipei team let patients flag their own wound infections through a chatbot. Those who had watched a chronic wound for months agreed with the surgeon almost every time; those days out of surgery flagged only about half of the cases the surgeon did. Experience, it turns out, is a variable.

Dr. Sven Jungmann
CEO

Start with two numbers from the methods section, because they carry the whole story. Patients with acute surgical wounds in this study submitted, on average, 6.4 photographs each. Patients with chronic wounds submitted 59.2. The chronic group had simply looked at their own wounds, day after day, nine times as often — and that difference in exposure, not anything about the technology, turns out to predict who could tell the surgeon something useful and who could not.
The study is Su, Lin and Huang, published in JMIR Formative Research on 8 January 2026: a retrospective observational analysis at a single tertiary centre in Taipei. Adult patients with wounds signed up to a chatbot running on the Line messaging app (the regional equivalent of WhatsApp) and used their own phones to send daily photographs together with a one-line self-report: did they think the wound was infected? A senior plastic surgeon then reviewed each submission independently and decided one thing only: does this patient need a callback, yes or no? Across 4,764 photographs from 159 patients (88 with acute wounds, 71 with chronic), the question was how often the patient's verdict lined up with the surgeon's.
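The exposure gap and the photo total can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted above; the constant and function names are illustrative, and because the per-patient averages are rounded, the reconstructed total is only expected to land near the reported 4,764, not on it.

```python
# Figures quoted in the article. All names here are illustrative.
ACUTE_PATIENTS = 88
CHRONIC_PATIENTS = 71
ACUTE_MEAN_PHOTOS = 6.4
CHRONIC_MEAN_PHOTOS = 59.2
REPORTED_TOTAL_PHOTOS = 4764


def exposure_ratio(chronic_mean: float, acute_mean: float) -> float:
    """How many times more photos a chronic patient sent than an acute one."""
    return chronic_mean / acute_mean


def reconstructed_total_photos() -> float:
    """Rebuild the photo total from group sizes and rounded per-patient means."""
    return (ACUTE_PATIENTS * ACUTE_MEAN_PHOTOS
            + CHRONIC_PATIENTS * CHRONIC_MEAN_PHOTOS)


if __name__ == "__main__":
    ratio = exposure_ratio(CHRONIC_MEAN_PHOTOS, ACUTE_MEAN_PHOTOS)
    total = reconstructed_total_photos()
    print(f"exposure ratio: {ratio:.2f}")
    print(f"reconstructed total: {total:.1f} (reported: {REPORTED_TOTAL_PHOTOS})")
```

The ratio comes out a little above nine, and the reconstructed total sits within a few photographs of the reported one, which is what rounding in the averages would produce.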
Two caveats that come before any statistic
First, this is Formative Research — early-stage, feasibility-flavoured, retrospective, single-centre, with patients recruited by convenience. It is a careful look at a real signal, not a trial. Second, and more load-bearing: the reference standard was not a confirmed infection. It was one experienced surgeon's decision to call a patient back. Every performance figure below therefore measures agreement between patient and reviewer, not accuracy against a microbiological or clinical truth. The patient and the surgeon can agree and both be wrong, and the authors name the single-reviewer design as a limitation themselves.
The split
Read with that ceiling in mind, the contrast is hard to miss. For chronic wounds, the self-report tracked the surgeon closely: an area under the receiver operating characteristic curve (AUROC, a single summary of how well the self-report separates the surgeon's callbacks from non-callbacks, where 0.5 is chance and 1.0 is perfect) of 0.907, with sensitivity of 94.9 percent and specificity of 86.4 percent. When the surgeon wanted to call in someone who had lived with an ulcer for months, that patient had almost always said 'infected' already. For acute wounds, the same self-report fell apart: AUROC 0.702 and sensitivity of just 52.6 percent, with a wide interval around it (95% CI 31.7–72.7). On the point estimate, roughly half the photographs the surgeon judged callback-worthy went unflagged by the very patient who had taken them.
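Two pieces of arithmetic sit behind those figures and are easy to sketch. The article does not say how the AUROC or the confidence interval was computed, so both functions below are assumptions: the first treats the self-report as a single yes/no classifier, for which the area under the ROC curve reduces to the mean of sensitivity and specificity; the second is a standard Wilson score interval for a proportion. The counts 10 of 19 are hypothetical, chosen only because they give 52.6 percent; the article does not report the underlying counts.

```python
import math


def auroc_from_binary(sensitivity: float, specificity: float) -> float:
    """AUROC of a single yes/no classifier.

    The ROC curve has one operating point, (1 - specificity, sensitivity),
    joined to (0, 0) and (1, 1); the area under it is the mean of the two.
    """
    return (sensitivity + specificity) / 2


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Chronic-wound figures quoted in the article.
    print(f"chronic AUROC (binary): {auroc_from_binary(0.949, 0.864):.4f}")

    # Hypothetical counts: 10 of 19 gives a sensitivity of 52.6 percent.
    low, high = wilson_interval(10, 19)
    print(f"Wilson 95% CI: {low * 100:.1f} to {high * 100:.1f} percent")
```

Under these assumptions the chronic figures give roughly 0.9065, in line with the reported 0.907, and the hypothetical 10 of 19 gives an interval of about 31.7 to 72.7 percent, the same as the reported one. That would suggest the acute sensitivity rests on a small number of surgeon-positive cases, but this is an inference from the arithmetic, not a count taken from the study.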
The secondary analysis explains the gap rather than restating it. Among acute patients, only one feature was associated with reporting infection: redness, at a modest odds ratio of 3.94 (95% CI 1.97–7.90). The catch is that redness in the first days after surgery is usually ordinary healing, not infection, so the acute group was anchoring on the one sign it could name, and it was the wrong one. Among chronic patients the associations were far stronger: redness carried an odds ratio of 86.35 (95% CI 57.11–130.56) and skin darkening 358.55 (95% CI 244.79–525.16). Those are the fingerprints of people applying a learned, internally consistent rule: they have watched what their wound does when it turns, and they report it the same way every time. Nine times the exposure is the likeliest source of that rule.
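The odds ratios are easier to compare on the log scale, where intervals of this kind are usually built. A minimal sketch, assuming the reported intervals are Wald-type 95% intervals (symmetric around the log odds ratio), which the article does not state; the function names and dictionary keys are illustrative.

```python
import math

# (odds ratio, lower 95% bound, upper 95% bound) as quoted in the article.
REPORTED = {
    "acute redness": (3.94, 1.97, 7.90),
    "chronic redness": (86.35, 57.11, 130.56),
    "chronic skin darkening": (358.55, 244.79, 525.16),
}


def geometric_midpoint(lower: float, upper: float) -> float:
    """Centre of the interval on the log scale, mapped back to an odds ratio."""
    return math.sqrt(lower * upper)


def log_or_standard_error(lower: float, upper: float, z: float = 1.959964) -> float:
    """Standard error of log(OR) implied by a symmetric log-scale interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if two (or, lower, upper) intervals share any values."""
    return a[1] <= b[2] and b[1] <= a[2]


if __name__ == "__main__":
    for name, (odds_ratio, lower, upper) in REPORTED.items():
        mid = geometric_midpoint(lower, upper)
        se = log_or_standard_error(lower, upper)
        print(f"{name}: OR {odds_ratio}, log-scale midpoint {mid:.2f}, SE {se:.3f}")
```

Each reported odds ratio sits at the log-scale midpoint of its interval, which is consistent with the Wald assumption, and the acute redness interval does not overlap either chronic interval.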
“The useful finding is not that patients can read wounds. It is that experience is a diagnostic variable — and a system that asks everyone the same question is quietly assuming it away.”
Where the claim has to stop
It does not follow that chronic-wound patients detect infection well in any absolute sense — only that they agree with one surgeon's callback decisions, and that surgeon's judgement is the yardstick, not ground truth. Nor does anything here show the chatbot improves care: no one was followed forward to learn whether it caught infections earlier, prevented admissions, or merely generated more callbacks. Retrospective, single-site, one language, one health system — the precise figures are a reason to test prospectively, not performance you could attach to a deployed product.
Why it travels beyond wounds
The lesson generalises to almost any patient-facing tool that asks people to report symptoms: a post-discharge app, a triage chatbot, a remote-monitoring questionnaire. Each assumes the patient can recognise what it is asking about, and this study shows that assumption failing in a predictable direction: agreement with the clinician was far higher in the group that had lived with its wounds for longer. A self-report from a patient three days out of theatre is not the same instrument as one from a patient fourteen months into an ulcer, and a system that scores them identically will miss exactly the inexperienced patients who most need a human to look. For European post-operative monitoring, where these tools are spreading, that asymmetry is the thing worth designing around.
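What designing around that asymmetry might look like can be sketched in a few lines. Everything here is hypothetical: the type names, the rule, and the idea of routing by wound type are illustrations of the argument, not anything the study built or validated. The rule simply refuses to treat a "not infected" answer from an acute patient as reassurance.

```python
from dataclasses import dataclass
from enum import Enum


class WoundType(Enum):
    ACUTE = "acute"
    CHRONIC = "chronic"


@dataclass(frozen=True)
class SelfReport:
    """One submission: the wound type and the patient's yes/no answer."""
    wound_type: WoundType
    patient_flags_infection: bool


def needs_clinician_review(report: SelfReport) -> bool:
    """Hypothetical routing rule, not a validated triage policy.

    Acute-wound self-reports missed about half of the surgeon's callbacks
    in this study, so every acute submission goes to a clinician regardless
    of the patient's answer. Chronic-wound self-reports agreed closely with
    the surgeon, so the patient's own flag decides.
    """
    if report.wound_type is WoundType.ACUTE:
        return True
    return report.patient_flags_infection


if __name__ == "__main__":
    for report in (
        SelfReport(WoundType.ACUTE, False),
        SelfReport(WoundType.CHRONIC, False),
        SelfReport(WoundType.CHRONIC, True),
    ):
        print(report, "->", needs_clinician_review(report))
```

The point of the sketch is only that the same answer carries different weight depending on who gives it; any real threshold would need the prospective testing the study itself calls for.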
Source: Su Y-C, Lin Y-H, Huang M-Y. Linking Patient-Reported and Clinician-Assessed Wound Status via Chatbot-Based Digital Surveillance for Wound Infection: Retrospective Observational Study. JMIR Formative Research 2026;10:e77685. An early-stage, single-centre retrospective study with no external funding and no declared conflicts; its reference standard was one surgeon's callback decision, not confirmed infection — a feasibility signal, not a trial.


