Journal Club · 5 min read

A Smart Ring, a Questionnaire, One Night: Reading a Preoperative Sleep-Risk Model

A Chinese single-centre study built a model that flags older surgical patients likely to sleep badly the night before an operation. The discrimination is high — but it was validated on itself, and what it predicts is a label, not a recovery.

Dr. Sven Jungmann

CEO

An area under the curve of 0.92 is a number that travels well. It looks like a verdict; it gets quoted as one. A recent study in JMIR Formative Research reports exactly that figure for a model meant to flag, the evening before surgery, which older patients are heading for a bad night's sleep. Before anyone repeats the 0.92, it is worth slowing down on two questions the number alone cannot answer: what is the model actually predicting, and how was it tested.

Start with the journal's own name. Formative research is early-stage work — feasibility, model development, first signals — and the label is honest rather than apologetic. Read as that, this is a careful study. Read as a tool you could put on a ward tomorrow, it is unfinished, and the authors do not pretend otherwise.

The setup, and the base rate

In a single hospital in China, 242 patients aged 60 to 80 awaiting elective surgery each wore a SRing2 smart ring the night before their operation. The ring logged total sleep time, rapid-eye-movement (REM) and light-sleep duration, and the number of times they woke. Alongside it the team ran a familiar panel of instruments: the Pittsburgh Sleep Quality Index (PSQI), the Hospital Anxiety and Depression Scale (HADS), the Mini-Mental State Examination, a pain rating and routine bloods. The work is reported under the TRIPOD guideline for prediction models — a prospective cohort, not a randomized trial, which already tells you what kind of claim it can and cannot make.

Patients who scored PSQI 7 or higher and slept under 6.5 hours were labelled as having a sleep disorder. Forty did — about one in six, a 16.5% base rate. Hold that figure: it is the quiet denominator against which the headline 0.92 has to be read.
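A short sketch shows why the base rate is worth holding on to. The counts 40 and 242 come from the study; the sensitivity and specificity below are hypothetical operating points chosen purely for illustration and are not figures reported by the authors.

```python
# Counts reported in the study.
n_patients = 242
n_events = 40

# Prevalence of the sleep-disorder label in this cohort.
base_rate = n_events / n_patients


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of flagged patients who truly carry the label (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# HYPOTHETICAL operating point, not taken from the paper.
sens, spec = 0.85, 0.85
ppv = positive_predictive_value(sens, spec, base_rate)

print(f"base rate: {base_rate:.1%}")
print(f"PPV at sens={sens}, spec={spec}: {ppv:.1%}")
```

Under these assumed numbers, close to half of the flagged patients would not carry the label, which is the kind of reading a single discrimination figure does not surface on its own.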

What the data genuinely show

Logistic regression left four predictors standing. A higher anxiety-and-depression score more than tripled the odds of the sleep-disorder label (odds ratio 3.21, 95% confidence interval 1.54–6.69; P = .002), and each extra awakening recorded by the ring did much the same (OR 3.33, 1.82–6.12; P < .001). More REM and more light sleep were modestly protective, a few percent per minute each (OR 0.96 and 0.98). The model separated the two groups well — an area under the receiver-operating-characteristic curve (AUROC, in plain terms how cleanly it sorts the 40 from the 202) of 0.92 — with calibration close to the ideal line and a decision-curve analysis suggesting net benefit across thresholds from roughly 0.2 to 0.8. As evidence that consumer sleep data plus a short anxiety scale carry real predictive signal, this is sound.
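Per-minute odds ratios close to 1 are easy to misread, so a small worked calculation may help. The two per-minute ratios are the ones quoted above; the 30-minute difference is a hypothetical value chosen for illustration, and the scaling assumes the log-linear effect that a logistic model implies, which may not hold across the whole range.

```python
import math

# Per-minute odds ratios as quoted in the article.
or_per_minute = {"REM": 0.96, "light sleep": 0.98}


def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, assuming a log-linear effect."""
    return math.exp(units * math.log(or_per_unit))


extra_minutes = 30  # HYPOTHETICAL difference, chosen for illustration

for stage, or_1 in or_per_minute.items():
    or_n = scaled_odds_ratio(or_1, extra_minutes)
    print(f"{stage}: OR per minute {or_1}, per {extra_minutes} min {or_n:.2f}")
```

If the assumption holds, half an hour more REM corresponds to an odds ratio of about 0.29 and half an hour more light sleep to about 0.55, so the size of an effect depends heavily on the unit in which it is reported.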

What is actually being predicted

Here is the limit that matters most, and it is not about statistics. The outcome the model predicts is not delirium, not length of stay, not recovery. It is a label — a PSQI score crossed with a sleep-duration cutoff — measured on the same night as the predictors that are supposed to forecast it. Two of those four predictors, the awakenings and the light-sleep duration, are themselves constituents of poor sleep. The model is, in part, predicting a thing from its own ingredients. That a high HADS score keeps company with a rough night is clinically believable; what the study cannot tell us is whether flagging that patient, and acting on the flag, changes anything after the incision. Nobody was followed past the operating-room door.

The model identifies a risk label measured the same night; whether acting on that flag improves any outcome after surgery is a question this study does not, and cannot, answer.

And how well it was tested

The 0.92 was earned inside the dataset that produced it. Validation was internal: 1,000 bootstrap resamples of the same 242 patients. Bootstrapping is worth doing — it curbs one flavour of optimism — but it resamples the people you already have; it is not a second hospital, a second ring, a second population. With only 40 events to learn from, a four-variable model is also at the edge of what the data can honestly support, and the rule of thumb holds with discouraging reliability: out-of-sample performance almost always falls below the in-sample figure. The 0.92 is a ceiling, not a forecast.
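What bootstrap validation can and cannot do is easier to see in a sketch. The code below is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the `fit` function is a simple difference-in-means stand-in rather than logistic regression, it runs 200 resamples instead of 1,000 to stay fast, and it uses the optimism-correction variant of the bootstrap, which the article does not confirm was the one applied. Only the cohort size, event count and number of predictors come from the study.

```python
import random

N_PATIENTS, N_EVENTS, N_PREDICTORS = 242, 40, 4  # from the study
events_per_variable = N_EVENTS / N_PREDICTORS


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(X, y):
    """Stand-in model (NOT logistic regression): weights = difference in class means."""
    n1 = sum(y)
    n0 = len(y) - n1
    weights = []
    for j in range(len(X[0])):
        m1 = sum(x[j] for x, t in zip(X, y) if t == 1) / n1
        m0 = sum(x[j] for x, t in zip(X, y) if t == 0) / n0
        weights.append(m1 - m0)
    return weights


def predict(weights, X):
    return [sum(w * v for w, v in zip(weights, x)) for x in X]


def optimism_corrected_auroc(X, y, n_boot=200, seed=0):
    """Apparent AUROC and the same figure minus the mean bootstrap optimism."""
    rng = random.Random(seed)
    apparent = auroc(predict(fit(X, y), X), y)
    n = len(y)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # same patients, redrawn
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if sum(yb) in (0, n):  # skip degenerate resamples with one class only
            continue
        wb = fit(Xb, yb)
        in_sample = auroc(predict(wb, Xb), yb)   # scored on the resample
        on_original = auroc(predict(wb, X), y)   # scored on the original cohort
        gaps.append(in_sample - on_original)
    optimism = sum(gaps) / len(gaps)
    return apparent, apparent - optimism


# SYNTHETIC cohort: four noisy predictors with a weak shift for labelled patients.
data_rng = random.Random(42)
y = [1] * N_EVENTS + [0] * (N_PATIENTS - N_EVENTS)
X = [[data_rng.gauss(0.5 * t, 1.0) for _ in range(N_PREDICTORS)] for t in y]

apparent, corrected = optimism_corrected_auroc(X, y)
print(f"events per variable: {events_per_variable:.0f}")
print(f"apparent AUROC: {apparent:.3f}, optimism-corrected: {corrected:.3f}")
```

Every resample in the loop is drawn from the same `X` and `y`, so the correction can only estimate how much the model flatters itself on this cohort. It has no way to account for a different hospital, device or population.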

Why it matters here

The European appeal is easy to see. A consumer wearable and a five-minute scale cost almost nothing, and preoperative anxiety and poor sleep are genuinely undertreated. The authors stay within their evidence: they ask for external validation and longitudinal follow-up, and they point toward low-tech responses (sleep hygiene, family presence, cognitive-behavioural techniques) rather than a black box on the ward. That restraint is the right posture for a formative study. A model like this would earn clinical trust through three things it does not yet have: validation in an independent cohort, a tie to an outcome that actually reaches the patient, and the regulatory framing that any software steering a clinical decision needs under the EU Medical Device Regulation (MDR). Until then it is a promising signal from one night in one hospital, which is precisely what it set out to be.

Source: Li J, Yang B, Gao P, et al. Predictive Modeling of Preoperative Sleep Disorder Risk in Older Adults by Using Data From Wearable Monitoring Devices: Prospective Cohort Study. JMIR Formative Research 2026;10:e79008. A single-centre prospective cohort and model-development study with internal validation only, predicting a questionnaire-defined sleep label rather than any postoperative outcome; reported by the authors with no conflict of interest declared.
