A Smart Ring, a Questionnaire, One Night: Reading a Preoperative Sleep-Risk Model
A Chinese single-centre study built a model that flags older surgical patients likely to sleep badly the night before an operation. The discrimination is high — but it was validated on itself, and what it predicts is a label, not a recovery.

Dr. Sven Jungmann
CEO

An area under the curve of 0.92 is a number that travels well. It looks like a verdict; it gets quoted as one. A recent study in JMIR Formative Research reports exactly that figure for a model meant to flag, the evening before surgery, which older patients are heading for a bad night's sleep. Before anyone repeats the 0.92, it is worth slowing down on two questions the number alone cannot answer: what is the model actually predicting, and how was it tested.
Start with the journal's own name. Formative research is early-stage work (feasibility, model development, first signals), and the label is honest rather than apologetic. Read on those terms, this is a careful study. Read as a tool you could put on a ward tomorrow, it is unfinished, and the authors do not pretend otherwise.
The setup, and the base rate
In a single hospital in China, 242 patients aged 60 to 80 awaiting elective surgery each wore an SRing2 smart ring the night before their operation. The ring logged total sleep time, rapid-eye-movement (REM) and light-sleep duration, and the number of times they woke. Alongside it the team ran a familiar panel of instruments: the Pittsburgh Sleep Quality Index (PSQI), the Hospital Anxiety and Depression Scale (HADS), the Mini-Mental State Examination, a pain rating and routine bloods. The work is reported under the TRIPOD guideline for prediction models. It is a prospective cohort, not a randomized trial, which already tells you what kind of claim it can and cannot make.
Patients who scored PSQI 7 or higher and slept under 6.5 hours were labelled as having a sleep disorder. Forty did — about one in six, a 16.5% base rate. Hold that figure: it is the quiet denominator against which the headline 0.92 has to be read.
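The labelling rule and the base rate are simple enough to write down. The sketch below is an illustration, not the authors' code: the function names are hypothetical, and only the two cutoffs and the 40-of-242 count come from the study as described here.

```python
# Cutoffs as described in the article; the function names are illustrative.
PSQI_CUTOFF = 7            # label requires a PSQI score of 7 or higher
SLEEP_HOURS_CUTOFF = 6.5   # and a total sleep time under 6.5 hours


def has_sleep_disorder_label(psqi: int, sleep_hours: float) -> bool:
    """Both conditions must hold for the label to apply."""
    return psqi >= PSQI_CUTOFF and sleep_hours < SLEEP_HOURS_CUTOFF


def base_rate(events: int, total: int) -> float:
    """Share of the cohort carrying the label."""
    return events / total


print(has_sleep_disorder_label(psqi=9, sleep_hours=5.5))   # True
print(has_sleep_disorder_label(psqi=9, sleep_hours=7.0))   # False
print(f"{base_rate(40, 242):.1%}")                         # 16.5%
```

Because the rule is a conjunction, a patient with a poor questionnaire score who nonetheless slept seven hours does not count as an event.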
What the data genuinely show
Logistic regression left four predictors standing. A higher anxiety-and-depression score more than tripled the odds of the sleep-disorder label (odds ratio 3.21, 95% confidence interval 1.54–6.69; P = .002), and each extra awakening recorded by the ring did much the same (OR 3.33, 1.82–6.12; P < .001). More REM and more light sleep were modestly protective, a few percent per minute each (OR 0.96 and 0.98). The model separated the two groups well — an area under the receiver-operating-characteristic curve (AUROC, in plain terms how cleanly it sorts the 40 from the 202) of 0.92 — with calibration close to the ideal line and a decision-curve analysis suggesting net benefit across thresholds from roughly 0.2 to 0.8. As evidence that consumer sleep data plus a short anxiety scale carry real predictive signal, this is sound.
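Two of these figures are easy to misread, and a few lines of Python make them concrete. This is a minimal sketch under stated assumptions: the risk scores are invented for illustration, the helper names `auroc` and `apply_odds_ratio` are hypothetical, and starting the odds calculation from the 16.5% base rate is a simplification, since the model's own baseline is not given here.

```python
def auroc(pos_scores, neg_scores):
    """Share of (labelled, unlabelled) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def apply_odds_ratio(probability, odds_ratio):
    """Convert a probability to odds, scale by the odds ratio, convert back."""
    odds = probability / (1.0 - probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Made-up risk scores, NOT study data: 3 labelled and 4 unlabelled patients.
labelled = [0.9, 0.8, 0.4]
unlabelled = [0.7, 0.3, 0.2, 0.1]
print(round(auroc(labelled, unlabelled), 2))        # 0.92 (11 of 12 pairs)

# In the study's cohort the same comparison runs over 40 * 202 pairs.
print(40 * 202)                                     # 8080

# Illustrative assumption: start at the base rate and apply OR 3.21.
print(round(apply_odds_ratio(40 / 242, 3.21), 2))   # 0.39
```

The toy AUROC shows what the statistic measures: how often a labelled patient outranks an unlabelled one, and nothing about how large the predicted risks are. The odds calculation shows that tripling the odds is not tripling the probability. From a 16.5% starting point, an odds ratio of 3.21 lands near 39%, not 50%.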
What is actually being predicted
Here is the limit that matters most, and it is not about statistics. The outcome the model predicts is not delirium, not length of stay, not recovery. It is a label — a PSQI score crossed with a sleep-duration cutoff — measured on the same night as the predictors that are supposed to forecast it. Two of those four predictors, the awakenings and the light-sleep duration, are themselves constituents of poor sleep. The model is, in part, predicting a thing from its own ingredients. That a high HADS score keeps company with a rough night is clinically believable; what the study cannot tell us is whether flagging that patient, and acting on the flag, changes anything after the incision. Nobody was followed past the operating-room door.
“The model identifies a risk label measured the same night; whether acting on that flag improves any outcome after surgery is a question this study does not, and cannot, answer.”
And how well it was tested
The 0.92 was earned inside the dataset that produced it. Validation was internal: 1,000 bootstrap resamples of the same 242 patients. Bootstrapping is worth doing — it curbs one flavour of optimism — but it resamples the people you already have; it is not a second hospital, a second ring, a second population. With only 40 events to learn from, a four-variable model is also at the edge of what the data can honestly support, and the rule of thumb holds with discouraging reliability: out-of-sample performance almost always falls below the in-sample figure. The 0.92 is a ceiling, not a forecast.
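What "resampling the people you already have" means can be shown directly. The sketch below is not the authors' validation procedure, which would also refit the model on each resample. It only draws 1,000 bootstrap samples of 242 indices and counts how many distinct patients each contains. All function names are hypothetical.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n patient indices with replacement from the original n."""
    return [rng.randrange(n) for _ in range(n)]


def mean_distinct_fraction(n, resamples, seed=0):
    """Average share of distinct original patients per bootstrap resample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(resamples):
        total += len(set(bootstrap_indices(n, rng))) / n
    return total / resamples


def events_per_variable(events, predictors):
    """Outcome events available per predictor in the model."""
    return events / predictors


print(round(mean_distinct_fraction(242, 1000), 2))   # roughly 0.63
print(events_per_variable(40, 4))                    # 10.0
```

Each resample reuses about 63% of the original patients and repeats some of them to fill the rest, so no new patient, hospital or device ever enters the test. The second figure, 10 events per predictor, sits right at a commonly cited rule-of-thumb minimum for logistic regression.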
Why it matters here
The European appeal is easy to see. A consumer wearable and a five-minute scale cost almost nothing, and preoperative anxiety and sleep are genuinely undertreated. The authors stay within their evidence: they ask for external validation and longitudinal follow-up, and they point toward low-tech responses (sleep hygiene, family presence, cognitive-behavioural techniques) rather than a black box on the ward. That restraint is the right posture for a formative study. A model like this would earn clinical trust through three things it does not yet have: validation in an independent cohort, a tie to an outcome that actually reaches the patient, and the regulatory framing that any software steering a clinical decision needs under the EU Medical Device Regulation (MDR). Until then it is a promising signal from one night in one hospital, which is precisely what it set out to be.
Source: Li J, Yang B, Gao P, et al. Predictive Modeling of Preoperative Sleep Disorder Risk in Older Adults by Using Data From Wearable Monitoring Devices: Prospective Cohort Study. JMIR Formative Research 2026;10:e79008. A single-centre prospective cohort and model-development study with internal validation only, predicting a questionnaire-defined sleep label rather than any postoperative outcome; reported by the authors with no conflict of interest declared.


