Journal Club · 18 June 2026 · 4 min read

Wearables and Dementia: A Strong Signal on Thin Validation

Forty-nine studies suggest disturbed sleep and activity shadow cognitive decline by years. Only three tested their model outside the lab that built it. The signal is real; the case that it works as a screening tool is not yet made.

Dr. Sven Jungmann

CEO

Image: Editorial collage of an older person's wrist with a plain band rendered as a teal arc, faint activity waveforms below, and one amber dot marking a single external validation link.

Three. Of the 49 studies in this review, three ever tested their predictive model on data from an institution other than the one that built it. Everything else the paper reports — and there is a lot worth reporting — has to be read in the light of that number, 6.1 percent.

The premise is the kind that makes clinicians lean forward. Sleep, physical activity and circadian rhythm shift measurably in the years before someone fails a cognitive test, and a band on the wrist records all three continuously, passively, for almost nothing. Catch the drift early and the treatment window opens years sooner. Cejudo and colleagues, writing in the Journal of Medical Internet Research, set out to map what the 2020-2025 literature actually establishes about wearables and the early detection of cognitive impairment and dementia. Their framing is exactly right: the question is not whether the signal exists, but whether the evidence has matured enough to act on it.

The shape of the evidence

The search returned 7,175 records and narrowed, after screening, to 49 included studies. This is a systematic review with structured narrative synthesis under PRISMA 2020 guidance, not a meta-analysis. The authors judged the studies too heterogeneous in devices, endpoints and methods to pool, which reads as an honest call rather than a missing step. The headcount looks reassuring at first: more than 200,000 participants across all studies, a mean of over 4,000 per study. The distribution is the real story. Sample sizes ran from 14 to 91,948, and the median study enrolled 145 people, so a minority of large cohorts supplies almost all of the participants. The review was not prospectively registered, which the authors flag.

Where the data are genuinely good

The behavioural signal holds up. Across the studies, disrupted sleep, fragmented circadian rhythm and irregular activity tracked with worse cognition, at effect sizes the authors describe as modest to moderate. The finding that matters most lives in the longitudinal cohorts: disturbed sleep-wake patterns can precede clinically obvious impairment by several years. For a disease whose treatable window opens early and closes quietly, an early passive marker is worth taking seriously.

Some studies went past association toward prediction. Most (28 of 49, 57.1 percent) were oriented to early detection, though only 11 (22.4 percent) reported quantitative results bearing directly on it; the rest contributed indirect evidence. Where machine-learning or deep-learning models were applied, they reported area under the receiver-operating-characteristic curve (AUROC, a measure of how well a model separates cases from non-cases) of roughly 0.70 to 0.95 — on paper, middling to excellent discrimination.
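To make the AUROC figure concrete, here is a minimal sketch of what the number measures: the share of case/non-case pairs in which the case receives the higher score. The risk scores are invented for illustration and do not come from the review; `auroc` is a hypothetical helper, not a function from any of the included studies.

```python
def auroc(scores_cases, scores_controls):
    """Probability that a random case outscores a random non-case (ties count half)."""
    wins = 0.0
    for c in scores_cases:
        for n in scores_controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical model scores, invented for illustration only.
cases = [0.9, 0.8, 0.6, 0.4]      # people who later developed impairment
controls = [0.7, 0.5, 0.3, 0.2]   # people who did not

print(auroc(cases, controls))  # 13 of 16 pairs ranked correctly -> 0.8125
```

A value of 0.5 means the scores rank cases no better than a coin flip; 1.0 means every case outranks every non-case. The figure says nothing by itself about which data the pairs were drawn from.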

Where the claim outruns the data

Most of the work (73.5 percent, 36 of 49) used traditional statistical methods. It showed that a marker is associated with cognition, not that a model can forecast an individual's trajectory from it. Those are different claims, and the headline AUROC range blurs the difference: a figure reported only on the data a model was trained on describes its fit, not its behaviour on the next clinic's patients. The review makes this visible in the coefficients themselves. In small studies, standardised effects ran larger (β≈.35-.55); in the large cohorts they shrank toward β≈.10-.25, or odds ratios of 1.3-1.8. Bigger studies, smaller effects: that is the signature of overfitting in small samples, not of a stronger truth.
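The gap between fit and real-world behaviour can be shown with a toy simulation. This is a sketch under stated assumptions, not the review's method: every "marker" is pure noise by construction, the development sample is small, and the best-looking marker is picked on that same sample. The function names, sample sizes and seed are all illustrative choices.

```python
import random


def auroc(cases, controls):
    """Share of case/non-case pairs in which the case scores higher (ties count half)."""
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def simulate(n_per_group=20, n_external=500, n_markers=200, seed=0):
    """Pick the best of many noise markers on a small sample, then retest it on fresh data."""
    rng = random.Random(seed)

    def noise(k):
        # Assumption: markers carry no information at all about case status.
        return [rng.gauss(0.0, 1.0) for _ in range(k)]

    best_apparent, best_sign = 0.5, 1
    for _ in range(n_markers):
        a = auroc(noise(n_per_group), noise(n_per_group))
        sign = 1 if a >= 0.5 else -1      # allow "lower is worse" markers too
        a = max(a, 1.0 - a)
        if a > best_apparent:
            best_apparent, best_sign = a, sign

    # External check: the chosen marker, measured in people the selection never saw.
    ext_cases = [best_sign * x for x in noise(n_external)]
    ext_controls = [best_sign * x for x in noise(n_external)]
    return best_apparent, auroc(ext_cases, ext_controls)


apparent, external = simulate()
print(f"apparent AUROC (development data): {apparent:.2f}")
print(f"external AUROC (fresh data):       {external:.2f}")
```

Because the markers contain no signal, the external value should hover near 0.5 while the apparent value looks respectable. Real wearable markers do carry signal, so the real gap will be smaller, but only external validation measures how much smaller.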

Two structural facts seal the point. The strongest data come from research-grade actigraphy, laboratory instruments such as the Actiwatch or ActiGraph, used in 43 of 49 studies (87.8 percent). The consumer wearables a screening programme would actually depend on, the Fitbits and the Apple Watches, appeared in 7 (14.3 percent); the two counts sum to more than 49, so at least one study used both. The best evidence is for the devices most people will never wear. And then the number to hold on to: three models, of 49, were ever validated outside their home institution. A predictive model never tested on external data is one whose performance you cannot yet trust beyond its own dataset.
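The proportions quoted in this piece can be rechecked from the counts alone. The short sketch below does that arithmetic; the counts are the ones reported above, while the dictionary labels and the `pct` helper are illustrative names.

```python
TOTAL_STUDIES = 49

# Counts as quoted in this article.
counts = {
    "externally validated": 3,
    "oriented to early detection": 28,
    "quantitative early-detection results": 11,
    "traditional statistical methods": 36,
    "research-grade actigraphy": 43,
    "consumer wearables": 7,
}


def pct(k, n=TOTAL_STUDIES):
    """Percentage of n, rounded to one decimal place."""
    return round(100 * k / n, 1)


shares = {label: pct(k) for label, k in counts.items()}
for label, share in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL_STUDIES} = {share} percent")

# The two device counts sum to more than 49, so some studies used both classes.
min_overlap = max(0, counts["research-grade actigraphy"]
                  + counts["consumer wearables"] - TOTAL_STUDIES)
print(f"studies using both device classes: at least {min_overlap}")
```

The overlap line is a lower bound from simple counting; the article's figures do not say how many studies actually combined both device types.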

“A strong early signal and weak validation evidence are not a contradiction. Both belong, in full and equal measure, in any serious assessment.”

What to do with it

An estimated 1.8 million people in Germany live with dementia, and the wish for a cheap, passive early-warning marker is entirely reasonable. The review earns its place precisely because it calibrates that wish. The signal is real: behavioural markers do shift before the diagnosis. The infrastructure that would turn the signal into a clinical screening instrument — external validation, consumer-grade devices, prospective follow-up, samples larger than 145 — does not yet exist at the scale the claim demands. That is not a verdict against the field. It is the specification for the studies that have to come next, and a reminder that a digital biomarker earns the word 'screening' only once it has worked on patients it has never seen.

Source: Cejudo A, Arrojo M, Martín C, Almeida A. AI and Wearables for Early Detection of Cognitive Impairment and Dementia: Systematic Review. J Med Internet Res 2026;28:e86262. Funded by the Vicomtech Foundation, Basque Country, Spain; no competing interests declared. This is a peer-reviewed systematic review with narrative synthesis — no pooled estimate and no primary data of its own; it reports the state of a young, mostly associational literature, not a validated screening test.

Tags: Journal Club, Digital Biomarkers, Dementia, Evidence-Based Medicine, Wearables


This analysis comes from the people behind Visite.