Journal Club · 4 min read

Twenty Thousand Users, No Control Arm: Reading a Real-World Cohort Honestly

The largest real-world cohort yet for a consumer blood-and-wearable platform reports that most users with poor baseline markers improved. The numbers are real. What they cannot tell us, without a comparison group, is whether the platform is the reason.

Dr. Sven Jungmann

CEO

One number in this paper does more work than the headline ones. Among users who began with poor LDL cholesterol, only 20.4 percent improved, while roughly three in four improved on HbA1c, fasting glucose and triglycerides. LDL responds comparatively weakly to diet and exercise; the glucose markers and triglycerides respond more strongly. So the marker that should barely move barely moved, and the ones that should respond did. That internal consistency is the most persuasive thing in the dataset, and it is worth holding onto, because almost everything else here comes with a caveat the authors are admirably open about.

Who paid, and why it sets the bar

Read the disclosures before the results. All six authors are employees of InsideTracker, the company whose platform was studied; the study was funded by InsideTracker; and the authors hold stock options in it. They put it plainly themselves — there is, in their words, no distinction between the investigative team and the funding source. This does not void the data. A company is often the only party holding enough of its own users' records to run an analysis like this at all. But it fixes the level of scrutiny a careful reader owes the design, which is why the design comes first.

The design, stated plainly

This is a retrospective, observational, longitudinal cohort study in PLOS Digital Health. The platform pairs 39 blood biomarkers — from LDL and HbA1c to vitamin D and cortisol — with fitness-tracker data and polygenic risk scores, and returns lifestyle recommendations. The authors pulled every user with at least two blood tests taken at least 90 days apart (n=20,342), measured how each marker moved from baseline to follow-up a median of 260 days later, and reported the share who improved. A subset tested five or more times, averaging 4.2 years of records, let them ask whether early gains lasted. Crucially, nobody was followed without the platform. There is no comparator.

What the data genuinely show

Among users starting in a suboptimal range, most moved the right way on the markers that diet and exercise actually shift: HbA1c improved in 79.3 percent, triglycerides in 76.2 percent and fasting glucose in 74.2 percent. LDL, as noted, improved in only 20.4 percent. And in the long-followed subset the early improvements largely held over several years rather than rebounding. A sustained shift is harder to dismiss than a one-off follow-up dip, and the biological gradient between responsive and unresponsive markers is the kind of signal you would expect if behaviour, not noise, were driving the change.

What they cannot show

What none of this establishes is that the platform caused the improvement — and the authors say as much, calling the work exploratory and its conclusions hypothesis-generating and correlative. Two mechanisms sit under that caveat. The first is regression to the mean: select people because a value was abnormal, and repeat testing alone drags the group average back toward normal, no intervention required. The authors meet this head-on, arguing that years-long stability is hard to square with regression as the whole story. It is a fair point, not a proof; a motivated, self-monitoring cohort can hold better numbers for reasons that have nothing to do with the recommendations.
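Regression to the mean is easier to believe once it is simulated. The sketch below is a toy model, not a reanalysis of the paper: every parameter (the spread of true levels, the measurement noise, the cutoff) is an arbitrary assumption, and the function name is hypothetical. Each simulated person has a stable underlying marker level that never changes, is measured twice with independent noise, and is "enrolled" only if the first reading is abnormally high.

```python
import random


def simulate_regression_to_mean(n=20000, true_sd=1.0, noise_sd=1.0, cutoff=1.0, seed=0):
    """Measure an unchanged marker twice (higher = worse) and select on the first reading.

    All defaults are illustrative assumptions, not values taken from the study.
    Nobody receives any intervention between the two measurements.
    """
    rng = random.Random(seed)
    selected = improved = 0
    baseline_sum = followup_sum = 0.0
    for _ in range(n):
        true_level = rng.gauss(0.0, true_sd)               # stable underlying level
        baseline = true_level + rng.gauss(0.0, noise_sd)   # first noisy reading
        followup = true_level + rng.gauss(0.0, noise_sd)   # second noisy reading
        if baseline > cutoff:                              # enrolled for an abnormal baseline
            selected += 1
            baseline_sum += baseline
            followup_sum += followup
            if followup < baseline:                        # counts as "improved"
                improved += 1
    return {
        "selected": selected,
        "share_improved": improved / selected,
        "mean_baseline": baseline_sum / selected,
        "mean_followup": followup_sum / selected,
    }


result = simulate_regression_to_mean()
print(result)
```

Under these assumptions, well over half of the selected group "improves" and the group mean falls, although nothing was done to anyone. Setting `noise_sd=0.0` removes the effect entirely, which shows that the apparent improvement comes from selecting on a noisy reading. How much of any real cohort's change this explains depends on the true noise level, which the toy model does not know.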

The second is the cohort itself. It is 64.2 percent male, 84.3 percent white, US-based, and self-selected by people with, in the authors' phrasing, the means to buy the product and the access to use it. That is far from a representative population, and it is plausibly the group most likely to improve unaided. Without a comparison arm of similar people not on the platform, the counterfactual (what would have happened anyway) is simply unavailable. A high improvement rate fits a platform that works; it fits just as well a platform that attracts people already on their way up.
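A few lines of arithmetic show why the missing comparator matters. In the sketch below, the user figure echoes the reported 79.3 percent HbA1c improvement rate, scaled to a hypothetical 1,000 people. Both comparison-group figures are invented for illustration, because the paper provides none. The function name is likewise an assumption.

```python
def improvement_rate_difference(improved_users, n_users, improved_comparison, n_comparison):
    """Difference in improvement rates: platform users minus a comparison group."""
    if n_users <= 0 or n_comparison <= 0:
        raise ValueError("group sizes must be positive")
    rate_users = improved_users / n_users
    rate_comparison = improved_comparison / n_comparison
    return rate_users - rate_comparison


# Hypothetical scenario A: similar non-users also improve often (750 of 1,000).
scenario_a = improvement_rate_difference(793, 1000, 750, 1000)

# Hypothetical scenario B: similar non-users improve much less often (400 of 1,000).
scenario_b = improvement_rate_difference(793, 1000, 400, 1000)

print(f"A: {scenario_a:.3f}  B: {scenario_b:.3f}")
```

The same headline rate yields a difference of about 4 percentage points in one invented scenario and about 39 in the other. The published data cannot say which is closer to the truth. Even with a real comparison group, a raw difference between self-selected groups would still need adjustment for who chose to join.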

Why it matters here

Integrating blood markers, wearables and genetic risk into personalised prevention is exactly the model behind Europe's debates over digital prevention and reimbursable health applications, where the evidence bar for a claimed clinical benefit is the entire argument. This paper is a clean illustration of that bar. It is a competently reported, unusually large real-world cohort, honest about its limits — and it is still not the study that could justify a prevention claim, because its design cannot separate the intervention from the people who chose it. The lesson is not that real-world data is worthless. It is that real-world data without a comparator answers a narrower question than the headline percentage implies.

Source: Schneider N, Fabian P, Cawley M, Nogal B, Blander G, Deehan R. Improvements in blood and fitness tracker biomarkers in a longitudinal real-world cohort of digital health platform users. PLOS Digital Health 2026;5(3):e0001271. A retrospective, observational cohort with no control arm, funded by and authored entirely by employees of the company whose platform was studied; its findings are, in the authors' own words, hypothesis-generating.

This analysis comes from the people behind Visite.
