A Few Hundred Bad Records: What the Data-Poisoning Paper Actually Claims
An analytical synthesis argues that poisoning a medical AI scales with the absolute number of tampered records, not their share of the dataset. The reasoning is sound and worth knowing. But it is a threat model, not a measured event — and that distinction is the point.

Dr. Sven Jungmann
CEO

Most of us carry a quiet assumption about large training sets: that size is a form of safety. A model fed ten million records, the intuition goes, must be harder to corrupt than one fed ten thousand, because there is simply more good data to drown out the bad. A new analytical paper in the Journal of Medical Internet Research argues that, for an attacker, dataset size is almost irrelevant. What matters is not the share of the data that has been tampered with. It is the raw count. And the count that can do real damage is startlingly small.
The figure the authors keep returning to is 100 to 500 manipulated records. Synthesising prior security research, they report that an attacker who can slip that many poisoned samples into the data behind a medical AI can compromise it, with success rates above 60 percent in general and, for medical imaging models, between 70 and 95 percent. Crucially, that range barely shifts whether the surrounding dataset holds ten thousand records or ten million. A larger corpus is not a thicker wall.
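To make the count-versus-share distinction concrete, here is a minimal Python sketch of the arithmetic. It only converts the 100-to-500 record range into a percentage of the two dataset sizes mentioned above; the function name `poisoned_share_percent` is an illustrative choice, not something taken from the paper.

```python
def poisoned_share_percent(poisoned: int, total: int) -> float:
    """Return the poisoned records as a percentage of the whole dataset."""
    return 100.0 * poisoned / total


# Dataset sizes and record counts quoted in the article.
for total in (10_000, 10_000_000):
    for poisoned in (100, 500):
        share = poisoned_share_percent(poisoned, total)
        print(f"{poisoned} of {total:,} records = {share:.3f}% of the data")
```

The same count that makes up 1 to 5 percent of a ten-thousand-record set is only 0.001 to 0.005 percent of a ten-million-record set, a thousandfold smaller share. On the paper's argument it is the count, not that shrinking share, that tracks attack success.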
First, what kind of paper this is
The genre matters here, because it sets the ceiling on how much any single sentence can bear. This is not an experiment. Farhad Abtahi and colleagues at the Karolinska Institutet, with partners at the Universidad Politécnica de Madrid, attacked no clinical system. They reviewed 41 security studies published between 2019 and 2025 and assembled an analytical threat-modelling framework, illustrated with eight worked scenarios spanning imaging models, clinical language models, scheduling agents, federated learning, and organ-allocation systems. The empirical numbers are inherited from that prior literature — much of it on non-medical models — and the medical scenarios are constructed projections. The authors are unambiguous about this, in the text and in their tables.
The claim that travels
The absolute-number argument is the part worth carrying away, and it is mechanistically plausible. Across the architectures reviewed (convolutional networks for imaging, large language models, reinforcement-learning agents), the cited studies report that attack success tracks the number of poisoned examples, not their proportion. The figures are 100 to 500 samples for imaging models and as few as 100 examples to undo a language model's safety alignment. The reasoning is not exotic: a learning system can acquire a narrow, reliable association from a small but internally consistent set of examples, and a sea of benign data does not dilute a signal that is coherent in itself.
The paper's more original move is about visibility rather than feasibility. Privacy law (the General Data Protection Regulation in Europe and its US analogues) restricts the cross-institutional pooling of patient records. But pooled data is exactly what would let an auditor spot a subtle, distributed manipulation in the first place. The rules that protect patients can, as a side effect, blind the people who might otherwise catch a slow campaign. The authors put the resulting detection delay at six to twelve months, longer in federated or privacy-constrained settings. That number is a reasoned projection, not a measurement, but the tension it names, between auditability and confidentiality, is genuine and carefully argued.
Where the evidence stops
This is where a careful reader has to honour the line the authors themselves draw. The vivid cases are the ones that stay with you: a radiology model quietly missing cancers in one demographic after roughly 250 tampered images — 0.025 percent of a million-image set — or an organ-allocation model drifting into systematic bias over years before the harm becomes statistically legible. Both are labelled, in the paper's own tables, as threat-modelling projections, not documented incidents. They exist to make a mechanism visible, and they do that job well. They are not evidence that such an attack has occurred, nor an estimate of how likely one is. A threat that is feasible in principle and a threat that is happening are different objects.
“A threat that is feasible in principle and a threat that is occurring are different objects — and this paper is scrupulous about not confusing the two.”
The limitations the authors list are candid ones. They ran no original attacks on production systems. The literature they reviewed was English-language only, with the selection bias that implies. The empirical studies they lean on examined models of up to 13 billion parameters, while clinical foundation models now reach 100 billion and beyond, so the extrapolation to the largest models, they concede, still needs empirical confirmation. And the defences they sketch (monitoring for disagreement across model ensembles, adversarial testing, logging that is auditable yet privacy-preserving) have not been validated in any prospective clinical setting. Cite this as a well-reasoned argument about a plausible risk, not as a measured rate of harm.
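For readers who want a feel for what "monitoring for disagreement across model ensembles" could mean in practice, the following is a rough Python sketch under stated assumptions. It is not the authors' method: the function names, the case identifiers, the labels and the 0.3 threshold are all hypothetical, chosen only to illustrate the idea that cases where independently trained models split their votes get routed to a human.

```python
from collections import Counter


def disagreement_rate(votes: list[str]) -> float:
    """Fraction of ensemble members that disagree with the majority label."""
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


def flag_for_review(predictions_by_case: dict[str, list[str]],
                    threshold: float = 0.3) -> list[str]:
    """Return the case IDs whose disagreement rate exceeds the threshold.

    The threshold is an arbitrary illustrative value, not a validated one.
    """
    return [
        case_id
        for case_id, votes in predictions_by_case.items()
        if disagreement_rate(votes) > threshold
    ]


# Hypothetical predictions from five models for three cases.
example = {
    "case-001": ["benign", "benign", "benign", "benign", "benign"],
    "case-002": ["malignant", "benign", "benign", "malignant", "benign"],
    "case-003": ["benign", "benign", "benign", "benign", "malignant"],
}
print(flag_for_review(example))
```

In this toy input only `case-002` is flagged, because two of its five votes differ from the majority. Whether a check like this would catch a real poisoning campaign is exactly the kind of question the paper says remains unvalidated.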
Why it matters here
For European systems the relevant detail is concrete. The European Health Data Space is built to connect health data across 27 member states — on the order of 450 million people — with federated learning among its intended mechanisms. A federated model trained across that space inherits the trust assumptions of every contributing node. In the authors' projection, control over the data feeds of three to five member states — eleven to nineteen percent of participants — could in principle shape a shared model while leaving each national dataset looking unremarkable. That is a design question worth raising while the architecture is still open, not a cause for alarm. The usable takeaway is narrow: when a clinical AI is evaluated, robustness against deliberately corrupted training data belongs on the list of questions — and the size of the dataset is not an answer to it.
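The percentage range in that projection follows from simple division. A short Python check, assuming only the 27 member states and the three-to-five figure quoted above (the function name `share_of_member_states` is hypothetical):

```python
MEMBER_STATES = 27  # member states connected by the European Health Data Space


def share_of_member_states(controlled: int, total: int = MEMBER_STATES) -> float:
    """Return the controlled member states as a percentage of all participants."""
    return 100.0 * controlled / total


for controlled in (3, 5):
    share = share_of_member_states(controlled)
    print(f"{controlled} of {MEMBER_STATES} member states = {share:.1f}%")
```

This gives roughly 11.1 and 18.5 percent, which rounds to the eleven to nineteen percent cited.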
Source: Abtahi F, Seoane F, Pau I, Vega-Barbas M. Data Poisoning Vulnerabilities Across Health Care Artificial Intelligence Architectures: Analytical Security Framework and Defense Strategies. Journal of Medical Internet Research 2026;28:e87969. An analytical threat-modelling synthesis of prior security research, funded by the SMAILE core facility at the Karolinska Institutet (no competing interests declared); it presents no original experiments, and its medical attack scenarios are explicitly hypothetical rather than documented incidents.


