The Fluent Hallucination: Why Good Grammar Became a Clinical Risk
When old software failed, it failed loudly — garbled text you could not miss. Generative AI fails in flawless prose. The error no longer announces itself, and a perfectly written note may be the most dangerous thing in the record.

Dr. Sven Jungmann
CEO

A registrar reads the discharge summary the system has drafted overnight. It is, by any ordinary measure, excellent: clean syntax, the right register, the cadence of someone who knows the field. She is three paragraphs in before something snags — a drug allergy listed with complete confidence that, as far as she can reconstruct, the patient has never had. There was no warning. The sentence that contained the error read exactly like the sentences that did not.
We have grown so used to software failing loudly that we mistook the noise for the problem. When a legacy system broke, it produced garbled text, broken fields, an error you could not read past. The ugliness was, in hindsight, a safety feature. It forced the clinician to stop, wince and fix the thing. The error announced itself.
Generative models do not break that way. They are not built to be right; they are built to be plausible, predicting the next likely word so the sentence keeps its shape. So when one invents a clinical fact, it does not stutter or glitch. It writes the falsehood in the same fluent, professional voice it uses for everything true. It wraps the error in the skin of expertise.
“The danger is not that the machine will be wrong. It is that it will be plausibly wrong — and that fluency now buries the very signal we relied on to catch it.”
The signal we have lost
When a junior doctor is unsure, you can hear it in the writing. The note goes short and hedged: “it appears”, “possibly”, “query”. The uncertainty shows up in the texture of the text, and a senior reader registers it without being told. For a long time that hesitation was one of medicine's quiet safety mechanisms, a tell that flagged where to look harder.
A model has no such tell. It states a verified finding and an invented one in precisely the same tone, with the same even confidence. The metadata of doubt is gone. Nothing in the surface of the prose distinguishes the part that is grounded from the part that is not, which means the reader can no longer triage by feel.
The audit tax
Here is the uncomfortable arithmetic. We buy these systems to save time, and the saving is real on the writing side. But using them safely means replacing reading — passive, fluent consumption — with auditing: reading every line as a claim to be checked against the source. And auditing is cognitively more expensive than writing.
It is genuinely easier to compose a note from scratch than to read a flawless paragraph and cross-examine it for a subtle semantic error: a date shifted by a year, a laterality flipped, a milligram quietly become a microgram. The fluency that makes the text pleasant to read is exactly what makes the error hard to find. In the short run, done honestly, safe adoption can raise the cognitive load rather than lower it — an audit tax that no one priced into the business case.
The failure mode is predictable. When the audit is hard and the day is long, people stop auditing. They sign off on fluent notes because the prose is reassuring and they are tired, and the signature comes to mean less than everyone assumes it does. A perfectly written paragraph is the easiest thing in the world to wave through.
Reading as interrogation
The skill this asks of clinicians is not prompting the machine more cleverly. It is interrogating what it returns. Read the document as if it were guilty until proven innocent. Set the eloquence aside — it carries no information about accuracy — and go straight for the data points that hallucinate most readily: the dates, the doses, the lateralities, the allergies, the numbers that have to match a source somewhere.
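One way to make that interrogation concrete is to pull the high-risk data points out of a draft and list the ones that have no counterpart in the source record. The sketch below is illustrative only: the function names, the patterns and the sample texts are all assumptions made for this example, not part of any real clinical system, and the patterns are far too crude for production use. An empty result does not show the note is correct; it only shows that this particular check found nothing to flag.

```python
import re

# Hypothetical, deliberately simple patterns for the claim types named above.
# Real clinical text would need far more robust extraction than this.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "dose": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml|units?)\b", re.IGNORECASE),
    "laterality": re.compile(r"\b(?:left|right|bilateral)\b", re.IGNORECASE),
    "allergy": re.compile(r"\ballerg(?:y|ies|ic)\b[^.]*", re.IGNORECASE),
}


def extract_claims(text):
    """Return a set of (kind, normalised_text) pairs found in the text."""
    claims = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Lower-case and collapse whitespace so trivial differences match.
            claims.add((kind, " ".join(match.group(0).lower().split())))
    return claims


def unverified_claims(draft, source):
    """List claims that appear in the draft but not in the source record."""
    return sorted(extract_claims(draft) - extract_claims(source))


# Invented sample texts, for illustration only.
source = (
    "Admitted 2024-03-02. Left knee arthroscopy. "
    "Amoxicillin 500 mg three times daily. No known drug allergies."
)
draft = (
    "Admitted 2023-03-02 for right knee arthroscopy. "
    "Amoxicillin 500 mcg three times daily. Allergic to penicillin."
)

for kind, value in unverified_claims(draft, source):
    print(f"CHECK {kind}: {value}")
```

The design choice is to ignore the prose entirely and compare only the claims that must match a source. The output is a list of places to look harder, which still needs a clinician's judgement to resolve.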
None of this argues against the tools. It argues against a particular daydream: that you can lay AI over the existing routine, keep skimming the way clinicians have always skimmed, and simply collect the time. A strategy that depends on tired people reading fluent text quickly is not an efficiency programme. It is a way of moving the risk somewhere you have stopped looking.


