Reflections · 4 min read

The Fluent Hallucination: Why Good Grammar Became a Clinical Risk

When old software failed, it failed loudly — garbled text you could not miss. Generative AI fails in flawless prose. The error no longer announces itself, and a perfectly written note may be the most dangerous thing in the record.

Dr. Sven Jungmann

CEO

A physician at a desk in early light reads a clean, confidently typeset clinical note with a pen hovering above one faintly underlined paragraph.

A registrar reads the discharge summary the system has drafted overnight. It is, by any ordinary measure, excellent: clean syntax, the right register, the cadence of someone who knows the field. She is three paragraphs in before something snags — a drug allergy listed with complete confidence that, as far as she can reconstruct, the patient has never had. There was no warning. The sentence that contained the error read exactly like the sentences that did not.

We have grown so used to software failing loudly that we mistook the noise for the problem. When a legacy system broke, it produced garbled text, broken fields, an error you could not read past. The ugliness was, in hindsight, a safety feature. It forced the clinician to stop, wince and fix the thing. The error announced itself.

Generative models do not break that way. They are not built to be right; they are built to be plausible, predicting the next likely word so the sentence keeps its shape. So when one invents a clinical fact, it does not stutter or glitch. It writes the falsehood in the same fluent, professional voice it uses for everything true. It wraps the error in the skin of expertise.

The danger is not that the machine will be wrong. It is that it will be plausibly wrong — and that fluency now buries the very signal we relied on to catch it.

The signal we have lost

When a junior doctor is unsure, you can hear it in the writing. The note goes short and hedged: it appears, possibly, query. The uncertainty shows up in the texture of the text, and a senior reader registers it without being told. For a long time that hesitation was one of medicine's quiet safety mechanisms — a tell that flagged where to look harder.

A model has no such tell. It states a verified finding and an invented one in precisely the same tone, with the same even confidence. The metadata of doubt is gone. Nothing in the surface of the prose distinguishes the part that is grounded from the part that is not, which means the reader can no longer triage by feel.

The audit tax

Here is the uncomfortable arithmetic. We buy these systems to save time, and the saving is real on the writing side. But using them safely means replacing reading — passive, fluent consumption — with auditing: reading every line as a claim to be checked against the source. And auditing is cognitively more expensive than writing.

It is genuinely easier to compose a note from scratch than to read a flawless paragraph and cross-examine it for a subtle semantic error: a date shifted by a year, a laterality flipped, a milligram quietly become a microgram. The fluency that makes the text pleasant to read is exactly what makes the error hard to find. In the short run, done honestly, safe adoption can raise the cognitive load rather than lower it — an audit tax that no one priced into the business case.

The failure mode is predictable. When the audit is hard and the day is long, people stop auditing. They sign off on fluent notes because the prose is reassuring and they are tired, and the signature comes to mean less than everyone assumes it does. A perfectly written paragraph is the easiest thing in the world to wave through.

Reading as interrogation

The skill this asks of clinicians is not prompting the machine more cleverly. It is interrogating what it returns. Read the document as if it were guilty until proven innocent. Set the eloquence aside — it carries no information about accuracy — and go straight for the data points that hallucinate most readily: the dates, the doses, the lateralities, the allergies, the numbers that have to match a source somewhere.
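Part of that interrogation could, in principle, be given a mechanical first pass. What follows is a minimal sketch under loud assumptions, not a description of any real product: the patterns, the function names and the two sample texts are all hypothetical, and real clinical language is far messier than three regular expressions. It pulls doses, dates and lateralities out of a drafted note and lists those that appear nowhere in the source text.

```python
import re

# Hypothetical patterns for a few of the claim types named above.
# Illustrative only: real notes use many more units, date formats and synonyms.
PATTERNS = {
    "dose": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "laterality": re.compile(r"\b(?:left|right|bilateral)\b", re.IGNORECASE),
}


def normalise(token):
    """Lowercase and drop whitespace so '50 mg' and '50mg' compare equal."""
    return re.sub(r"\s+", "", token).lower()


def extract_claims(text):
    """Return the set of (kind, value) pairs found in the text."""
    claims = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            claims.add((kind, normalise(match)))
    return claims


def unsupported_claims(draft, source):
    """Claims present in the draft that appear nowhere in the source."""
    return sorted(extract_claims(draft) - extract_claims(source))


# Invented sample texts, for illustration only.
source = "Admitted 2024-03-02. Left knee arthroscopy. Metoprolol 50 mg daily."
draft = "Admitted 2023-03-02 for right knee arthroscopy. Metoprolol 50 mcg daily."

flags = unsupported_claims(draft, source)
for kind, value in flags:
    print(f"CHECK {kind}: {value}")
```

A pass like this only says where to look first. It cannot tell whether a value that does appear in the source is attached to the right drug or the right limb, and it would miss an invented allergy entirely, so the line-by-line audit still falls to the reader.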

None of this argues against the tools. It argues against a particular daydream: that you can lay AI over the existing routine, keep skimming the way clinicians have always skimmed, and simply collect the time. A strategy that depends on tired people reading fluent text quickly is not an efficiency programme. It is a way of moving the risk somewhere you have stopped looking.

This analysis comes from the people behind Visite.
