
The Confidence Trap: Why a Fluent Answer Is Not a Verified One

An AI that reads a scan and an AI that summarises a history are not the same kind of machine, and they do not earn our trust the same way. One we validate against trials. The other we must be able to check, line by line, against its source.

Dr. Sven Jungmann


CEO


Late on a quiet shift, a physician opens a generated summary of a patient she has never met. It is clean, well ordered, fluent. It states, among other things, that the patient has a history of non-compliance with medication. The sentence reads with complete authority. There is nothing in its tone to suggest she should check it, and no obvious way to do so without reopening the entire file. So she reads on, and the assertion quietly becomes part of how she sees the patient.

In the rush to adopt AI in medicine, we have folded two very different machines into one word. They do not earn trust the same way, and treating them as if they did is how a fluent sentence turns into a clinical fact that no one ever verified.

Two machines, one word

The first kind is pattern recognition: the model that reads a CT scan or an ECG. It is a black box in the honest sense — it cannot explain its reasoning, and we do not ask it to. We accept it because it has been validated against large, diverse datasets, its sensitivity and specificity demonstrated in peer-reviewed trials. We trust the statistics, not the explanation. The evidence is the trial.

The second kind is generative reasoning: the model that summarises notes, drafts a discharge letter, suggests a protocol. It is a probabilistic engine. It predicts the next plausible word, and it is extraordinarily good at sounding certain whether or not it is right. Fluency is the one thing it can always deliver. Correctness is not.

The trap is to extend the acceptance we learned in radiology to the prose in the record. We grew comfortable trusting a black box that reads images, and we carry that comfort over to a black box that makes claims about a person's history. But a confident sentence about a patient is not a statistical readout. It is an assertion about reality, and an assertion is only worth what its source is worth.

A claim detached from its source is a rumour. A clinical fact is a claim tethered to evidence. The only difference an AI can make is whether it hands you the tether.

Validation for the eye, verification for the reasoning

This points to two standards, not one. For perception — the eye that reads a scan or a signal — we ask for validation. We can tolerate the black box if its performance is proven on populations like ours. For reasoning — the system that makes a claim about history, policy, or fact — we ask for verification. It must be able to point to the document where it found what it says: the specific nursing note, the lab value, the line in the guideline.

The reason is practical before it is philosophical. If a system asserts that a patient is non-compliant but cannot show where it read that, the physician cannot confirm it without re-reading the whole record herself. At that point the machine has saved her nothing. If you have to redo the work to check the work, the work was never done. An assistant you must fully re-examine is not an assistant.

The quiet error and the loud one

Generative models produce confident errors — fluent, persuasive, wrong. The question that matters is not whether they occur; they will. It is whether anyone can catch them. Without a traceable source, a confident error is invisible. A clinician reads a flawless, mistaken summary and acts on it, and nothing in the text ever signals the fault.

Anchor the same claim to its source and the failure changes character entirely. The physician follows the link, finds that the note does not say what the summary claimed, and discards the line. The error that was a silent risk becomes a visible mismatch — a small friction instead of a quiet harm. Traceability does not make the model more accurate. It makes it accountable, which in medicine is the more important property.
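For readers who think in code, the difference between the quiet error and the loud one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real system: the names `SourceRef`, `Claim`, and `check_claim`, the document identifier, and the sample note are all hypothetical. The check is deliberately crude. It only confirms that the quoted passage exists in the cited document; whether that passage actually supports the claim remains the clinician's judgement.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SourceRef:
    """Hypothetical pointer from a claim to the passage it rests on."""
    document_id: str
    quote: str  # the exact wording the summary says it read


@dataclass(frozen=True)
class Claim:
    """One sentence of a generated summary, with or without a tether."""
    text: str
    source: Optional[SourceRef] = None


def check_claim(claim: Claim, record: Dict[str, str]) -> str:
    """Classify a claim as 'untraceable', 'mismatch', or 'traceable'.

    'traceable' means only that the quoted passage is present in the
    cited document, not that the claim is clinically correct.
    """
    if claim.source is None:
        # No source offered: the reader must re-read the whole record.
        return "untraceable"
    document = record.get(claim.source.document_id)
    if document is None or claim.source.quote not in document:
        # The link leads somewhere that does not say what was claimed.
        return "mismatch"
    return "traceable"


# Illustrative record with a single, made-up nursing note.
record = {
    "nursing_note_01": "Patient reports missing evening doses due to nausea.",
}

claims = [
    Claim("History of non-compliance with medication."),
    Claim(
        "History of non-compliance with medication.",
        SourceRef("nursing_note_01", "refuses all medication"),
    ),
    Claim(
        "Patient has missed evening doses because of nausea.",
        SourceRef("nursing_note_01", "missing evening doses due to nausea"),
    ),
]

for claim in claims:
    print(check_claim(claim, record), "|", claim.text)
```

The first claim is the quiet error: nothing to follow, nothing to catch. The second is the loud one: the cited note is there, it does not contain the words, and the line can be discarded. Only the third hands over a tether that holds.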

None of this is an argument against black boxes. Refusing them outright would be its own kind of foolishness; the eye that reads the scan has earned its place. It is an argument about where each belongs. When a system offers an opinion on an image or a signal, look at its track record. When it offers an opinion on fact, history, or policy, ask to see its papers.

In the age of generative AI, evidence-based medicine quietly acquires a second meaning: auditability. If I cannot trace an insight back to the data it came from, I cannot responsibly act on it — and as the European AI rules take shape around exactly this question of traceability, being able to show your sources stops being a virtue and becomes the condition of using the tool at all.



This analysis comes from the people behind Visite.
