Reflections · 1 January 2026 · 4 min read

The Confidence Trap: Why a Fluent Answer Is Not a Verified One

An AI that reads a scan and an AI that summarises a history are not the same kind of machine, and they do not earn our trust the same way. One we validate against trials. The other we must be able to check, line by line, against its source.

Dr. Sven Jungmann

CEO

[Image: A physician reads a clean, confident summary on a workstation screen while a thick stack of unopened source documents sits beside the keyboard in shadow.]

Late on a quiet shift, a physician opens a generated summary of a patient she has never met. It is clean, well ordered, fluent. It states, among other things, that the patient has a history of non-compliance with medication. The sentence reads with complete authority. There is nothing in its tone to suggest she should check it, and no obvious way to do so without reopening the entire file. So she reads on, and the assertion quietly becomes part of how she sees the patient.

In the rush to adopt AI in medicine, we have folded two very different machines into one word. They do not earn trust the same way, and treating them as if they did is how a fluent sentence turns into a clinical fact that no one ever verified.

Two machines, one word

The first kind is pattern recognition: the model that reads a CT scan or an ECG. It is a black box in the honest sense — it cannot explain its reasoning, and we do not ask it to. We accept it because it has been validated against large, diverse datasets, its sensitivity and specificity demonstrated in peer-reviewed trials. We trust the statistics, not the explanation. The evidence is the trial.
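To make "we trust the statistics" concrete, here is a minimal sketch of how sensitivity and specificity fall out of a validation study's outcome counts. The class name, function names, and the counts themselves are invented for illustration; they are not taken from any real trial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts from comparing model output with a reference standard."""
    true_positives: int
    false_negatives: int
    true_negatives: int
    false_positives: int


def sensitivity(c: ConfusionCounts) -> float:
    """Share of truly positive cases the model flagged: TP / (TP + FN)."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def specificity(c: ConfusionCounts) -> float:
    """Share of truly negative cases the model cleared: TN / (TN + FP)."""
    return c.true_negatives / (c.true_negatives + c.false_positives)


# Hypothetical counts, invented for illustration only.
counts = ConfusionCounts(
    true_positives=90,
    false_negatives=10,
    true_negatives=850,
    false_positives=50,
)

print(f"sensitivity = {sensitivity(counts):.2f}")  # 90 / 100
print(f"specificity = {specificity(counts):.2f}")  # 850 / 900
```

Nothing in these numbers explains how the model reached any single reading. They only describe how often it was right on a given population, which is why that population has to resemble ours.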

The second kind is generative reasoning: the model that summarises notes, drafts a discharge letter, suggests a protocol. It is a probabilistic engine. It predicts the next plausible word, and it is extraordinarily good at sounding certain whether or not it is right. Fluency is the one thing it can always deliver. Correctness is not.

The trap is to extend the acceptance we learned in radiology to the prose in the record. We grew comfortable trusting a black box that reads images, and we carry that comfort over to a black box that makes claims about a person's history. But a confident sentence about a patient is not a statistical readout. It is an assertion about reality, and an assertion is only worth what its source is worth.

“A claim detached from its source is a rumour. A clinical fact is a claim tethered to evidence. The only difference an AI can make is whether it hands you the tether.”

Validation for the eye, verification for the reasoning

This points to two standards, not one. For perception — the eye that reads a scan or a signal — we ask for validation. We can tolerate the black box if its performance is proven on populations like ours. For reasoning — the system that makes a claim about history, policy, or fact — we ask for verification. It must be able to point to the document where it found what it says: the specific nursing note, the lab value, the line in the guideline.

The reason is practical before it is philosophical. If a system asserts that a patient is non-compliant but cannot show where it read that, the physician cannot confirm it without re-reading the whole record herself. At that point the machine has saved her nothing. If you have to redo the work to check the work, the work was never done. An assistant you must fully re-examine is not an assistant.
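One way to picture the verification standard is as a data structure in which every claim either carries a pointer to its source or is visibly marked as lacking one. The sketch below is an assumption about how such a structure might look, not a description of any particular product; `SourceRef`, `Claim`, `split_by_traceability`, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SourceRef:
    """Pointer to the place a claim was read from (hypothetical schema)."""
    document_id: str  # e.g. a nursing note or lab report identifier
    line: int         # 1-based line within that document
    quote: str        # the exact words the claim rests on


@dataclass(frozen=True)
class Claim:
    text: str
    source: Optional[SourceRef] = None  # None means the claim has no tether


def split_by_traceability(claims: List[Claim]) -> Tuple[List[Claim], List[Claim]]:
    """Return (tethered, untethered): claims with a source pointer, and claims without."""
    tethered = [c for c in claims if c.source is not None]
    untethered = [c for c in claims if c.source is None]
    return tethered, untethered


# Invented example summary, for illustration only.
summary = [
    Claim(
        "Potassium was 3.1 mmol/L on admission.",
        SourceRef(document_id="lab-report-17", line=4, quote="K+ 3.1 mmol/L"),
    ),
    Claim("History of non-compliance with medication."),  # no source given
]

tethered, untethered = split_by_traceability(summary)
for claim in untethered:
    print("UNSOURCED:", claim.text)
```

The design choice is that a missing source is a first-class state rather than something hidden by fluent prose: the reader sees at a glance which lines she can check in one click and which she cannot check at all.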

The quiet error and the loud one

Generative models produce confident errors — fluent, persuasive, wrong. The question that matters is not whether they occur; they will. It is whether anyone can catch them. Without a traceable source, a confident error is invisible. A clinician reads a flawless, mistaken summary and acts on it, and nothing in the text ever signals the fault.

Anchor the same claim to its source and the failure changes character entirely. The physician follows the link, finds that the note does not say what the summary claimed, and discards the line. The error that was a silent risk becomes a visible mismatch — a small friction instead of a quiet harm. Traceability does not make the model more accurate. It makes it accountable, which in medicine is the more important property.
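The step of following the link can be sketched in a few lines. This is a deliberately crude illustration under stated assumptions: `check_claim` and the record layout are hypothetical, the note text is invented, and a plain substring comparison stands in for what is really a clinician's judgement about whether the source supports the claim.

```python
from typing import Dict, List


def _normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def check_claim(quote: str, source_id: str, line: int,
                records: Dict[str, List[str]]) -> str:
    """Classify a cited quote as 'supported', 'mismatch', or 'missing-source'."""
    lines = records.get(source_id)
    if lines is None or not (1 <= line <= len(lines)):
        return "missing-source"  # the pointer leads nowhere
    if _normalise(quote) in _normalise(lines[line - 1]):
        return "supported"       # the cited line contains the quoted words
    return "mismatch"            # the cited line says something else


# Invented record, for illustration only.
records = {
    "nursing-note-3": [
        "Patient reports difficulty affording medication.",
        "Took all doses during admission.",
    ]
}

print(check_claim("difficulty affording medication", "nursing-note-3", 1, records))
print(check_claim("non-compliant with medication", "nursing-note-3", 1, records))
print(check_claim("non-compliant with medication", "nursing-note-9", 1, records))
```

The three calls print `supported`, `mismatch`, and `missing-source` in that order. The check does not make the summary any more accurate; it only turns a silent error into a visible one that a person can discard.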

None of this is an argument against black boxes. Refusing them outright would be its own kind of foolishness; the eye that reads the scan has earned its place. It is an argument about where each belongs. When a system offers an opinion on an image or a signal, look at its track record. When it offers an opinion on fact, history, or policy, ask to see its papers.

In the age of generative AI, evidence-based medicine quietly acquires a second meaning: auditability. If I cannot trace an insight back to the data it came from, I cannot responsibly act on it. And as European AI regulation, the EU AI Act in particular, takes shape around exactly this question of traceability, being able to show your sources stops being a virtue and becomes the condition of using the tool at all.

Tags: Reflections, Clinical AI, Evidence-Based Medicine, Generative AI, Digital Health

