Journal Club · 6 June 2026 · 5 min read

Four Conversations About Clinical AI That Quietly Agree

Four NEJM AI podcast interviews, recorded months apart, keep landing in the same three places: a values vacuum, a bias we taught the machine, and a trust gap that tracks consequence. None of it is evidence. The agreement is still worth an hour.

Dr. Sven Jungmann

CEO


Fifty physicians were asked to work through clinical cases. Half had their usual resources; half had those plus GPT-4. In a secondary analysis, the model was given the same cases with no physician attached. On its own it scored roughly sixteen percentage points above the doctors using conventional tools, and handing the doctors that same model did not reliably lift them (Goh et al., JAMA Network Open 2024). The trial comes from Jonathan Chen's group at Stanford, and it sits awkwardly against a principle he, like most informaticians, had treated as close to a law: that a clinician working with a computer beats either alone.

That principle has a name — the fundamental theorem of biomedical informatics, Charles Friedman's formulation that a person plus an information resource outperforms the person unassisted. Chen revisits it on the NEJM AI Grand Rounds podcast, and his is one of four interviews I want to read together here. The others are with the cognitive psychologist Laura Zwaan, the informatician Zak Kohane, and Seth Hain of Epic. None of this is a study. There is no shared dataset, no protocol, no peer review of anything said into the microphone. What earns the set an hour is that four people who build and study clinical AI, talking separately across four months, keep arriving at the same three places.

Nobody is checking whose values went in

Kohane puts the sharpest of the three on the table. Models already arrive with dispositions: one hangs back, another reaches for the aggressive workup. A disposition is not a neutral setting; it is a clinical stance with consequences for the patient in front of it. His observation is that no regulator looks at where those stances come from. Medicines and device regulators ask whether a tool is safe and whether it works. Neither question travels as far as the values a system has absorbed, or whose values they were to begin with. For software that tips a diagnosis or a referral one way rather than another, that is no minor gap in oversight.

A bias we trained into it ourselves

Zwaan has spent her career on how clinicians err, and on a subtler problem in how the rest of us study those errors. Once you know how a case ended, you cannot un-know it; reasoning that looked sound at the time reads as negligent in the rear-view mirror. Hindsight bias is less a flaw in error research than a permanent tenant of it. The line to AI is short. A model trained on labelled outcomes is learning from notes written after the ending was known — so it inherits not just our knowledge but the specific way our judgement bends once the answer is in. We then ask it to catch the mistakes we make for exactly that reason.

We move fast where it is cheap to be wrong

The third thread is about pace. Administrative AI (coding, billing, the revenue cycle) has slipped into routine use across health systems with little friction. Clinical AI, the kind that reaches a diagnosis or a treatment, has not, and the speakers are clear that the brake is not chiefly a technical one. Hain, describing how Epic approaches this, treats slow and careful deployment for anything that touches the patient as a deliberate choice rather than a failure of nerve. The asymmetry gives the game away: we hurry where a mistake costs money and hesitate where it costs a person. That instinct is, broadly, correct. It also means the cases that matter most are the ones still waiting in the queue.

What this is, and is not

Read the set for what it is. These are the considered views of people with deep stakes in the field — Chen and Hain build the tools, and candour is not the same as disinterest; an interview is not a controlled comparison. Not one of the three claims here would hold up if you cited it as evidence. The honest version is that they are hypotheses, sharpened by people who would know, that happen to point the same way: an oversight blind spot over values, a bias handed down from us, and a deployment gap that follows consequence rather than difficulty.


For European decision-makers the practical lesson is modest and real. What will decide whether clinical AI earns its place is not a benchmark score. It is whose values a system encodes, how its training data was labelled and by whom, and whether the caution we rightly keep at the bedside is matched by the scrutiny we apply before the tool ever arrives there. Four people who disagree about a great deal agree about that much. The agreement is worth taking seriously — and worth not mistaking for proof.

Source: NEJM AI Grand Rounds, interviews with Jonathan Chen (15 Oct 2025), Laura Zwaan (19 Nov 2025), Zak Kohane (17 Dec 2025) and Seth Hain of Epic (18 Feb 2026), hosted by Arjun Manrai and Andrew Beam. The diagnostic-reasoning result is from Goh et al., JAMA Network Open 2024. The interviews are recorded conversations, not peer-reviewed research: the views are individual, several speakers build the systems they discuss, and nothing said in them should be read as primary evidence.

#Journal Club #Clinical AI #AI Governance #Diagnostic Error #Medical Informatics



This analysis comes from the people behind Visite.
