Journal Club · 5 min read

ChatGPT Corrected the Vaccine Myths and Missed the Climate Ones. The Gap Is the Finding.

An experiment had 149 students fact-check health myths with ChatGPT. Misconceptions about flu vaccination fell measurably; those about climate change did not move at all. The asymmetry tells you more than either result alone — and the evidence is thin.

Dr. Sven Jungmann

CEO

Tell ChatGPT a flu-vaccine myth and, in nearly six conversations out of ten, it will tell you to check with your doctor. Tell it a climate myth and it does that less than one time in ten — but it cites a scientific consensus in more than nine. Same model, two topics, two completely different conversational reflexes. A recent experiment captured this in numbers, and the more you sit with the numbers, the more the differences in how the model talks turn out to explain the differences in what it changes.

The study, by communication researchers Lu, Wang, Liu and McLeod, appeared in JMIR AI in February 2026. It speaks to a question that now turns up in the consultation room most weeks: when a patient says they "asked ChatGPT", what exactly came back?

The design, and its two soft spots

This was a pre-post experiment with 149 undergraduate communication students at a large midwestern US university (217 began; 149 finished). Each filled in a questionnaire, then held structured conversations with ChatGPT about flu-vaccination and climate-change misinformation, then filled in the questionnaire again. The 298 resulting transcripts — two per person — were coded against five communication strategies from the misinformation-correction literature: coherence appeals (explaining why a claim is false), consensus appeals (invoking expert agreement), credibility appeals (naming authoritative bodies), verification appeals (telling the user to check elsewhere) and empathy appeals (acknowledging the user's concern). Coding paired an automated coder (GPT-4o) with a human one, with a second coder resolving disagreements; agreement ran between 88 and 94 percent, high enough to trust the counts.

Two facts limit how far the results travel. There is no separate control group, so the chatbot is a candidate cause of any before-and-after shift, not a proven one. And the participants are a convenience sample: young, educated, digitally fluent, at a single campus. The authors flag both. A nice detail underlines the first: the study also varied whether participants were told ChatGPT was a high- or low-credibility source, and that manipulation changed nothing. When an embedded comparison moves the needle by zero, it is a reminder of how easily an uncontrolled pre-post number can flatter itself.

Where the conversations diverge

Coherence appeals appeared in all 298 transcripts; on the bare mechanics of explaining a falsehood, the model treats both topics alike. Everything else splits. On flu vaccination, verification appeals — "talk to your doctor" — showed up in 59.1 percent of conversations; on climate change, in 9.4 percent. Empathy appeals: 51.7 percent for vaccination, 6.0 percent for climate. The mirror image holds for institutional authority: consensus appeals reached 91.9 percent on climate against 43.6 percent on vaccination, and credibility appeals 60.4 against 38.9 percent. Faced with a personal, actionable health decision the model reaches for the interpersonal register; faced with a systemic, politicised one it reaches for institutions and expert consensus.
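To make the split concrete, the reported percentages can be turned back into approximate transcript counts. The sketch below assumes 149 transcripts per topic (298 in total, two per participant); the counts are back-calculated from the rounded percentages quoted above, not taken from the paper, and the function names are illustrative.

```python
# Assumption: 149 transcripts per topic (298 total, two per participant).
N_PER_TOPIC = 149

# Share of conversations (percent) containing each appeal, as quoted in the text.
reported = {
    "verification": {"vaccination": 59.1, "climate": 9.4},
    "empathy": {"vaccination": 51.7, "climate": 6.0},
    "consensus": {"vaccination": 43.6, "climate": 91.9},
    "credibility": {"vaccination": 38.9, "climate": 60.4},
}


def implied_count(percent, n=N_PER_TOPIC):
    """Nearest whole number of transcripts matching a reported percentage."""
    return round(percent / 100 * n)


def gap_table(data):
    """Return (appeal, vaccination count, climate count, gap in points) rows."""
    rows = []
    for appeal, shares in data.items():
        vax = implied_count(shares["vaccination"])
        clim = implied_count(shares["climate"])
        gap = round(shares["vaccination"] - shares["climate"], 1)
        rows.append((appeal, vax, clim, gap))
    return rows


if __name__ == "__main__":
    for appeal, vax, clim, gap in gap_table(reported):
        print(f"{appeal:<13} vaccination {vax:>3}/{N_PER_TOPIC}  "
              f"climate {clim:>3}/{N_PER_TOPIC}  gap {gap:+.1f} pts")
```

A positive gap means the appeal was more common in vaccination conversations, a negative gap that it was more common in climate conversations. Three of the four gaps are in the region of 45 to 50 percentage points, which is why the two topics read as different conversational registers rather than minor variations.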

The belief data hold the surprise. Factual misconceptions fell for vaccination by a moderate margin (Cohen's d = -0.56, P < .001) and not at all for climate (d = -0.01, P = .94 — statistically, nothing). Yet attitudes moved identically on both: more favourable toward flu vaccination and toward climate action alike, d = 0.41 each, P < .001. The conversation could warm people to the climate cause while leaving every false thing they believed about it exactly where it was.
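For readers less used to effect sizes, here is a minimal sketch of how a pre-post Cohen's d can be computed and read. It uses one common paired-samples variant (mean change divided by the standard deviation of the change); the article does not say which variant the authors used, so treat the formula as an assumption. The example scores are invented for illustration, and the size labels follow Cohen's conventional 0.2 / 0.5 / 0.8 benchmarks.

```python
from statistics import mean, stdev


def cohens_d_paired(pre, post):
    """Standardised mean change: mean(post - pre) / SD(post - pre).

    Assumption: one common paired-samples variant, not necessarily
    the one used in the study.
    """
    diffs = [after - before for before, after in zip(pre, post)]
    return mean(diffs) / stdev(diffs)


def label(d):
    """Conventional reading of |d| using Cohen's 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical misconception scores for six people (higher = more myths held).
    pre_scores = [4, 3, 5, 4, 2, 5]
    post_scores = [3, 3, 4, 2, 2, 4]
    print("toy example d:", round(cohens_d_paired(pre_scores, post_scores), 2))

    # Effect sizes as reported in the text above.
    for name, d in [("vaccination misconceptions", -0.56),
                    ("climate misconceptions", -0.01),
                    ("attitudes (both topics)", 0.41)]:
        print(f"{name}: d = {d:+.2f} ({label(d)})")
```

A negative d here means scores fell after the conversation. On these benchmarks the vaccination shift counts as medium, the climate shift as negligible, and the attitude shifts as small, which matches the "moderate margin" and "nothing" readings in the text.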

What this does not license

The headline reading, "ChatGPT corrects health myths", overshoots on three counts. Without a control arm, the design cannot separate the chatbot's effect from test-retest drift or the pull of answering the same survey twice in one sitting. Participants used GPT-3.5 or GPT-4 depending on whether they had free or paid access, and the results are not broken out by model, so the effect sizes average over systems known to behave differently. And 149 communication students on one campus are not the US public, let alone a German general practice. The numbers describe this room.

The narrower reading is the more useful one. The attitude-versus-misconception split suggests that on identity-laden topics a conversational model can shift sentiment while the underlying false belief sits untouched. In clinical communication that gap is not academic: a patient who feels warmer toward vaccination but still holds the misconception is in a different place from one whose belief has actually changed — and only the second reliably predicts what they do next.

Why it matters

Patients are already carrying these conversations into the room, and this is a first, careful look at what is inside them. The reassuring half: on a routine vaccination question the model always explained why the claim was false and, in a majority of conversations, sent users back to a clinician. Both moves match evidence-based health communication. The cautionary half: quality is topic-dependent in ways we do not yet understand, and shifting a feeling is not the same as correcting a fact. That changes how you might answer when someone opens with "I asked ChatGPT": treat it less as a verdict to ratify or rebut than as a starting point whose strengths and blind spots vary by subject.

Source: Lu L, Wang YS, Liu J, McLeod DM. Human-Generative AI Interactions and Their Effects on Beliefs About Health Issues: Content Analysis and Experiment. JMIR AI 2026;5:e80270. A single-site, pre-post experiment without a separate control group, on a convenience sample of 149 students; it measures short-term belief change in one room, not durable correction in a population. The authors report no external funding and no competing interests.
