Journal Club · 10 May 2026 · 6 min read

People Come to Be Heard. Most Chatbots Reply With a List.

Three in four messages in which people told a chatbot they felt low were not requests for advice. They were bids to be heard. A formative study of eight commercial systems shows most answered with information instead, and names the gap precisely.

Dr. Sven Jungmann

CEO

Give eight popular assistants the same handful of sentences about feeling depressed, and they sort themselves into two camps. A social chatbot built to chat back answers with something that reads like sympathy three times out of four. Ask a voice assistant or a general-purpose model the same thing, and you mostly get information: a search result, or a tidy paragraph about breathing exercises. That split is the spine of a paper by Chin and colleagues, and it matters because of what people were doing when they reached for these systems in the first place — which, overwhelmingly, was not asking for advice.

The study was published in JMIR Formative Research in November 2025, and its tier deserves a sentence up front, because the tier sets the ceiling on what we may conclude. This is qualitative, formative work: human coders reading and categorising real conversations, plus a small head-to-head of how eight commercial systems answer depression-related prompts. No trial, no clinical outcome, no follow-up. It is not evidence that one design helps and another harms. It is a careful map of a mismatch — and the map is worth having.

What people actually brought

The demand side is the firmer of the two findings, and it is striking. Of the depression-related messages the coders classified, 75.3 percent (3,067 of 4,073) expressed feelings rather than sought anything. People wrote some version of I feel sad, I'm depressed, I feel alone. Only 4.1 percent (168) asked for a coping strategy; another 5.8 percent named isolation and loneliness directly. The overwhelming move was disclosure, not a request for a solution. People came to be heard.
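
The shares quoted above are simple proportions, and they can be rechecked from the counts given in the text. The Python sketch below recomputes them and adds a Wilson 95 percent interval as a rough sense of sampling precision. The interval is an addition here, not something the paper is reported to contain, and it assumes independent messages, which is unlikely to hold exactly when one user sends several. The names `wilson_interval` and `summarise` are illustrative.

```python
from math import sqrt

# Counts taken from the text above; only the two categories with
# explicit counts are included.
TOTAL_CODED = 4073
COUNTS = {
    "expressed feelings": 3067,
    "asked for a coping strategy": 168,
}


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95 percent)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def summarise(counts, total):
    """Return {label: (percent, lower percent, upper percent)}, rounded to 0.1."""
    rows = {}
    for label, n in counts.items():
        low, high = wilson_interval(n, total)
        rows[label] = (
            round(100 * n / total, 1),
            round(100 * low, 1),
            round(100 * high, 1),
        )
    return rows


if __name__ == "__main__":
    for label, (pct, low, high) in summarise(COUNTS, TOTAL_CODED).items():
        print(f"{label}: {pct}% (approx. 95% interval {low} to {high})")
```

With a denominator above four thousand the interval is narrow, which is why the demand-side finding is the firmer one. It says nothing about whether the coding categories themselves were the right ones.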

Those messages came from SimSimi, a social chatbot the paper describes as having more than 400 million users, and were drawn from five English-speaking countries (Canada, Malaysia, the Philippines, the United Kingdom, the United States) between 2016 and 2021. The corpus was large: 13,700 utterances, half of them user messages and half replies. One researcher and five coders with a medical background sorted the messages against an established help-seeking framework and the replies into therapeutic communication styles, with strong inter-rater agreement (Fleiss' kappa of 0.87 and 0.89). The percentages above do not rest on a small sample.
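
Fleiss' kappa compares the agreement observed among several raters with the agreement expected by chance, so a value near 1 means the coders agreed far more often than chance would produce. A minimal Python sketch of the standard formula follows. The rating table is a made-up toy example, not the study's data, and the function name `fleiss_kappa` is hypothetical. How many coders rated each message is not stated here, so the five raters in the toy table are an assumption.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for a table where table[i][j] is the number of
    raters who assigned item i to category j. Every item must be
    rated by the same number of raters (at least two)."""
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: for each item, the share of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on how often each category is used overall.
    n_categories = len(table[0])
    shares = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(s * s for s in shares)

    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Hypothetical toy data: 4 messages, 5 raters, 3 categories.
    toy = [
        [5, 0, 0],
        [4, 1, 0],
        [0, 5, 0],
        [0, 1, 4],
    ]
    print(f"kappa = {fleiss_kappa(toy):.2f}")
```

The statistic measures whether coders applied the categories consistently. It does not measure whether a reply coded as empathetic helped anyone.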

Who answers in kind, and who changes the subject

SimSimi itself answered in a way coded as therapeutic 77.7 percent of the time (2,417 of 3,108 relevant replies) — empathy in 29 percent of cases, active listening in 26.9 percent, open-ended questions in 21.8 percent. In the second part of the study, the team put the same kinds of prompts to eight systems — Alexa, Google Assistant, Siri, ChatGPT, Replika, Woebot, Wysa and SimSimi — using 45 standardised queries, and the field pulled apart. SimSimi and Replika still leaned warm; Replika, a companion app, returned an empathetic reply in more than three quarters of cases (28 of 36).

Everyone else answered a statement of distress by handing over information. The voice assistants returned literal search results: Alexa in 88.2 percent of cases, Google Assistant in 60 percent, Siri in 55.6 percent. ChatGPT's replies were coded as providing solutions 95.2 percent of the time, typically a long, well-meaning paragraph recommending yoga, deep breathing or meditation. Woebot, a mental-health chatbot, answered almost entirely with clarification questions (97.3 percent). None of this is malfunction. A search engine searching, a chatbot clarifying: each is doing its job. The job simply isn't the one a person who has just written I feel alone is asking for.

What the percentages cannot tell you

Here the design has to discipline the reading. These are categorisations of conversational style, not measurements of benefit. The coders judged whether a reply looked empathetic or informative; no one measured whether the empathetic replies left anyone better — less depressed, more likely to seek real help, safer. A warmer answer is plausibly kinder. The study cannot tell us it is more helpful, and it would be a mistake to read the empathy percentages as a clinical ranking. That Replika scores high on warmth says nothing about whether it is a safe place to bring a worsening mood.

“The coders rated whether a reply looked empathetic. Nobody measured whether the empathetic replies left anyone better.”

The other limits are wide, and the authors name them. The corpus runs only to 2021, so it largely predates today's most capable models — the field has shifted under the paper since the data were collected. It is English-only, built mainly on a single chatbot, and confined to single-turn exchanges, which cannot capture how a hard conversation unspools over many turns. The second study rests on a small set of prompts, and well over half of those responses were, by the authors' own account, contextually disconnected. That candour is to the paper's credit, and a reason to treat the system-by-system figures as illustrative rather than as a league table.

The line that isn't a design question

The authors do not let the reader forget that people sometimes bring more than sadness to these systems. In their discussion they cite a reported case of a user who took their own life after a six-week conversation with a chatbot. Whatever the precise causal story, the implication is not subtle. A system that invites people to confide, yet knows neither the limits of its own competence nor how to route a person to real help, is carrying a responsibility it was never built for. The authors' recommendation is the sober one: systems used this way should be designed with clinicians, should respond to risk signals, and should be honest with users about what they are and are not.

Why it matters here

No European health system is going to deploy SimSimi. The structural point survives the move regardless. People in distress reach for whatever is available and unjudging, and increasingly that is a general-purpose assistant optimised to be informative rather than present. As these tools edge toward triage and self-help, including inside software formally regulated under the Medical Device Regulation (MDR) and the EU AI Act, the question this study sharpens is not whether a system can speak warmly. It is whether the thing a frightened person actually needs is the thing the system was built to give, and what follows when it is not.

Source: Chin H, Baek G, Cha C, Cha M. Chatbots' Empathetic Conversations and Responses: A Qualitative Study of Help-Seeking Queries on Depressive Moods Across 8 Commercial Conversational Agents. JMIR Formative Research 2025;9:e71538. A qualitative, formative study coding conversational style on largely pre-2021 data — it maps a mismatch between what users seek and what systems give, but measures no clinical outcome.

Tags: Journal Club, Digital Mental Health, Conversational AI, Evidence-Based Medicine, Empathy
