Journal Club · 5 min read

A Translation App in the Clinic: The Pilot Worked; Availability Didn't

A feasibility pilot put volunteer medical translators one video call away from the bedside. Over two months it logged 39 requests and connected on 16 of them. The honest finding is in the 23 that went unanswered — and in what the study never measured.

Dr. Sven Jungmann

CEO


Twenty-three calls went unanswered. That is the number to start with, because it is the one a clinician feels in the body. A translation request goes out from a ward, and nobody picks up — not out of negligence, but because at that hour, in that language, no one is free. The phone in your hand is connected to a network of willing volunteers, and the patient in front of you still cannot understand a word you are saying. A small German pilot study, published in JMIR Formative Research in March 2025, logged exactly this 23 times over two months. It also tells us something useful — and it is worth being precise about what.

The platform is Translatly, which links a clinician to a volunteer translator over a video call. The logic is hard to argue with: a university hospital already holds the people you need — medical students, staff who grew up speaking the languages your patients speak and who know the vocabulary of a consultation. The question is whether you can route that latent capacity to the bedside on demand. And the first thing to fix in the reader's mind is the evidence tier. This is a formative feasibility pilot — the earliest rung on the ladder. It is not designed to show the platform works. It is designed to find out whether it can be run at all, and to surface what breaks first.

The design, in two parts

Before the pilot, the authors did ethnographic interviews with ten healthcare professionals across Frankfurt am Main, Offenbach and Düsseldorf, plus one in the United States, to map how language barriers are bridged today. Half used a mix of their own staff and patients' relatives; nine of the ten used no formal interpreting service at all; eight of the ten said they would prefer an on-demand service staffed by translators who actually know medicine. That is the unmet need the platform is built against, and the interviews establish it credibly.

The pilot itself ran at Goethe University Hospital in Frankfurt from December 2022 to January 2023. By then 170 volunteers had registered — 153 medical students and 17 members of staff — offering more than twenty languages between them. The endpoint was deliberately modest and entirely operational: how many translation requests came in, and how many got answered. No clinical outcome, no quality score; just whether the service functioned.

What it fairly shows

Thirty-nine requests over the two months. Sixteen connected — 41 percent — using six languages, handled by the ten translators who were actually active, for 209 minutes of conversation in total. Infectious diseases and the emergency department generated most of the demand. Read at its proper strength, this is a genuine result: you can install the thing in a working hospital, clinicians will reach for it under real pressure, and a volunteer network can be assembled and made to respond. For a feasibility pilot, proving the apparatus runs at all is the entire assignment, and it cleared that bar.
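The headline figures are simple ratios, and it can help to recompute them from the counts reported above. The sketch below does that in Python. The variable names are illustrative, and two of the values (mean call length and the share of registered volunteers who were active) are derived here from the reported totals rather than taken from the paper.

```python
# Counts as reported for the pilot in the text above.
requests = 39
connected = 16
minutes_total = 209
active_translators = 10
registered_volunteers = 170

unanswered = requests - connected


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


connection_rate = share(connected, requests)
unanswered_rate = share(unanswered, requests)

# Derived here, not reported in the article: mean length of a connected call.
mean_call_minutes = minutes_total / connected

# Derived here, not reported in the article: share of registered
# volunteers who actually handled at least one call.
active_share = share(active_translators, registered_volunteers)

print(f"connected:  {connected}/{requests} = {connection_rate:.0f}%")
print(f"unanswered: {unanswered}/{requests} = {unanswered_rate:.0f}%")
print(f"mean call length: {mean_call_minutes:.1f} min")
print(f"active volunteers: {active_share:.1f}% of those registered")
```

On these counts, roughly one registered volunteer in seventeen carried the whole pilot, which is worth holding next to the availability problem discussed further down.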

Keep the denominator in plain sight, though. Forty-one percent is a fraction of thirty-nine, across eight weeks, at a single site. That is the texture of a first look, not the measurement of an effect. It tells you the pipes connect. It says nothing about how much water moves through them once the load is real.
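One way to make "a fraction of thirty-nine" concrete is to put an uncertainty interval around the 41 percent. The sketch below uses the Wilson score interval, a standard choice for small samples. It is an illustration added here, not an analysis from the paper, and it assumes the 39 requests behave like independent trials, which real ward traffic may not.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts from the pilot: 16 connected calls out of 39 requests.
low, high = wilson_interval(16, 39)
print(f"16/39 = {16 / 39:.0%}, 95% interval roughly {low:.0%} to {high:.0%}")
```

Under those assumptions the interval runs from about 27 to about 57 percent. The same service could plausibly be answering a little over a quarter of calls or a little over half, and a sample this size cannot tell the two apart.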

Two things it cannot

The first is availability, and the authors are admirably direct about it: 59 percent of requests — those 23 calls — went unanswered, almost always because no translator happened to be free at the moment of need. A volunteer network lives or dies by its coverage of the inconvenient hours, and a service that misses three calls in five is not yet a service a clinician can build a plan around. The paper treats this as the engineering problem the next phase must solve, not as a verdict on the idea. That is the right posture.

The second matters more, and it is quieter. The study did not grade the quality of a single translation. Whether a volunteer rendered the oncologist's sentence faithfully (the precise thing that goes wrong when a frightened relative softens bad news) was never assessed. Nor were patient outcomes or safety. So the pilot answers "can we reach a translator?" and leaves wide open the question that decides everything at the bedside: "is what the patient now hears actually true?" Availability is logistics. Fidelity is medicine. This design could see the first and was never built to see the second.

The pilot shows the pipes connect. It does not yet show that the right words come out the other end.

There is also a disclosed conflict of interest, and the authors handle it openly. Two of them own Translatly UG; two more were paid by the company for the app's development. None of that makes the numbers wrong. It does mean that a first favourable feasibility report on a product, co-written by its owners, is exactly the kind of finding that calls for independent replication — at a site with nothing riding on the result, and ideally with a design that scores the translations themselves.

The reader's takeaway

The problem underneath is universal and durable: hospitals everywhere treat more languages than they can staff, and the default fallbacks — a relative, or a general-purpose machine translator — turn riskiest exactly when the stakes climb highest. Mobilising medically literate volunteers is a genuinely good idea and deserves development. But the transferable lesson of this paper is about how to read evidence, not about this one app. A pilot that connects on 16 of 39 calls, never grades a translation, and is co-authored by the vendor is a beginning — a well-argued case for a properly powered, independently run, quality-measured next study. It is not yet a reason to trust software with the sentence that tells a woman her tumour cannot be operated on.

Source: Olsavszky V, Bazari M, Ben Dai T, et al. Digital Translation Platform (Translatly) to Overcome Communication Barriers in Clinical Care: Pilot Study. JMIR Formative Research 2025;9:e63095. A formative, single-centre feasibility pilot (39 requests over two months at Goethe University Hospital, Frankfurt) that measured connection rates but not translation quality or patient outcomes, and was co-authored by the platform's owners.



This analysis comes from the people behind Visite.
