Picture this: a patient rolls into the ER with a pulmonary embolism, a delightful blood clot in the lungs. They get some initial treatment, seem to improve, then suddenly get worse. The medical team is stumped, thinking the meds aren't cutting it.
Enter stage left: AI. It devours the patient's entire medical history, then calmly suggests, "Hey, maybe that lupus history is causing some heart inflammation? Just a thought." The AI was right. Which, if you think about it, is both impressive and slightly terrifying.
This isn't some sci-fi movie pitch; it's a real scenario from a study in Science. Researchers from Harvard Medical School and Beth Israel Deaconess Medical Center found that an AI model from OpenAI — yes, that OpenAI — was better at diagnosing patients and guiding their care than actual human doctors. It even outmaneuvered an earlier version of itself, GPT-4.
The Machines Are Learning (and Diagnosing)
The team put this AI through its paces using real cases from Beth Israel's emergency department, like the lupus patient. They assessed its diagnostic chops at three critical junctures: triage, the initial ER assessment, and hospital admission.
Armed with nothing but electronic health records, the AI model left two highly experienced physicians in its digital dust. "This is the big conclusion for me — it works with the messy real-world data of the emergency department," said Dr. Adam Rodman, one of the study authors. Because apparently that's where we are now: a machine sifting through your medical history faster and more accurately than the folks who went to medical school.
The study didn't stop there. The team also threw complex case reports from the New England Journal of Medicine and a variety of clinical scenarios at the AI to see if it could tackle the truly tough diagnostic puzzles. The verdict? "The model outperformed our very large physician baseline," noted Raj Manrai, an assistant professor at Harvard Medical School and another member of the study team.
Now, before you panic and assume your next doctor will be a chatbot, the researchers are quick to point out that this study relied solely on text. Doctors in the real world use a lot more input: images, sounds, that subtle nonverbal cue that says, "I'm in more pain than I'm letting on." Still, the leap in AI's capability is undeniable. Earlier language models used to struggle with uncertainty and couldn't even manage a decent list of possible conditions.
"This paper is a beautiful summary of just how much things have improved," said Dr. David Reich, chief clinical officer for Mount Sinai Health System, who wasn't involved in the study but clearly saw the writing on the digital wall. He believes the tech is "quite accurate, possibly ready for prime time." The next step, of course, is figuring out how to actually weave this into the chaotic tapestry of clinical workflows without causing more headaches than it solves.
No, AI won't replace doctors. But it will undoubtedly reshape medicine in ways that are only just starting to become clear. And honestly, if it means fewer misdiagnoses and better outcomes, maybe we can all learn to live with a little digital assistance in the ER. Just don't ask it to tell you a joke.