The problem nobody expected
Pathologists have always worked with an implicit assumption: a tissue slide reveals disease, not identity. The microscopic architecture of cancer cells tells the story that matters. Personal details—race, gender, age—shouldn't factor into whether a tumor is present.
Then researchers at Harvard Medical School discovered that AI systems trained to spot cancer on pathology slides were doing something no human pathologist could: extracting demographic information directly from the tissue images. And that invisible knowledge was warping their diagnoses.
"Reading demographics from a pathology slide is thought of as a 'mission impossible' for a human pathologist," said Kun-Hsing Yu, the study's senior author. "So the bias in pathology AI was a surprise to us."
The team tested four widely used cancer-detection models on pathology slides from 20 different cancer types. Across all four, the pattern was consistent: diagnostic accuracy dropped for certain groups. The models struggled to identify lung cancer subtypes in African American patients and male patients. They missed breast cancer subtypes more often in younger women. Overall, these disparities showed up in about 29% of the diagnostic tasks analyzed.
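As a rough illustration of what such a disparity looks like numerically, the sketch below computes per-group accuracy gaps from a table of model predictions. This is not the study's evaluation protocol; the column names, the pandas layout, and the five-point threshold are assumptions made for this example.

```python
# Illustrative only: one simple way to surface a subgroup performance gap.
# Column names (y_true, y_pred, group) and the 5-point threshold are assumptions,
# not details taken from the Harvard study.
import pandas as pd
from sklearn.metrics import accuracy_score

def subgroup_gaps(preds: pd.DataFrame, group_col: str = "group") -> pd.Series:
    """Accuracy of each demographic subgroup minus the overall accuracy."""
    overall = accuracy_score(preds["y_true"], preds["y_pred"])
    per_group = preds.groupby(group_col).apply(
        lambda g: accuracy_score(g["y_true"], g["y_pred"])
    )
    return per_group - overall  # negative values mean that group is underserved

# Example usage: flag a diagnostic task if any group trails overall accuracy
# by more than 5 percentage points.
# gaps = subgroup_gaps(predictions)
# task_is_disparate = (gaps < -0.05).any()
```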
Why this happened
The researchers traced the bias to three sources. The most obvious: training data aren't evenly distributed. Some demographic groups are easier to collect samples from, so AI models learn better on those populations and worse on underrepresented ones.
But the problem ran deeper. Even when sample sizes were balanced, performance gaps persisted. The models had learned that certain cancers occur more frequently in certain populations—and they were using that statistical shortcut instead of actually diagnosing the disease. A mutation pattern that flags lung cancer in one population might be rare in another, and the AI had learned to weight it accordingly.
More subtly, the models were detecting molecular differences across demographic groups and using those as diagnostic proxies. "Because AI is so powerful, it can differentiate many obscure biological signals that cannot be detected by standard human evaluation," Yu explained. Over time, the system drifts: it focuses on features linked to demographics rather than disease itself.
The fix that worked
The team developed FAIR-Path, a framework built on a machine-learning technique called contrastive learning. The idea is simple: teach the AI to pay close attention to meaningful differences—like the distinction between cancer types—while actively ignoring irrelevant ones, like who the patient is.
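To make that design choice concrete, here is a minimal PyTorch-style sketch of a contrastive objective that rewards separating cancer types in the embedding space while penalizing any clustering by demographic group. It is a rough illustration of the general technique described here, not FAIR-Path's published implementation; the invariance penalty and the `lambda_fair` weight are placeholders chosen for this example.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z, labels, temp=0.1):
    """Pull together embeddings that share a label, push apart the rest."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temp                                  # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))       # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_pos.mean()

def demographic_invariance_penalty(z, demographic):
    """Penalize separation between demographic-group centroids in embedding space."""
    z = F.normalize(z, dim=1)
    groups = demographic.unique()
    if len(groups) < 2:
        return z.new_zeros(())
    centroids = torch.stack([z[demographic == g].mean(0) for g in groups])
    return centroids.var(dim=0).sum()                     # zero when centroids coincide

def fairness_aware_loss(z, diagnosis, demographic, lambda_fair=1.0):
    # Attend to the disease label; stay indifferent to who the patient is.
    return (supervised_contrastive_loss(z, diagnosis)
            + lambda_fair * demographic_invariance_penalty(z, demographic))
```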
When applied to the tested models, diagnostic disparities dropped by about 88%.
The breakthrough matters because it didn't require perfectly balanced datasets or fully representative training data. It required a deliberate design choice: building fairness into how the model learns, not just what it learns from. "By making this small adjustment, the models can learn robust features that make them more generalizable and fairer across different populations," Yu said.
The team is now working with institutions worldwide to test FAIR-Path across different regions, demographics, and clinical settings. They're also exploring how to adapt the approach for situations with limited training data. The larger goal is understanding how AI bias in pathology contributes to broader health care disparities—and how to prevent it.
That matters because these tools are already moving into clinical practice. Getting fairness right now, before they're embedded in thousands of hospitals, is the difference between a tool that serves everyone equally and one that quietly disadvantages some patients from the moment their slide enters the scanner.