Detecting Causal Bias in Generative AI Models

May 19, 2026 · Drago Plecko


A new paper tackles a blind spot in AI fairness: how do we measure bias in generative models such as large language models, which work very differently from standard machine learning systems?

Traditional fairness tools were built for models that learn one specific prediction, like approving a loan. Those tools can trace how bias flows through real-world causal pathways. But generative AI works differently. These models don't just predict one thing. They can generate entire worlds of data, creating their own version of every variable and relationship involved. That means they can import, exaggerate, or even invent new demographic disparities along the way.

The authors formalize this problem for the first time. They create a unified framework that covers both standard ML and generative AI. This lets them break down exactly where bias comes from. Is it baked into the training data? Flowing through a specific causal pathway? Or introduced because the model learned a distorted version of how the world works? Their method can separate these sources of harm and quantify each one.
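To make the idea of separating sources of disparity concrete, here is a minimal Python sketch. It is not the paper's method or its estimators. It assumes a toy world with a binary protected attribute X, a binary mediator W, and a binary outcome Y, with no confounding, and every probability in it is invented for illustration. The hypothetical helper `decompose` splits a group disparity into a direct part (X affecting Y with W held at its baseline distribution) and an indirect part (X affecting Y through W). Running it on a "real" world and on a "model" world shows how much disparity the model inherits and how much it adds.

```python
# Toy illustration only: all probabilities below are made up.
# X: protected attribute (0/1), W: mediator (0/1), Y: outcome (0/1).
# Assumption: no confounding, so conditional probabilities can be
# read as causal effects in this toy world.

def outcome_rate(p_w, p_y, x_direct, x_mediator):
    """P(Y=1) when Y responds to X=x_direct and W is drawn as if X=x_mediator."""
    pw1 = p_w[x_mediator]  # P(W=1 | X=x_mediator)
    return p_y[(x_direct, 1)] * pw1 + p_y[(x_direct, 0)] * (1 - pw1)

def decompose(p_w, p_y):
    """Split the disparity between X=1 and X=0 into direct and indirect parts."""
    base = outcome_rate(p_w, p_y, 0, 0)
    direct = outcome_rate(p_w, p_y, 1, 0) - base
    indirect = outcome_rate(p_w, p_y, 1, 1) - outcome_rate(p_w, p_y, 1, 0)
    total = outcome_rate(p_w, p_y, 1, 1) - base
    return {"total": total, "direct": direct, "indirect": indirect}

# "Real world" mechanisms: P(W=1 | X) and P(Y=1 | X, W).
real_p_w = {0: 0.3, 1: 0.6}
real_p_y = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.6}

# "Model world": the generative model reproduces P(W | X) faithfully
# but has learned an exaggerated direct dependence of Y on X.
model_p_w = {0: 0.3, 1: 0.6}
model_p_y = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.45, (1, 1): 0.75}

real = decompose(real_p_w, real_p_y)
model = decompose(model_p_w, model_p_y)

# Disparity added by the model on top of what exists in the real world.
added = {k: model[k] - real[k] for k in real}

for name, parts in [("real", real), ("model", model), ("added", added)]:
    print(name, {k: round(v, 3) for k, v in parts.items()})
```

In this toy setup the direct and indirect parts sum to the total disparity by construction, and the entire gap between the model and the real world falls on the direct pathway, because only the dependence of Y on X was distorted. The paper's framework handles far more general settings than this sketch does.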

They also prove that their key quantities can be identified from data, and they build practical estimators to measure them. To demonstrate the approach, they analyze race and gender bias in large language models across multiple datasets.

This matters because generative AI is already being deployed in hiring, healthcare, and criminal justice. If we cannot measure its unique forms of bias, we cannot fix them. The paper provides a concrete starting point for doing exactly that.
