Two AI Agents Use Physics Checks to Build Trustworthy Material Models

May 25, 2026 · Marius Tacke, Matthias Busch, Kian Abdolazizi, Jonas Eichinger, Kevin Linka, Roland Aydin, Christian Cyron

A new paper introduces a multi-agent system that makes large language models more reliable at generating physics-based material models. The work was posted on arXiv this month and tackles a longstanding problem: building constitutive models, which describe how materials deform under stress, usually takes years of specialized expertise.

The approach pairs two AI agents. A Creator agent generates a candidate model based on experimental data. Then an Inspector agent checks that model against nine physical constraints, like energy conservation and objectivity. If something fails, the model goes back for revision. The cycle repeats until the Inspector signs off.
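The generate-and-check cycle described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the paper's implementation: `refine`, `Review`, `toy_creator`, `toy_inspector`, and the `max_rounds` cap are all hypothetical names and assumptions, and the single toy constraint stands in for the real physical checks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Model = Dict[str, float]  # hypothetical stand-in for a constitutive model


@dataclass
class Review:
    passed: bool
    failures: List[str] = field(default_factory=list)


def refine(
    create: Callable[[List[str]], Model],
    inspect: Callable[[Model], Review],
    max_rounds: int = 5,  # assumed cap; the article does not state one
) -> Tuple[Optional[Model], int]:
    """Alternate between creating and inspecting until a candidate passes."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = create(feedback)   # Creator proposes or revises a model
        review = inspect(candidate)    # Inspector checks the constraints
        if review.passed:
            return candidate, round_no
        feedback = review.failures     # failures go back to the Creator
    return None, max_rounds            # nothing passed within the budget


def toy_creator(feedback: List[str]) -> Model:
    # The first attempt is deliberately invalid; it is repaired after feedback.
    return {"shear_modulus": 1.0 if feedback else -1.0}


def toy_inspector(candidate: Model) -> Review:
    failures = []
    if candidate["shear_modulus"] <= 0:
        failures.append("shear modulus must be positive")
    return Review(passed=not failures, failures=failures)


model, rounds = refine(toy_creator, toy_inspector)
print(model, rounds)
```

In a real system the two callables would wrap LLM calls, and the Inspector's failure messages would be fed into the Creator's next prompt.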

The researchers tested this with two different LLMs, Claude Opus 4.7 and Kimi K2.5, on three material datasets: brain tissue, experimental rubber, and synthetic rubber. With Opus, the Inspector pushed the share of physically valid exported models from 91% to 100%. With Kimi, the share rose from 37% to 56%. Accuracy stayed nearly the same, and the models generalized well to unseen loading paths.

The key insight here is simple but effective. Instead of expecting a single LLM to both create and verify a model, you split the job into separate roles. One generates, the other checks. The Inspector doesn't need to be smarter than the Creator. It just needs to catch specific kinds of errors.
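To make "specific kinds of errors" concrete, here is a minimal sketch of what a single check could look like, using objectivity, one of the constraints mentioned earlier. A strain energy W is objective if W(QF) = W(F) for every rotation Q applied to the deformation gradient F. The code below tests that numerically on random samples. It is an assumption about how such a check might be written, not the paper's Inspector, and all function names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def trace(A):
    return A[0][0] + A[1][1] + A[2][2]


def rotation(axis, theta):
    """Rotation by theta (radians) about coordinate axis 0 (x) or 2 (z)."""
    c, s = math.cos(theta), math.sin(theta)
    if axis == 0:
        return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def random_rotation(rng):
    """Compose z-x-z Euler rotations with random angles."""
    a, b, g = (rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
    return matmul(rotation(2, a), matmul(rotation(0, b), rotation(2, g)))


def random_deformation(rng, scale=0.1):
    """Identity plus a small random perturbation, so det(F) stays positive."""
    return [[(1.0 if i == j else 0.0) + scale * rng.uniform(-1.0, 1.0)
             for j in range(3)] for i in range(3)]


def is_objective(energy, trials=50, tol=1e-9, seed=0):
    """Sampled test of W(QF) == W(F); a pass is evidence, not a proof."""
    rng = random.Random(seed)
    for _ in range(trials):
        F = random_deformation(rng)
        Q = random_rotation(rng)
        if abs(energy(matmul(Q, F)) - energy(F)) > tol:
            return False
    return True


def neo_hookean(F, mu=1.0):
    # Depends on F only through C = F^T F, so rotations cancel out.
    return 0.5 * mu * (trace(matmul(transpose(F), F)) - 3.0)


def trace_only(F, mu=1.0):
    # Deliberately flawed: trace(F) changes when F is rotated.
    return 0.5 * mu * (trace(F) - 3.0)


print(is_objective(neo_hookean), is_objective(trace_only))
```

A check like this is narrow and cheap, which is the point: it does not need to know how to build a good model, only how to recognize one particular way a model can be wrong.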

Because the system is not tied to any particular LLM, it should get better automatically as LLMs improve. That makes it a clean path toward automated, physics-aware model discovery. For engineers and materials scientists, this could turn LLM-driven modeling from a neat demo into something you can actually trust with real work.
