Knowledge Graphs Boost LLM Accuracy from 65% to 99% in Industrial Ops

Madhulatha Mandarapu, Sandeep Kunkunuru

A new paper from researchers behind the AssetOpsBench benchmark takes a different angle on improving AI agents for industrial maintenance. Instead of focusing on better LLM orchestration, the team asked a simpler question: what if the data model itself is the weak link?

AssetOpsBench, presented at KDD 2026, found that GPT-4 agents reached only 65% accuracy on 139 industrial maintenance scenarios when working with flat document stores. The new study changes the foundation instead: it builds a typed knowledge graph and routes each query to the method best suited to answer it. In one approach, the LLM is used only to generate Cypher queries for structured retrieval, which lifts the same GPT-4 model from 65% to over 82%. Another approach drops the LLM entirely and relies on native graph primitives, reaching 99% on graph-answerable scenarios.
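The routing idea can be illustrated with a small Python sketch. It is not the paper's implementation: the TypedGraph class, the route function, the HAS_FAILURE_MODE and HAS_SENSOR relations, and the generate_cypher callback are hypothetical, and an in-memory edge set stands in for a real graph database. Intents that map onto a single traversal are answered by a deterministic primitive with no LLM call; anything else is handed to a model whose only job is to return a Cypher string.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TypedGraph:
    """Minimal in-memory stand-in for a typed knowledge graph (hypothetical)."""
    node_types: dict[str, str] = field(default_factory=dict)  # node id -> type label
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (src, relation, dst)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Reject edges whose endpoints were never declared with a type.
        if src not in self.node_types or dst not in self.node_types:
            raise KeyError(f"untyped endpoint in {src} -[{relation}]-> {dst}")
        self.edges.add((src, relation, dst))

    def neighbors(self, src: str, relation: str) -> list[str]:
        # Deterministic traversal primitive; sorted so results are stable.
        return sorted(d for s, r, d in self.edges if s == src and r == relation)


# Assumed set of intents that map directly onto one traversal.
GRAPH_PRIMITIVES = {"failure_modes": "HAS_FAILURE_MODE", "sensors": "HAS_SENSOR"}


def route(graph: TypedGraph, intent: str, entity: str,
          generate_cypher: Callable[[str, str], str]) -> tuple[str, object]:
    """Answer with a native primitive when possible, else ask the LLM for Cypher only."""
    relation = GRAPH_PRIMITIVES.get(intent)
    if relation is not None and entity in graph.node_types:
        return ("graph", graph.neighbors(entity, relation))
    # Fallback: the model writes a query string; it never answers directly.
    return ("cypher", generate_cypher(intent, entity))


if __name__ == "__main__":
    g = TypedGraph()
    g.add_node("chiller_1", "Equipment")
    g.add_node("compressor_overheat", "FailureMode")
    g.add_node("refrigerant_leak", "FailureMode")
    g.add_edge("chiller_1", "HAS_FAILURE_MODE", "refrigerant_leak")
    g.add_edge("chiller_1", "HAS_FAILURE_MODE", "compressor_overheat")
    # Stub in place of a real LLM call (hypothetical).
    fake_llm = lambda intent, entity: f"MATCH (e {{name: '{entity}'}}) RETURN e"
    print(route(g, "failure_modes", "chiller_1", fake_llm))
    # -> ('graph', ['compressor_overheat', 'refrigerant_leak'])
```

In a real system the Cypher string would be executed against the graph store rather than returned; the stub only marks where the model sits in the flow.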

The standout technique is what the authors call generation-augmented knowledge, or GAK. When the data simply doesn't contain an answer, the agent creates new, provenance-tagged graph nodes on the fly. For the 88 failure-mode scenarios that the benchmark itself flagged as non-deterministic because ten equipment types were missing from the graph, GAK raised answerability from zero to 82% of scenarios. Every fact the agent generates carries a source:LLM-derived tag for auditing.
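A minimal sketch of the GAK pattern, assuming a simple node/edge schema: the Graph and Node classes, the HAS_FAILURE_MODE relation, the {"source": "LLM-derived"} property, and the generate callback (standing in for a one-shot LLM call) are illustrative assumptions rather than the paper's code. The graph is consulted first; the model is called only when the graph has no answer, and everything it produces is written back with a provenance tag that an audit can filter on.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    node_id: str
    node_type: str
    props: dict[str, str] = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

    def failure_modes(self, equipment: str) -> list[str]:
        return [d for s, r, d in self.edges if s == equipment and r == "HAS_FAILURE_MODE"]


def answer_with_gak(graph: Graph, equipment: str,
                    generate: Callable[[str], list[str]]) -> list[str]:
    """Return failure modes for `equipment`, filling a gap once via the model."""
    existing = graph.failure_modes(equipment)
    if existing:
        return existing  # the data already answers the query; no LLM call
    # Gap in the graph: every node created below gets an assumed provenance tag.
    tag = {"source": "LLM-derived"}
    if equipment not in graph.nodes:
        graph.nodes[equipment] = Node(equipment, "EquipmentType", dict(tag))
    for name in generate(equipment):  # hypothetical one-shot enrichment call
        node_id = f"{equipment}:{name}"
        graph.nodes[node_id] = Node(node_id, "FailureMode", dict(tag))
        graph.edges.append((equipment, "HAS_FAILURE_MODE", node_id))
    return graph.failure_modes(equipment)


def llm_derived(graph: Graph) -> list[str]:
    """Audit helper: list every node whose provenance tag marks it as generated."""
    return sorted(n.node_id for n in graph.nodes.values()
                  if n.props.get("source") == "LLM-derived")
```

Because generated nodes are persisted, a repeat query for the same equipment is answered from the graph without a second model call, and llm_derived gives reviewers one place to inspect machine-generated facts.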

The broader lesson is that, for structured operational domains, the data layer matters more than the LLM pipeline. A typed knowledge graph acts as a grounding layer between raw industrial data and LLM reasoning, and the paper's recurring theme is inverted LLM usage: confine the model to query generation or one-shot enrichment, then let the graph execute deterministically. The results suggest that smarter data architecture, not smarter prompting, may be the real lever for industrial AI.
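One way to keep the model confined to query generation is to check each generated query before the graph executes it. The sketch below is an assumption, not something the article describes: check_generated_cypher is a hypothetical, conservative keyword filter that accepts only single, read-only Cypher statements. It can reject valid queries (for example, a string literal containing the word SET) and is not a substitute for a real parser; where the graph store offers read-only users or transactions, those give stronger guarantees.

```python
import re

# Hypothetical guard. The keyword list is an assumption and deliberately
# conservative: it blocks clauses that can write to the graph or call procedures.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL|FOREACH)\b",
    re.IGNORECASE,
)
_READ_START = re.compile(r"(?i)(OPTIONAL\s+MATCH|MATCH|WITH|UNWIND|RETURN)\b")


def check_generated_cypher(query: str) -> str:
    """Accept an LLM-generated Cypher string only if it looks read-only."""
    stripped = query.strip()
    if not stripped:
        raise ValueError("empty query")
    # Allow one trailing semicolon, but not multiple statements.
    if ";" in stripped.rstrip(";"):
        raise ValueError("multiple statements are not allowed")
    match = _WRITE_CLAUSES.search(stripped)
    if match:
        raise ValueError(f"write or procedure clause rejected: {match.group(0)}")
    if not _READ_START.match(stripped):
        raise ValueError("query must start with a read clause")
    return stripped
```

Only queries that pass the check would be sent to the graph, so the model proposes a query while the graph alone decides what data is returned.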