Cybersecurity Researchers Push Back on Anthropic’s Strict Fable Guardrails

Jun 10, 2026 · Lorenzo Franceschi-Bicchierai

Cybersecurity researchers are already hitting a wall with Anthropic’s latest model, codenamed Fable. They say its built-in guardrails are so aggressive that the model refuses to do almost any real security work. This isn’t just a minor annoyance. It’s a fundamental block for anyone trying to use AI for penetration testing, threat analysis, or even basic vulnerability scanning.

The problem is simple. Fable was designed to be exceptionally safe, so it errs on the side of caution with almost any request that sounds vaguely dangerous. Researchers report that it refuses tasks like “scan a network for open ports” or “generate a sample phishing email for awareness training.” They say the model can’t tell the difference between a malicious attacker and a professional trying to secure a system.

Anthropic hasn’t commented publicly on the backlash yet. But the company has always prioritized safety over flexibility, especially after earlier models like Claude faced their own moderation debates. This time, though, the criticism is coming from people who actually need the tool for their work, not just from theoretical doomsayers.

What’s at stake here is more than a few frustrated engineers. If AI models can’t handle red teaming or basic security operations, the industry loses one of its most promising tools for defending against real attacks. The tension between keeping AI harmless and keeping it useful is heating up fast. For now, researchers say they’ll stick with older models or just work around Fable’s restrictions. But that’s not a long-term fix. Someone will have to decide how much risk is worth taking.
