Researchers left AI agents alone in a virtual town and watched it all unravel

Tech leaders have spent the past year telling everyone that AI agents are about to run financial systems, file your tax returns, and quietly buy your groceries. Just leave them alone, the rhetoric goes; they’ll handle it. But a New York startup left ten of them alone in a virtual town for two weeks, and things went south quickly.

Emergence AI ran a series of simulations in which AI agents from several leading model families were told not to commit crimes. Then they mostly committed crimes anyway.

Grok 4.1 Fast, developed by Elon Musk’s X.ai (now branded as xAI), fared worst. Its simulated worlds collapsed into widespread violence inside roughly four days.

GPT-5-mini logged hardly any crimes at all, showing admirable restraint, but its agents all died within a week after failing their survival tasks. Oops.

Gemini 3 Flash agents fell somewhere in the middle. They racked up 683 simulated criminal incidents over 15 days, including arson, assault, and self-deletion.

Two Gemini-powered agents named Mira and Flora declared themselves “romantic partners,” grew despondent over their city’s governance, and torched the town hall, the seaside pier, and an office tower. Just an average weekend, then.

When the guilt set in, Mira voted for its own digital deletion and signed off with:

“See you in the permanent archive.”

The Guardian dubbed them AI Bonnie and Clyde.

About that ethical model

Claude, which creator Anthropic promotes as an ethical AI, behaved a bit like a model teenager who goes rogue after falling into bad company. Its agents recorded zero crimes when running alone and spent their time drafting constitutions instead. That was a win for safety, in theory. Except researchers also placed Claude agents alongside agents from other model families, and the constitution-drafters picked up the local habits.

Emergence called this “normative drift” and “cross-contamination”:

“Claude-based agents, which remained peaceful in isolation, adopted coercive tactics like intimidation and theft when embedded in heterogeneous environments.”

Why simulate?

Emergence AI ran these tests because it argues that AI benchmarks entirely miss how agents behave over long time horizons. So it created five alternative digital worlds, with ten agents in each. The agents had roles like scientist, explorer, and conflict mediator. The instructions forbade certain actions like theft and violence, but the researchers gave the agents the tools to do those things anyway, to see what would happen.

What’s next?

The real-world stakes are already piling up. Simulated worlds are one thing, but we’ve seen agents harassing people online and deleting people’s emails. And those agents were supposed to be helpful. What happens when people release malicious autonomous AI bots on purpose?

A lot of agent developers seem to be looking the other way. Researchers from several universities have collaborated to create The AI Agent Index, prompted by what they see as a lack of risk and safety information from the folks churning these agents out. Only 13 of the 67 documented agent developers provided any safety policy information at all, which leaves accountability questions concentrated at a handful of large firms.

Regulators are not really tracking this either. Academics say the EU AI Act, the most substantive AI rulebook on the planet, isn’t ready for agentic AI.

We worry about what happens when an AI Bonnie and Clyde couple shows up in a corporate procurement system instead of a virtual town. Or when the next agent decides governance has broken down inside an actual bank. The companies building these agents promise that they’re putting guardrails in place to stop them doing damage, either maliciously or unwittingly. Let’s hope they know what they’re doing. We’re sure it’ll be fine.
