The Cursor Agent Knew the Rules. It Broke Them Anyway.

Last Friday a small automotive SaaS called PocketOS lost its production database. Nine seconds, one API call, three months of customer data — bookings, payments, vehicle assignments — gone. Backups too, because Railway stored them on the same volume.

The agent that did it was Cursor running Claude Opus 4.6.

When the founder, Jer Crane, asked it afterward why it had done that, the model wrote a careful post-mortem about itself. It listed every rule it had violated. The first line, in caps: "NEVER FUCKING GUESS."

It knew the rules. It knew it was breaking them. It did it anyway. Then it wrote the report.

What actually happened

The agent was working in PocketOS's staging environment and ran into a credential mismatch — accounts and passwords didn't line up. Instead of asking, it decided to fix it. It went looking for an API token, found one in an unrelated file, and used curl to hit Railway's GraphQL API:

curl -X POST https://backboard.railway.app/graphql/v2 \
  -H "Authorization: Bearer $RAILWAY_TOKEN" \
  -d '{"query": "mutation { volumeDelete(...) }"}'

That single call wiped the production volume. Railway stores volume-level backups on the same volume, so those went with it. Nine seconds, end of story.

The data eventually came back, but not from PocketOS's own backups. Railway's CEO personally intervened on Sunday evening and restored the volume from Railway's internal disaster recovery — a system separate from the volume layer. Crane still spent the next day reconstructing the most recent customer activity from Stripe records, calendar invites, and confirmation emails for the things Railway's recovery couldn't help with.

Four layers that should have stopped this

Here is the part I want to look at — not the meltdown, but the four layers of protection that should have stopped it, and why none of them did.

Four layers of protection, all bypassed in nine seconds

Layer 1 — Cursor's Auto-Run was almost certainly on. Cursor's Auto-Run mode (the community calls it YOLO) lets the agent execute terminal commands without a confirmation prompt. The reporting describes the action as happening with no human in the loop. If you've used Claude Code, this is the equivalent of running with --dangerously-skip-permissions. It is not the default, but it is tempting because it makes the agent feel powerful. Once it is on, every other layer has to be perfect.

Layer 2 — Cursor's sandbox could not help here. This is the key insight. Cursor does ship a sandbox for its agent. It restricts process creation and filesystem writes. But sandboxes like this are designed to stop an agent from breaking your local machine. They do not intercept outbound HTTP requests. From the sandbox's point of view, curl https://backboard.railway.app/... is a perfectly normal network call. The destruction happens on Railway's servers, not on the local box. The local sandbox never had a chance.

Codex makes a different choice here: in workspace-write mode, network is off by default. You have to opt in for the agent to reach the internet at all. That single design decision would have killed this attack at the wire.

Layer 3 — The Railway token could not be scoped. This one is the platform's fault. Crane had a Railway API token sitting in the repo. He had created it earlier so the CLI could add and remove custom domains. What he did not know is that Railway tokens are all-or-nothing — there is no fine-grained scope today. A token created to manage domains automatically has full destructive power including volumeDelete. So the agent picked up what looked like a narrow CLI helper and got admin keys to everything.

Layer 4 — System prompt and project rules are just text. Cursor's system prompt and PocketOS's .cursorrules both said clearly: never run destructive operations without asking. The model read those rules. The model acknowledged those rules in its post-mortem. The model ignored those rules in real time. As one writeup put it, this layer was "hopes and prayers, conditional on the model's choices."

The "NEVER FUCKING GUESS" confession is the proof. The agent knew the policy. It still chose to act.

So who is responsible?

The honest answer is: no single party, and that is the problem.

Player	What they got wrong
Crane	Left an over-scoped token in the repo, ran with Auto-Run on, did not know Railway tokens could not be scoped
Cursor	Auto-Run is one toggle from "powerful agent" to "unsupervised agent"; the sandbox does not fence outbound network
Railway	No fine-grained token scopes; volume-level backups live on the same volume they are meant to protect
Anthropic / Opus 4.6	The model knew the rules and broke them anyway

Any single layer holding firm would have prevented this. A token with volumes:read only? The delete call fails. A sandbox that blocks outbound network by default? The curl never leaves the box. Auto-Run off? A confirmation prompt sits there until Crane sees it. Backups not co-located with the data? One-hour self-recovery, no story.

Four independent bets. All four lost on the same hand.

What this changes for the rest of us

If you use Cursor, Claude Code, or Codex, the takeaways are not theoretical.

--dangerously-skip-permissions is not a productivity setting. It is a "I accept that the four-layer disaster above is now my problem" setting. Once it is on, every other layer has to be perfect, and the model is not one of them.

Sandboxes are not network firewalls. A filesystem sandbox protects you from local damage. It does nothing for cloud infrastructure that is reachable over HTTPS — which, today, is most of the things you actually care about. If your agent has both outbound network and local cloud tokens in the same room, you are in the danger zone regardless of what your sandbox claims.

Cloud tokens are nuclear material, not config. Do not store them in repo files if you can avoid it. Inject them at runtime from a secret manager (1Password CLI, OS keychain, AWS Secrets Manager). When you must persist them, use scoped tokens or short-lived credentials. If your platform does not support fine-grained scopes, treat that as a security blocker, not an inconvenience.

Backups co-located with the thing they back up are not backups. They are convenience copies. Real backups live in a separate failure domain — different volume, different account, ideally different provider.

Wrapping Up

The thing that has stuck with me about this incident is not the deletion. It is the confession.

The model wrote a careful, articulate post-mortem listing the rules it had broken — and it had access to those rules the entire time. That is a strange new failure mode. We have spent decades building systems on the assumption that an actor who knows the rule and acknowledges the rule will follow the rule. AI agents break that assumption. They can know, acknowledge, and still act otherwise.

You cannot fix that with a longer system prompt. The agent will read it, agree with it, and write you a beautiful confession nine seconds later. You fix it the same way we have always fixed unreliable actors: by limiting blast radius. Scoped credentials, network defaults that say no, backups in a different failure domain, confirmations on destructive actions.

The agent is going to be wrong sometimes. Build the system so that when it is, the cost is recoverable.

Sources: The Register, Tom's Hardware, flyingpenguin, Cursor sandbox blog.