I Let Opus 4.8 Build a Workflow. It Spawned 77 Agents and Burned Five Hours of Quota in Ten Minutes.

I had twenty-odd draft articles sitting in a folder and a simple ask: review each one, fix what's wrong, then read the whole set top to bottom again. Routine editorial work. I told Opus 4.8 to set up a workflow for it.

It wrote a script. The script spun up 77 agents. Ten minutes later, the five-hour window on my Max 5x plan was empty — somewhere around 280,000 tokens, gone, on a task two or three agents could have handled.

The release notes had told me dynamic workflows could fan out to "up to a thousand subagents." I'd read that the way you read any launch copy — as a number chosen to sound impressive. Turns out it's a literal engineering limit, and ten minutes with it was all I needed to feel the difference between a marketing number and a runtime that means it.

What Opus 4.8 actually shipped

The model itself is the quiet part. Opus 4.8 went GA on May 28, 2026, at the same price as 4.7 — $5 in, $25 out per million tokens — with stronger coding benchmarks and a documented four-fold drop in unreported code flaws. If that were the whole release, it'd be an easy, boring upgrade. Take it.

The loud part is everything wrapped around the model:

Dynamic workflows — Claude writes a JavaScript script that orchestrates many subagents in parallel, then a runtime executes it in the background.
Ultracode — a single Claude Code setting that turns reasoning up to xhigh and lets Claude orchestrate a workflow whenever it decides a task is "substantive."
A cheaper, faster Fast mode — same Opus, configured for roughly 2.5x faster output, and about three times cheaper than before.

The Fast mode change is unambiguously good. The workflow story is where it gets interesting, and where I want to spend the rest of this post — because the gap between what it's designed for and what a solo metered user actually needs turns out to be enormous.

What a dynamic workflow really is

Strip away the framing and it's this: you describe a job, Claude writes a program that fans the job out across subagents — pipelines, parallel batches, verification passes — and a runtime runs that program for you without your hand on the wheel. The hard backstops are generous: around sixteen agents running concurrently, up to a thousand across a single workflow's lifetime.
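To make the two backstops concrete, here is a minimal sketch of a fan-out with a concurrency limit and a lifetime limit. It is a toy stand-in, not the actual runtime: `BoundedFanOut`, `fake_agent`, and `run_workflow` are hypothetical names, and only the two numbers come from the description above.

```python
import asyncio

MAX_CONCURRENT = 16    # "around sixteen agents running concurrently"
MAX_LIFETIME = 1000    # "up to a thousand across a single workflow's lifetime"


class BoundedFanOut:
    """Toy stand-in for a workflow runtime's two backstops (hypothetical)."""

    def __init__(self, max_concurrent=MAX_CONCURRENT, max_lifetime=MAX_LIFETIME):
        self._slots = asyncio.Semaphore(max_concurrent)
        self._max_lifetime = max_lifetime
        self.spawned = 0   # agents started over the workflow's lifetime
        self.running = 0   # agents in flight right now
        self.peak = 0      # highest concurrency observed

    async def spawn(self, job):
        # Lifetime backstop: refuse once the total spawn count is used up.
        if self.spawned >= self._max_lifetime:
            raise RuntimeError("lifetime agent limit reached")
        self.spawned += 1
        # Concurrency backstop: wait for a free slot before running.
        async with self._slots:
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await job()
            finally:
                self.running -= 1


async def fake_agent():
    # Placeholder for real subagent work.
    await asyncio.sleep(0.01)
    return "done"


async def run_workflow(n_agents):
    pool = BoundedFanOut()
    results = await asyncio.gather(*(pool.spawn(fake_agent) for _ in range(n_agents)))
    return pool, results
```

Calling `asyncio.run(run_workflow(77))` in this sketch still runs all 77 agents. The semaphore only limits how many are in flight at once, so limits of this kind bound the pace of spending rather than the total.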

The poster child is genuinely impressive. Jarred Sumner used dynamic workflows to port the Bun runtime from Zig to Rust — roughly 750,000 lines, eleven days from first commit to merge, 99.8% of the existing test suite still green, hundreds of agents working in parallel with two reviewers on every file. That is a real result, and there is no human-paced way to do it in eleven days.

Keep that example in mind. It's the key to the whole thing. This feature was built for that shape of problem.

The canary I ignored

My first contact with it was smaller. I asked Opus 4.8 to review a single article. It did, and it cost me five to eight percent of my window for one pass.

The output was fine. Middle-of-the-road: the kind of review that's competent and unsurprising, nothing I couldn't have gotten from a single well-prompted agent. But the cost-to-value ratio already felt off. One article, up to eight percent of a window. I noticed, shrugged, and moved on.

That was the canary. I should have listened to it before pointing the same machine at twenty files and a three-stage pipeline.

The ten-minute fire

So: twenty articles, review then fix then re-read. Claude looked at the shape of that — many items, multiple stages per item — and did exactly what the feature is designed to do. It maximized parallelism. Seventy-seven agents.

[Figure: one workflow fanning out to 77 subagents while a metered five-hour token window drains to empty in ten minutes]

Ten minutes in, the whole five-hour Max 5x window was spent. And here's the part that turns an expensive mistake into a genuinely painful one: when the tokens run out, it stops. Mid-review. The next window, you don't pick up where it left off — you start the thing over. Pay the fan-out tax again. For a job that was never going to finish inside one window, that's not a setback, it's a trap.
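The trap can be shown with a toy model, assuming each window grants a fixed token budget and the job needs a fixed total. The function name and the 400,000-token job size are hypothetical; the 280,000 figure is the approximate window spend mentioned earlier.

```python
def windows_to_finish(job_tokens, window_tokens, resumable, max_windows=100):
    """Count metered windows needed to finish a job, or None if it never does.

    Toy model: each window grants `window_tokens` and the job needs
    `job_tokens` in total. Without resume, unfinished progress is
    discarded at every reset.
    """
    done = 0
    for window in range(1, max_windows + 1):
        if not resumable:
            done = 0              # start over: earlier work is lost
        done += window_tokens     # spend the whole window on the job
        if done >= job_tokens:
            return window
    return None


WINDOW = 280_000   # approximate spend of one window, from the run above
JOB = 400_000      # hypothetical job that does not fit in one window

print(windows_to_finish(JOB, WINDOW, resumable=True))    # 2
print(windows_to_finish(JOB, WINDOW, resumable=False))   # None
```

In this model, a job larger than one window finishes in a couple of windows if progress carries over, and never finishes if every reset starts from zero.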

I came back after the reset and tried to be smart about it. I told it explicitly not to spawn that many agents, because I couldn't afford it. It agreed and said it would only create agents when strictly necessary. Reasonable answer. Thirty minutes later the review still wasn't done and the window was empty again.

The control problem

That second run is the real story, because it exposes the thing the benchmarks won't tell you: from a user's seat, the fan-out is not controllable.

The runtime has limits, but they're safety backstops, not budgets. About sixteen agents concurrent, a thousand per workflow lifetime — those exist to stop a runaway loop, not to respect your wallet. There is no knob that says "cap this task at three agents." When Claude authors the workflow script, it decides how wide to go, and "be frugal" is a suggestion it weighs against finishing the task, the same way every prompt-level instruction is a suggestion. I even had it check whether the SDK exposes an agent ceiling you can pass in. It doesn't. The fan-out is the model's call, not yours.

So you're left with two settings: trust it to be cheap (it won't be), or don't run it. There's no dial in between.

Was it even better?

You could forgive the cost if 77 agents bought a dramatically better review. They didn't. The output was the same competent-but-unremarkable quality I'd have gotten from two or three agents with clear roles — I've run that comparison enough times to be confident in it. Past a handful of agents on a task this size, you're paying linearly for sharply diminishing returns.
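A back-of-the-envelope version of the "paying linearly" part, using only the figures from the first run. It assumes the tokens were split evenly across agents, which nothing here establishes, and a small team reading every draft would likely spend more per agent than this average. `team_cost` is a hypothetical helper.

```python
TOKENS_SPENT = 280_000   # approximate figure from the first run
AGENTS = 77
MINUTES = 10

tokens_per_agent = TOKENS_SPENT / AGENTS
tokens_per_minute = TOKENS_SPENT / MINUTES


def team_cost(n_agents, per_agent=tokens_per_agent):
    """Linear cost model: every extra agent costs the same average amount."""
    return n_agents * per_agent


print(round(tokens_per_agent))    # 3636
print(round(tokens_per_minute))   # 28000
print(round(team_cost(3)))        # 10909
```

The exact numbers matter less than the shape: under a linear model the bill scales with the agent count, whether or not the quality does.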

Which lines up with where the feature genuinely shines. A 750,000-line language migration is not my twenty drafts. It's a problem so large that no amount of careful single-pass work fits in the calendar, run by people for whom the token meter is somebody else's spreadsheet. For Anthropic-scale work, or any org that treats compute as effectively unlimited, dynamic workflows are a legitimately new capability. For a solo user counting a five-hour window, the same feature is a fire hazard.

| | Dynamic workflows | A small, role-defined agent team |
| --- | --- | --- |
| Built for | 100k+ line migrations, org-scale audits | Reviewing 20 drafts, normal feature work |
| Cost model | Compute is effectively free | Every token counts against a five-hour window |
| Who decides fan-out | The model, uncapped | You, explicitly |
| When it runs out | Stops; restart from scratch | You scoped it to finish |

What I'm going back to

None of this makes dynamic workflows bad. It makes them mis-aimed for me, and I suspect for most people paying a subscription rather than an invoice.

The thing is, the old approach already works. A hand-built agent team where I assign each agent an explicit role and responsibility does everything I actually needed here — and I decide how many there are. I'd also written my own spawn-agent skill a while back, with a hard cap baked in. Both are more controllable than the new machinery and, for tasks my size, just as good. The lesson the workflow taught me is that I already had the right tool.
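A hard cap of that kind can be sketched in a few lines. This is an illustration only, not the actual skill: `CappedTeam` and `AgentCapExceeded` are hypothetical names, and the three roles mirror the review, fix, and re-read stages of the original task.

```python
class AgentCapExceeded(Exception):
    """Raised when a spawn would exceed the hard cap."""


class CappedTeam:
    """Hypothetical role-defined team with a hard ceiling on agent count."""

    def __init__(self, max_agents):
        self.max_agents = max_agents
        self.roles = {}   # role name -> responsibility description

    def spawn(self, role, responsibility):
        # Reusing an existing role does not create a new agent.
        if role in self.roles:
            return role
        if len(self.roles) >= self.max_agents:
            raise AgentCapExceeded(
                f"cap of {self.max_agents} agents reached; refusing to spawn {role!r}"
            )
        self.roles[role] = responsibility
        return role


team = CappedTeam(max_agents=3)
team.spawn("reviewer", "read each draft and list problems")
team.spawn("fixer", "apply the reviewer's fixes")
team.spawn("proofreader", "re-read the full set top to bottom")
```

The design choice is that the cap lives in code rather than in the prompt, so a fourth `spawn` raises an error instead of being weighed against finishing the task.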

As for Ultracode — it's essentially dynamic workflows promoted to the default. xhigh reasoning plus "orchestrate a workflow whenever the task looks substantive." It's a superset of the thing I just described, which means it carries every concern above and adds one more: now the fan-out can happen when you didn't even ask for it. For a metered user, that's the opposite of what I want from a default.

Wrapping Up

There's a version of this post that a dynamic workflow could have written — fan out a researcher per source, a drafter per section, a panel of reviewers, synthesize. It would have cost me a window and change. This one took a single careful pass, which is still, for most work most people do, completely sufficient.

That's the whole tension of the release in one sentence. Opus 4.8 is a clean upgrade and Fast mode is a gift. But its headline feature is built for a customer who doesn't watch the meter, and sold to a lot of customers who do. The pricing model and the feature design are pointed at two different people.

For the unmetered — the migrations, the audits, the labs — dynamic workflows are real and impressive, and the thousand-agent number is not a joke. For the rest of us, it's a gadget I admire and won't reach for. A beautiful way to set five hours on fire in ten minutes.
