The Pragmatic Guide to AI-Assisted Coding: When to Trust, When to Supervise

I've been using AI coding tools heavily for the past year. Claude Code, Cursor, Copilot—they're all in my daily workflow. And I've arrived at a conclusion that might sound obvious but keeps getting lost in the hype: black-box AI works for toys, not for production software.

This isn't technophobia. I'm genuinely enthusiastic about these tools. But enthusiasm shouldn't blind us to the reality that software engineering principles didn't become obsolete overnight. If anything, they matter more now.

Let me explain what I mean, and offer a practical framework for navigating this.


The Vibe Coding Spectrum

When Andrej Karpathy coined "vibe coding" in early 2025, he was describing something real: a mode where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists."

His workflow: "I 'Accept All' always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it."

Here's the thing—Karpathy was clear this was for "throwaway weekend projects." And when he built something serious later that year, he admitted: "It's basically entirely handwritten."

This isn't hypocrisy. It's acknowledging that different contexts demand different approaches.

The spectrum looks like this:

Mode | Trust Level | Use Case | Human Involvement
Full Vibe | High | Prototypes, scripts, throwaway | Accept all, debug by prompting
Guided Generation | Medium | Features, refactoring | Review diffs, guide architecture
AI-Assisted | Low | Core logic, security, infra | Write key parts, AI fills gaps
Traditional | None | Cryptography, compliance | Human writes, AI explains

The Vibe Coding Spectrum - from full autonomy to full human control

The mistake is applying "full vibe" mode to production systems. The data is sobering:

  • 25-40% of AI-generated code contains security vulnerabilities
  • 8x more duplicate code blocks than human-written code
  • GitClear found sharp declines in several code quality measures despite productivity gains

What the Big Shots Actually Say

I've read through perspectives from Kent Beck, Martin Fowler, Uncle Bob, Steve Yegge, Gergely Orosz, and Addy Osmani. Here's the synthesis:

Kent Beck (TDD pioneer): Test-driven development is a "superpower" with AI agents because—and this is critical—AI agents introduce regressions. Beck's mental model is an "unpredictable genie" that grants wishes in unexpected ways.

Most telling: he's had trouble stopping AI agents from deleting tests to make them "pass."

Steve Yegge (Vibe coding evangelist): Even Yegge, who's writing a book called "Vibe Coding," says: "The human's main job is reviewing diffs from the AI. You can't do that if you don't know what you're doing."

Martin Fowler: LLM output must be tested rigorously. Refactoring is more important than ever. The shift to non-deterministic coding doesn't invalidate deterministic techniques—it makes them essential guardrails.

Uncle Bob: When asked if a superintelligent AI would care about SOLID principles, he said it would care more. "Things like modularity are going to be trivial—it's just going to wash out." The principles remain valid; AI just makes them obvious.

Gergely Orosz: Uses AI for all his commits now. But his conclusion is that "being a solid software engineer (not just a 'coder') will be more sought-after." Tech lead traits. Product-minded thinking. Architecture decisions.

Addy Osmani: "The LLM is an assistant, not an autonomously reliable coder. The developer is the senior dev; the LLM is there to accelerate, not replace judgment."

The consensus: AI changes the how, not the why. Software principles exist because software is hard. AI doesn't make software less hard—it just shifts where the hardness lives.


Where Human Oversight Is Non-Negotiable

Based on both the research and my own experience, here's where you should enforce strict human oversight (or avoid AI entirely):

1. Core Business Logic: The "money-handling" code where edge cases matter more than syntax. AI lacks context about your business rules, customer expectations, and the subtle ways things can go wrong.

2. Concurrency and Security: Threading models, authentication flows, cryptography. These are domains where "mostly correct" means "completely broken." A race condition that manifests once in 10,000 runs will destroy you (a minimal sketch follows this list).

3. Large-Scale Refactoring: AI lacks the context to understand ripple effects. Changing a core dependency across 50 files requires understanding why those 50 files work the way they do.

4. Architecture Decisions: Module boundaries, library interfaces, contracts between layers. These are high-leverage decisions that compound over time. AI optimizes locally; architecture requires global thinking.

5. Anything That Touches User Data: Privacy, compliance, GDPR. The cost of getting this wrong is measured in lawsuits and broken trust, not debugging time.
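
To make the concurrency point concrete, here's a deliberately artificial sketch of mine (the account, the numbers, and the sleep are invented; the delay exists only to make the race easy to reproduce). It's a check-then-act bug that looks fine on a skim and would pass most casual test runs:

```python
import threading
import time

balance = 100   # a toy shared account; the scenario is made up for illustration

def withdraw(amount: int) -> None:
    global balance
    if balance >= amount:       # check...
        time.sleep(0.01)        # artificial delay, only to widen the race window
        balance -= amount       # ...then act: another thread may have passed the check too

threads = [threading.Thread(target=withdraw, args=(100,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # should never drop below 0, but typically prints -100 here
# The fix is a lock (or an atomic update) around the check-and-act,
# which is exactly the kind of line an AI-generated diff can quietly omit
# while every test still passes.
```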


The Principles That Still Apply (More Than Ever)

Here's the counterintuitive insight: software engineering principles become more important with AI, not less.

SOLID Principles: LLMs, without strong guidelines, will "slop-fill greedy solutions to make CI checks pass, increasing spaghettification over time." Single Responsibility, Open/Closed, Dependency Inversion—these are your defense against AI-induced entropy.

Test-Driven Development: Kent Beck's insight is that TDD is how you catch AI regressions. Write the test first, let AI generate the implementation, verify the test passes for the right reasons.
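
As a minimal sketch of that loop (the function, names, and numbers here are mine, purely illustrative):

```python
# Step 1: write the test yourself, before prompting.
def test_apply_discount_never_goes_negative():
    assert apply_discount(total=50.0, discount=80.0) == 0.0
    assert apply_discount(total=100.0, discount=15.0) == 85.0

# Step 2: let the assistant generate the implementation against that contract.
def apply_discount(total: float, discount: float) -> float:
    return max(total - discount, 0.0)

# Step 3: run the test and confirm it passes for the right reasons;
# in particular, check the diff didn't quietly edit the test itself.
if __name__ == "__main__":
    test_apply_discount_never_goes_negative()
    print("ok")
```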

Clean Code / Refactoring: AI generates "syntactically correct but semantically flawed" code. Readability matters because you will read this code during review. Refactoring matters because AI will add duplication that needs consolidation.

Module Boundaries: Crisp abstractions between layers are "high-leverage levers for maintaining long-term code quality." They constrain what AI can do, preventing it from creating spaghetti across your codebase.
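
A crisp boundary can be as small as a protocol the human owns while the AI fills in implementations behind it. This is a hypothetical sketch; `ArticleStore` and its methods are invented for illustration, not taken from any of the sources above.

```python
from typing import Protocol

# The interface is human-designed and stable; the assistant is free to generate
# or rewrite implementations behind it without leaking storage details upward.
class ArticleStore(Protocol):
    def get(self, article_id: str) -> dict | None: ...
    def save(self, article: dict) -> None: ...

class InMemoryArticleStore:
    """One implementation an assistant could generate; callers only see ArticleStore."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def get(self, article_id: str) -> dict | None:
        return self._items.get(article_id)

    def save(self, article: dict) -> None:
        self._items[article["id"]] = article
```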


A Practical Framework

Here's the workflow I've settled on:

Before Prompting:

  1. Write the test (or at least the test signature)
  2. Define the interface/contract (a short sketch of steps 1 and 2 follows this list)
  3. Consider: am I confident enough to review this diff?
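
Here's what steps 1 and 2 can look like in practice. The invoice domain and names are invented for illustration, and the stub is deliberately left failing until generation fills it in:

```python
from dataclasses import dataclass

@dataclass
class Invoice:                      # illustrative domain object
    subtotal: float
    country: str

# The contract, written by a human before any prompting: signature, types,
# and the expected behaviour spelled out in the docstring.
def total_with_vat(invoice: Invoice) -> float:
    """Return subtotal plus VAT for the invoice's country; unknown countries add 0%."""
    raise NotImplementedError       # the part you then ask the AI to fill in

# The test signature, also written up front, so "done" is defined before generation.
def test_unknown_country_adds_no_vat():
    assert total_with_vat(Invoice(subtotal=100.0, country="??")) == 100.0
```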

During Generation:

  1. Give clear, scoped context (Kent Beck: "Constrain context—only tell the AI what it needs to know for the next step")
  2. Generate in small increments, not entire features
  3. Keep architectural decisions in your head, not the AI's

After Generation:

  1. Read every diff. Not skim—read.
  2. Run tests. Verify they pass for the right reasons.
  3. Ask: does this match what I would have written? If not, why?
  4. Check for AI anti-patterns: duplicate logic, missing error handling, overly complex solutions (example below)

AI Coding Workflow Framework - Before, During, and After generation phases
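
Here's an invented review target showing two of those anti-patterns side by side; the `db` stub and helper names are mine, not from the article:

```python
class _FakeDB:
    """Stand-in backend so the example is self-contained."""
    def fetch(self, table: str, record_id: str) -> dict:
        raise ConnectionError("backend unavailable")

db = _FakeDB()

def load_user(user_id: str):
    try:
        return db.fetch("users", user_id)
    except Exception:
        return None        # missing error handling: real failures are silently swallowed

def load_account(account_id: str):
    try:
        return db.fetch("accounts", account_id)
    except Exception:
        return None        # duplicate logic: the same fetch-and-swallow block, copied again

# What a reviewer would push for instead: one helper that keeps the error visible
# so the caller decides how to handle it.
def load_record(table: str, record_id: str) -> dict:
    return db.fetch(table, record_id)
```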

For Teams:

  1. Include the prompt in PR descriptions (so reviewers understand intent)
  2. Flag AI-generated code explicitly
  3. Mandate human review for anything touching core logic, security, or user data
  4. Treat AI code reviews as more rigorous, not less

The Role Shift: Writer → Reviewer → Architect

The industry is realizing that the skill of 2026 isn't writing a QuickSort; it's looking at an AI-generated QuickSort and instantly spotting that its naive pivot choice degrades to quadratic time on already-sorted input.

This requires higher expertise, not lower.
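
For instance, an illustrative sketch (not anyone's production code) of the kind of flaw that slips past a skim:

```python
def quicksort(items: list[int]) -> list[int]:
    if len(items) <= 1:
        return items
    pivot = items[0]                 # first element as pivot: already-sorted input
    rest = items[1:]                 # degenerates to O(n^2) and deep recursion
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([5, 1, 4, 2, 3]))    # [1, 2, 3, 4, 5]: correct on the happy path,
                                     # which is exactly why the flaw is easy to miss
# The review note: randomise the pivot (or use median-of-three), or just call
# sorted() unless there's a measured reason to hand-roll the sort.
```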

Junior developers can use agentic tools with minimal supervision for routine tasks; senior developers tackling complex problems "benefit more from assistive tools that amplify expertise rather than replace it."

The best mental model: you're a senior dev, AI is a prolific but unreliable junior. It can write code all day, but you need to:

  • Set direction
  • Review everything
  • Catch the non-obvious bugs
  • Make the architectural calls
  • Maintain the principles

If you can't do these things without AI, you can't do them with AI either.


When to Just Vibe

Lest I sound too cautious—there are absolutely contexts where vibe coding is appropriate:

  • Prototyping and exploration - When you're trying to figure out if something works at all
  • Scripts you'll run once - Automation that doesn't need to be maintained
  • Learning and experimentation - Building something to understand a technology
  • Throwaway projects - If it breaks, you delete it and move on

Karpathy's original description was spot-on for these cases. The problem is when the prototype becomes the production system, or when "weekend project" energy gets applied to software that handles real money, real data, real users.


Looking Ahead

Dario Amodei (Anthropic CEO) predicted that AI will be writing "essentially all the code" within 12 months. Maybe. But "writing" and "designing" aren't the same thing. "Generating" and "maintaining" aren't the same thing.

The more code AI generates, the more important it becomes to understand:

  • What good code looks like
  • What principles prevent technical debt
  • When AI is drifting toward anti-patterns
  • How to review at scale

A study found that while developers believed AI made them 20% faster, objective tests showed they were actually 19% slower. The perception of productivity can mask the reality of debugging and rework.

The practitioners who thrive in this environment won't be the ones who generate the most code. They'll be the ones who generate the most maintainable code—and that requires judgment AI doesn't yet have.


The Bottom Line

Here's my pragmatic take:

  1. Use AI aggressively - It's too useful not to
  2. Supervise ruthlessly - It's too unreliable not to
  3. Lean on principles - They're more relevant than ever
  4. Match mode to context - Vibe for toys, rigor for production
  5. Stay sharp - Your review skills are now your core skill

The code you don't understand is the code that will hurt you. Whether you wrote it, AI wrote it, or a combination—you're still responsible for what ships.

That responsibility doesn't change. Only the tools do.

