Five Claudes, Five Implementations. Which One Is Yours?

Run five instances of Claude Code on the same ticket. You get five working implementations. All of them pass tests. All of them solve the problem. None of them look like the code your team would have written six months ago.

That’s the new baseline. The probability space of valid implementations just exploded, and it’s not collapsing back on its own.

AI didn’t make code review obsolete. It made it the bottleneck.

For ten years, generation was the constraint and review was the cleanup. The senior engineer’s afternoon was 30 percent writing code, 70 percent reviewing what other people wrote. The bottleneck on delivery was how fast humans could think.

That relationship just inverted. Generation is effectively free, and review is the constraint. A solo dev with Claude Code or Cursor can produce more code in a week than a five-person team did a year ago.

The bottleneck moved. It’s now: which of these five working implementations actually belongs in our codebase?

That question doesn’t have a model answer. It has a team answer. The team’s accumulated taste is the only thing that distinguishes “code that works” from “code that belongs here.” How you name things. How you structure modules. What you refuse to import. When you exit early. How you write tests.

Without it, every PR is a coin flip among five reasonable options.

Your taste is the moat. Encode it.

This is where most teams get stuck. The taste exists. It lives in three engineers’ heads. It comes out in PR comments, in Slack threads, in the eye-roll a senior gives a junior on a Tuesday. It’s real, and it’s the actual differentiator between a codebase that compounds value and one that drifts into slop.

But it’s not encoded anywhere a machine can read. Which means agents can’t follow it. New hires can’t follow it. The senior engineers themselves can’t follow it consistently, because they’re tired, on call, or in a meeting when the PR lands.

Most teams already have a CLAUDE.md or .cursorrules. These are instructions for how agents should write code in your repo. Notice the asymmetry. There’s a writing contract and no review contract. CLAUDE.md tells the agent what to produce. There’s no equivalent file telling anyone, human or machine, what’s acceptable to merge.

That file is STANDARDS.MD. Plain markdown, in the repo, versioned with the code. The team’s accumulated decisions, made explicit. Not generic best practices. Your specific opinions about how you build.
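To make that concrete, here is a minimal sketch of what such a file might hold and how a tool could read it. Everything in it is hypothetical: the section names, the rules, and the `parse_standards` helper are invented for illustration and are not Surmado's format. The sections mirror the kinds of opinions listed earlier (naming, imports, early exits, tests).

```python
from collections import defaultdict

# Hypothetical STANDARDS.MD contents; every rule here is an invented example.
SAMPLE_STANDARDS = """\
# Naming
- Functions are verbs: fetch_user, not user_data.

# Imports
- No requests; use httpx.
- No new dependencies without a linked issue.

# Control flow
- Exit early on invalid input instead of nesting.

# Tests
- Every bug fix ships with a regression test.
"""


def parse_standards(text: str) -> dict[str, list[str]]:
    """Group '- ' bullet rules under the most recent '# ' heading."""
    rules: dict[str, list[str]] = defaultdict(list)
    section = "General"  # bucket for rules that appear before any heading
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# "):
            section = line[2:].strip()
        elif line.startswith("- "):
            rules[section].append(line[2:].strip())
    return dict(rules)


standards = parse_standards(SAMPLE_STANDARDS)
for section, items in standards.items():
    print(f"{section}: {len(items)} rule(s)")
```

The point of the sketch is only that a plain markdown file is already structured enough for both a person and a program to read.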

None of this is new thinking. A recurring theme in Kleppmann’s Designing Data-Intensive Applications is that systems hold up over time when their contracts are explicit and machine-checkable. Schema, not vibes. STANDARDS.MD is that same discipline one layer up. The contract isn’t between services anymore. It’s between an engineer’s PR and the team’s accumulated taste.

Once that file exists, three things become possible.

Agents generate code against it. Your taste constrains the probability space before code is written.

Every PR gets reviewed against it automatically, whether the author was a human or an agent. No drift, no exceptions, no “we’ll catch that next sprint.”
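As a rough illustration of the mechanical end of that idea, the sketch below scans a unified diff for added lines that import a module the team has banned. The rule set, the sample diff, and the `find_banned_imports` helper are assumptions made up for this example. Most standards are judgment calls that a pattern match cannot capture, so treat this as the simplest possible case, not a description of how any real reviewer works.

```python
import re

# Hypothetical rule: modules this team refuses to import.
BANNED_IMPORTS = {"requests", "pickle"}

# Captures the top-level module in "import x" or "from x import y".
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][A-Za-z0-9_]*)")


def find_banned_imports(diff: str, banned: set[str]) -> list[tuple[int, str]]:
    """Return (line position in the diff, module) for each added banned import."""
    findings = []
    for idx, line in enumerate(diff.splitlines(), start=1):
        # Only inspect added lines; skip the '+++ b/file' header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        match = IMPORT_RE.match(line[1:])
        if match and match.group(1) in banned:
            findings.append((idx, match.group(1)))
    return findings


SAMPLE_DIFF = """\
--- a/client.py
+++ b/client.py
@@ -1,2 +1,3 @@
 import json
+import requests
-import httpx
+from pickle import loads
"""

findings = find_banned_imports(SAMPLE_DIFF, BANNED_IMPORTS)
for position, module in findings:
    print(f"diff line {position}: banned import '{module}'")
```

Removed lines and unchanged context are ignored on purpose: the check only judges what the PR adds.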

New hires read one document and absorb the team’s worldview in an afternoon.

The standards file isn’t documentation. It’s the collapse function. It takes the explosion of valid implementations and projects it down to your implementation.

We sped up our dev team 3x by making AI review every PR against our standards.

We hit this exact wall internally three months ago. We built Surmado Code Review for ourselves first. Internal tool. Now on v7. Runs across 14 repos on every commit.

Our standards live in a STANDARDS.MD per repo. Scout, the review bot, reads it, reads the diff, and tells the human reviewer what’s good, what needs work, what to actually look at, and what assumptions the PR is making.
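One way to picture that four-part output is as a small data structure rendered into a single comment. This is a sketch under assumptions: the `ReviewSummary` class, its field names, and the sample findings are hypothetical and do not describe Scout's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSummary:
    """Hypothetical shape for a review handed to a human reviewer."""

    good: list[str] = field(default_factory=list)
    needs_work: list[str] = field(default_factory=list)
    look_at: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the non-empty sections as one plain-text comment."""
        sections = [
            ("What's good", self.good),
            ("What needs work", self.needs_work),
            ("What to actually look at", self.look_at),
            ("Assumptions this PR makes", self.assumptions),
        ]
        lines = []
        for title, items in sections:
            if not items:
                continue  # leave out sections with nothing to report
            lines.append(title)
            lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


summary = ReviewSummary(
    good=["Early return on empty input matches the control-flow rule."],
    needs_work=["New dependency added without a linked issue."],
    assumptions=["Assumes user IDs are always integers."],
)
comment = summary.render()
print(comment)
```

Keeping assumptions as their own section matters: they are the items a human reviewer can confirm or reject fastest.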

The 3x is measured in time-to-merge across our 14 repos. Our PRs don’t sit. They move.

That speed didn’t come from the bot doing the review. It came from the bot collapsing the noise so the humans could focus on what mattered. The senior engineer stopped catching the same five things in every PR. The junior stopped guessing at conventions. The agent-authored PRs stopped drifting into shapes nobody on the team would have written.

If you don’t have a STANDARDS.MD yet, Scout writes it with you. Two clicks to install. Scout interviews you about your stack, your conventions, your testing posture, and the architectural calls you’ve already made.

Where you have opinions, Scout writes them down. Where you don’t yet, Scout offers defaults you can accept, edit, or override. If you’re a vibe coder who’s never had to articulate your standards before, Scout structures the thinking and surfaces the questions you haven’t thought to ask.

The opinions stay yours. The scaffolding is the one we use internally for our own STANDARDS.MD files. The kind of standards discipline a larger team takes weeks to hash out, you have running this afternoon.

My dev team made me release it. They were tired of telling founder friends that no, you can’t have it, it’s internal. So we productized it.

About your code

We don’t store it. We don’t train on it. We don’t read your whole codebase.

Scout is powered by frontier models from Anthropic and OpenAI. Each PR review sends only the diff to their APIs under standard API agreements. No retention. No training. Nothing about your code persists in our systems after the review lands.

Read the full privacy policy and terms. Everything is documented.

What this is actually for

The pitch most AI tools make is: the model replaces the human. That framing is wrong on first principles, and it’s especially wrong now.

AI generation is making teams faster, but it’s also making codebases drift faster. The accelerant and the entropy come from the same place. The teams pulling ahead are the ones that figured out how to elevate human judgment to keep up with machine speed, not replace it.

Surmado Code Review isn’t an autonomous reviewer. It’s a force multiplier on your team’s taste. Your standards, applied to every PR, every push, every agent run. Humans still do the human work: deciding what the standards are, making the architectural calls, mentoring the team, knowing the business. The bot does the part that was burning your seniors out, which is catching deviations from decisions you already made.

The future of AI in code isn’t agents replacing engineers. It’s engineers’ taste, encoded once, enforced everywhere, at machine speed. The teams that figure this out first are going to look unrecognizable to the teams that don’t. Not because their devs are smarter. Because their taste is operating across every line of code, including the lines they didn’t write.

Like everything we build at Surmado, this centers the human and empowers the agent. Your taste, amplified.

Try it on your next PR

Surmado Code Review. Two clicks to install. $15/month for 100 PRs. No per-seat pricing. Zero retention on your diffs. Built for solo devs and small teams moving fast. Not for enterprise.

Install Surmado Code Review or read how we built it.

