A Google engineer signed up for Surmado Code Review last week. The product runs on Scout, our AI research analyst. He wrote a STANDARDS.md file for his project, opened his first pull request, and ran Scout on it.
Scout flagged a critical architectural flaw that would have broken production.
He didn’t merge. He emailed us to describe his workflow:
“I take the feedback from Surmado, copy it into Gemini. Gemini self-flagellates, and then gives me a new prompt to use with Gemini Code Assist in my IDE. Gemini Code Assist makes the changes. I update the branch and Surmado auto re-reviews the new version. And then when I exhaust it, only then do I tag my collaborators to review.”
Read that again. There’s a loop in there worth naming.
The loop
- Gemini Code Assist writes the code in his IDE.
- Surmado reviews it against his STANDARDS.md.
- He pastes Surmado’s review into Gemini chat.
- Gemini reads Surmado’s verdict and rewrites the prompt for itself.
- Code Assist regenerates the code with the new prompt.
- He pushes the new branch. Surmado auto re-reviews.
- When he exhausts the loop, he tags his human collaborators.
The last step is the one we keep coming back to. The loop isn’t replacing human review. It’s making human review kick in only when it’s actually needed. By the time a human looks at the PR, the obvious stuff is already fixed. The architecture flaws, the standards violations, the stuff a careful reader would catch on a tired Tuesday afternoon. Gone. The human reviewer is left with the parts that genuinely need a human: design tradeoffs, intent, judgment.
That’s the version of AI in the dev loop that actually works. Not “AI replaces humans.” AI handles the converging part. Humans handle the deciding part.
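If you squint, the loop is a plain converge-then-escalate control flow. Here is a minimal sketch in Python; every name in it is a hypothetical stand-in for the manual steps above, not Surmado’s or Gemini’s actual API:

```python
# Hypothetical sketch of the converge-then-escalate loop.
# write_code, review, push, tag_humans stand in for the manual
# steps described above; none of them are real Surmado or Gemini APIs.

def review_loop(write_code, review, push, tag_humans, prompt, max_rounds=5):
    """Run writer/reviewer rounds until the review comes back clean,
    then hand the pull request to human reviewers."""
    for _ in range(max_rounds):
        diff = write_code(prompt)      # Gemini Code Assist generates the change
        push(diff)                     # pushing triggers Surmado's auto re-review
        feedback = review(diff)        # Scout checks the diff against STANDARDS.md
        if not feedback:               # loop exhausted: nothing left to flag
            break
        # Gemini reads the verdict and rewrites its own prompt
        prompt = f"Fix every issue in this review, then retry: {feedback}"
    tag_humans()                       # humans get tradeoffs, intent, judgment
```

The shape is the point: human review is the loop’s exit, not a gate inside it. Reviewers only see the PR after the cheap fixes have converged.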
What Gemini said
The engineer pasted Surmado’s review into Gemini twice during the session. Both times, Gemini agreed with Surmado on the spot.
The first response:
“Surmado just caught a critical architectural flaw that would have broken your production site, and it flagged it by directly citing the STANDARDS.md file we wrote 20 minutes ago. You should absolutely not merge this PR yet. Here is exactly what happened and why Surmado is entirely correct.”
The second:
“Surmado caught a hole in our execution, and frankly, I dropped the ball on the prompt I gave you for Code Assist.”
That second quote is the telling one. Gemini didn’t blame the model. Gemini blamed the prompt. The instructions Gemini gave Code Assist didn’t include the team’s conventions, because Gemini didn’t know what the team’s conventions were.
Surmado knew, because the engineer had written them down twenty minutes earlier. STANDARDS.md is the bridge between what the team values and what the model can act on.
Why this is the product
Most AI code review tools check for generic best practices. Indent your code. Don’t shadow variables. Avoid magic numbers.
That’s table stakes. Every linter does it. (Worth comparing the field if you’re shopping around: the best CodeRabbit alternatives and the best Greptile alternatives cover the noise, pricing, and team-rules question across the current AI-code-review tools.) The reason teams still merge bad code is that the bugs that hurt production aren’t generic. They’re specific to how the team builds. They’re the stuff in your wiki, your design docs, the conventions that live in three senior engineers’ heads.
STANDARDS.md is where you put that. Scout reads it. Every PR gets reviewed against it. Every rerun reads the new diff and the previous review, so Scout can tell whether earlier issues were fixed.
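What goes in the file is whatever a senior engineer would say in review anyway, written down once. A hypothetical excerpt, purely for illustration (these rules are invented, not from the engineer’s actual file):

```
# STANDARDS.md

## Data access
- All reads go through the repository layer. No raw SQL in route handlers.
- Every migration must be reversible.

## Error handling
- Never swallow exceptions. Log with the request ID, then re-raise.

## Frontend
- Server components by default. A client component needs a one-line
  justification in the PR description.
```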
The engineer’s loop works because each piece does what it’s good at. Gemini writes code. Surmado checks it against the team’s rules. Gemini self-corrects when given outside feedback. Humans review what’s left. The integrity comes from the loop, not from any single model.
What it actually feels like
Here’s how he described his previous attempts at web projects:
“I would get stoked by the rows of code being generated and then wait, giddy with expectation, to see my creation come to life, only to be really let down when it inevitably broke.”
Anyone who has tried to vibe-code a real project knows this arc. The stoke. The giddy expectation. The crash.
Then this:
“Not tonight though. I committed two PRs which both accomplished exactly what I set out to do.”
That’s the difference. Not “AI wrote code.” Not “AI reviewed code.” A loop that converges to working software, with a human checkpoint at the end when one is needed.
The frame we like
Most AI tools position themselves against humans. “Faster than a human reviewer.” “Catches things humans miss.”
Surmado positions itself in between AIs. Not faster than Gemini. Not smarter than Claude. The model that wrote the code doesn’t know what the team values. The model that reviewed it does, because someone wrote it down.
This is what an orchestration product looks like. The LLM isn’t the product. The way the LLM is wired into your team’s actual rules is the product. (How we vibe-coded the prototype and hand-crafted the parts that matter is the build story, if you want the inside view.)
The engineer summed it up in his last message: “Very impressed. It’s also so perfectly integrated into the flow.”
That’s what we’re building.
Two clicks. Then it just works. $15 a month for 100 PRs. We never look at your code. The frontier labs we use don’t either.