Architecture

Stop Letting Your Coding Agent Read the Whole Repo

Published June 03, 2026 — 3 min read

TL;DR: Async coding agents like Codex and Devin can eat through a codebase in minutes — but giving them unlimited repo access is how you get hallucinated refactors and silent regressions that cost more to fix than the ticket saved.

Key Insight

The teams winning with async coding agents aren't the ones with the biggest context windows. They're the ones who've built aggressive scoping into how they hand off work.

Context maximalism feels like giving the agent more information to work with. In practice, it gives the agent more surface area for wrong decisions. When you hand Codex or Devin a 300k-token repo context and ask it to "fix the auth bug," it will do exactly that, while also noticing three unrelated patterns that look refactorable and touching them too. That isn't because the model is broken. It's what a capable engineer with full context and no PM would do.

The winning pattern: treat the agent like a contractor brought in for one room. They don't need a key to the whole house.

Why Teams Miss This

Most teams onboard async agents by just pointing them at a repo and writing a ticket. This mirrors how they'd onboard a junior human engineer — give them the full codebase, let them figure it out.

But human junior engineers have implicit social overhead that slows them down: they ask clarifying questions, they worry about scope creep, they check with a senior before touching something unfamiliar. Agents don't. They have capability and no friction.

The result: agents that "work great in demos" but generate PRs that modify 12 files when the ticket required only 2, or that pass every CI test while silently changing behavior in an adjacent module. The Devin post-mortem from Answer.AI is instructive here: the problem wasn't that Devin couldn't code, it was that no one told Devin which rooms it was allowed to enter.

How to Actually Do It

1. Use an `AGENTS.md` (or `CLAUDE.md`) that defines scope explicitly

Both Codex and Claude Code pick up these hierarchical instruction files automatically. Use them to tell the agent what's out of bounds:

Scope Rules

- Only modify files under `src/auth/` and `src/middleware/` for auth tickets
- Do NOT touch `src/legacy/`: frozen, requires VP sign-off
- Do NOT refactor tests unless the ticket explicitly says so
- Commit hooks will fail if you change db migrations without a paired ticket

The Codex team recommends grooming your `AGENTS.md` like a living document — it grows as the model grows, and you can even ask the agent to update it.

2. Scope the file tree at dispatch time

Don't just write a ticket — write a ticket with an explicit file list:

Task: Fix the null-check bug in session expiry

Files in scope: src/auth/session.ts, src/auth/session.test.ts

Files off limits: Everything else

This is more work upfront, but it shrinks the blast radius of an incorrect inference. A PR that touches only 2 files is reviewable in 5 minutes. A 12-file PR hides the actual change.
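One way to make that file list machine-readable is to parse it straight out of the ticket body. The sketch below assumes the ticket uses the exact `Files in scope:` label from the template above, with comma-separated paths on a single line. `parse_scope` is a hypothetical helper name, not part of Codex, Devin, or any ticketing tool.

```python
from __future__ import annotations

import re


def parse_scope(ticket_body: str) -> list[str]:
    """Return the paths listed on the 'Files in scope:' line of a ticket body.

    Assumption: one line, comma-separated paths, label spelled as in the
    ticket template. Returns an empty list when no such line exists.
    """
    match = re.search(
        r"^Files in scope:\s*(.+)$",
        ticket_body,
        flags=re.MULTILINE | re.IGNORECASE,
    )
    if match is None:
        return []
    return [path.strip() for path in match.group(1).split(",") if path.strip()]


ticket = """Task: Fix the null-check bug in session expiry

Files in scope: src/auth/session.ts, src/auth/session.test.ts

Files off limits: Everything else
"""

print(parse_scope(ticket))
# ['src/auth/session.ts', 'src/auth/session.test.ts']
```

Returning an empty list for a ticket with no scope line lets a caller treat "no declaration" as its own case, for example by refusing to dispatch the task at all.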

3. Enforce scope at the CI/review layer

Add a PR check that fails if files outside the declared scope were modified. This is your last line of defense when the agent ignores the instruction. Simple implementation: a GitHub Action that diffs the PR file list against the expected scope declared in the ticket body.
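The core of such a check can be quite small. Here is a minimal sketch, assuming the list of changed files and the allowed patterns have already been collected (for instance from `git diff --name-only` and from the ticket body). `out_of_scope` is a hypothetical helper, and the `src/legacy/user.ts` path is made up for the demo.

```python
import fnmatch


def out_of_scope(changed_files, allowed_patterns):
    """Return the changed files that match none of the allowed patterns.

    Patterns are shell-style globs handled by fnmatch. Note that fnmatch's
    '*' also matches '/', so 'src/auth/*' covers nested directories too.
    """
    return [
        path
        for path in changed_files
        if not any(
            fnmatch.fnmatchcase(path, pattern) for pattern in allowed_patterns
        )
    ]


# Hypothetical PR: two declared files plus one stray edit.
changed = [
    "src/auth/session.ts",
    "src/auth/session.test.ts",
    "src/legacy/user.ts",
]
allowed = ["src/auth/session.ts", "src/auth/session.test.ts"]

violations = out_of_scope(changed, allowed)
print(violations)
# ['src/legacy/user.ts']
```

A CI step built on this could exit non-zero whenever the returned list is non-empty, which is what makes the PR check fail.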

4. Prefer modular codebases for agent work

The Codex team said it directly: "a well-named and organized codebase helps Codex navigate the filesystem as well as a brand new engineer might." High-cohesion modules with clear boundaries give agents natural stopping points. A monolithic 4,000-line file invites wide-ranging edits because everything is implicitly in scope.

Cyclomatic complexity also matters here — the Claude Code team specifically cited it as a factor in how well agents perform on a codebase. Simpler functions are easier to fix correctly and easier to verify.
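To make the metric concrete, here is a rough sketch that approximates cyclomatic complexity by counting branch points. It is a simplified count written for illustration, not the measure either team uses, and it analyzes Python source via the standard `ast` module purely for convenience. The `is_expired` function is a hypothetical sample.

```python
import ast

# Node types treated as one extra decision point each (a simplification).
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp,
)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity per function: 1 + decision points.

    Branches inside nested functions are also counted toward the enclosing
    function, so treat the numbers as a rough signal only.
    """
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = 1
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    score += 1
                elif isinstance(child, ast.BoolOp):
                    # 'a or b or c' adds two extra paths.
                    score += len(child.values) - 1
            scores[node.name] = score
    return scores


sample = '''
def is_expired(session, now):
    if session is None or session.expires_at is None:
        return True
    return session.expires_at <= now
'''

print(cyclomatic_complexity(sample))
# {'is_expired': 3}
```

The sample scores 3: one base path, one `if`, and one `or`. Functions that stay in the low single digits are the kind that are easier to fix and to verify.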

What We've Learned

Pick one async coding agent you're using this week. Before the next ticket, write a 4-line scope declaration at the top of it: which files are in bounds, which are off limits, and one sentence on the expected delta (e.g., "this should change 1-2 functions, not add new ones"). Measure whether the resulting PR hits that expectation. If it doesn't, tighten the scope declaration rather than rewording the task. The model isn't drifting; your scope boundary is.

Sources

Latent Space: ChatGPT Codex: The Missing Manual — Codex team best practices, AGENTS.md grooming, codebase discoverability
Latent Space: Claude Code: Anthropic's Agent in Your Terminal — cyclomatic complexity, Unix philosophy, modular design for agents
Answer.AI: I Spent $500 on Devin — real-world Devin post-mortem on scope and autonomy failures
OpenAI: Introducing ChatGPT Codex — research preview, concurrent task limits, agent form factor