[Future Friday] When to Split a Single Agent Into Many: A Routing Pattern Guide

Published April 10, 2026 — 4 min read

TL;DR: A single complex agent is slower, harder to debug, and more prone to hallucination than a team of specialized agents. Learn the supervisor/router pattern—when to split, what metrics tell you it's time, and how to avoid over-engineering with too many agents.

The Hidden Cost of the Monolithic Agent

When you dump all tasks into one agent, you pay in latency and accuracy. A single agent that handles lead enrichment, qualification, and handoff requires longer prompts, more token overhead, and wider decision trees. Each task competes for the model's attention.

Modern production deployments split responsibility: a qualification agent that scores leads, a research agent that digs into company data, a handoff agent that routes to sales. Each agent has a clear input/output contract, shorter context windows, and fewer failure modes.

The trade-off is coordination. You need a supervisor that routes work to the right agent, passes context cleanly, and handles failures gracefully. Done right, this is cleaner than a single monolithic agent.

The Supervisor/Router Pattern

The supervisor sits at the top. It sees an incoming task and decides: Who handles this?

The supervisor is often thin—a few rules or a lightweight model call that routes to the right specialist. Common routing strategies:

Rule-based: keyword or regex matching on the task. Fast and transparent, but brittle at the edges.

Classifier: a small model call that returns an agent label. Handles fuzzy phrasing at a modest latency cost.

Hybrid: rules handle the obvious cases; a model call catches everything the rules miss.

Each specialist agent returns a result. The supervisor collects results and either synthesizes them or passes the best one downstream.

Why this works: Each agent is smaller, faster, and better at one thing. The routing overhead is negligible if you batch requests or cache agent metadata.
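A rule-based supervisor can be sketched in a few lines. The agent names and keywords below are illustrative, not a prescribed API:

```python
# A minimal rule-based supervisor: route a task to one of three
# specialists. Agent names and keyword rules are illustrative.
def route(task: str) -> str:
    text = task.lower()
    if any(k in text for k in ("score", "qualify", "fit")):
        return "qualification"
    if any(k in text for k in ("research", "enrich", "company")):
        return "research"
    return "handoff"  # default: pass the lead to sales

print(route("Qualify this inbound lead"))   # qualification
print(route("Enrich with company data"))    # research
```

In practice you would swap the keyword checks for a classifier call once rules stop covering the input distribution, but the contract—task in, agent name out—stays the same.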

Metrics That Say "Split Now"

Before you split, measure these signals:

Prompt bloat: the system prompt keeps growing to cover edge cases for unrelated tasks.

Tool confusion: the agent picks the wrong tool, or the right tool with arguments meant for another task.

Latency creep: responses slow down as context grows, even on simple requests.

Cross-task regressions: tightening instructions for one task degrades accuracy on another.

If you see one or more of these, routing is cheaper than fine-tuning or prompt engineering.

Three Anti-Patterns to Avoid

Too many agents: 20 micro-agents sounds like clean separation until you're debugging routing failures across them. Start with 3–5. Merge when specialization blurs.

Stateless handoff: When Agent A passes to Agent B, B needs context: what did A try? What failed? Pass full conversation history or a structured summary, not just the task.
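A structured handoff might look like the sketch below; the field names and tool names are hypothetical, not a fixed schema:

```python
# Build a structured summary of Agent A's work for Agent B, instead of
# replaying the whole conversation. Field and tool names are hypothetical.
def make_handoff(agent: str, attempted: list, results: dict, failures: list) -> dict:
    return {
        "from_agent": agent,
        "attempted": attempted,   # what A tried
        "results": results,       # what succeeded
        "failures": failures,     # what failed, and why
    }

handoff = make_handoff(
    "qualification",
    attempted=["score_lead", "check_icp_fit"],
    results={"score": 82, "segment": "mid-market"},
    failures=["check_icp_fit: missing industry field"],
)
print(handoff["results"]["score"])  # 82
```

Agent B's prompt then embeds this dict rather than the full transcript, so B knows what was already tried without paying for every intermediate token.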

No feedback loop: If an agent's output is wrong, you never know which agent failed or why. Log which agent handled each task segment, and tag failures by agent. Use this to retrain or adjust routing.
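The logging itself can stay trivial; a sketch, with made-up task names:

```python
import collections

# Tag each task segment with the agent that handled it, so failures
# can be attributed later. The log structure here is a sketch.
log = []

def record(agent: str, task: str, ok: bool) -> None:
    log.append({"agent": agent, "task": task, "ok": ok})

record("qualification", "score lead #1", True)
record("research", "enrich lead #1", False)
record("research", "enrich lead #2", True)

# Failure counts per agent point at the weak link.
failures = collections.Counter(e["agent"] for e in log if not e["ok"])
print(failures)  # Counter({'research': 1})
```

Once failures cluster on one agent, you know where to adjust the prompt, the tools, or the routing rule that sends work there.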

Building Your First Router

Start minimal. A three-agent system might look like:

  1. Supervisor (a lightweight GPT-4 or Claude call): receives the task and routes it to Agent A or Agent B.
  2. Agent A (lead qualification): scores and classifies.
  3. Agent B (research): enriches with external data.

The supervisor calls agents sequentially (Agent A output informs Agent B input). Each agent has a tool set relevant to its job.
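The sequential flow can be sketched with stub functions standing in for real model and tool calls; the scoring rule and enrichment data are made up:

```python
# Sequential pipeline: Agent A's output feeds Agent B's input.
# The agent bodies are stubs standing in for real model/tool calls.
def qualify(lead: dict) -> dict:          # Agent A: score and classify
    score = 80 if lead.get("title") == "CTO" else 40
    return {**lead, "score": score}

def research(lead: dict) -> dict:         # Agent B: enrich with external data
    # Only spend research effort on leads Agent A scored highly.
    if lead["score"] >= 60:
        lead["company_data"] = {"employees": 120}  # stubbed enrichment
    return lead

def supervisor(lead: dict) -> dict:
    return research(qualify(lead))

result = supervisor({"name": "Ada", "title": "CTO"})
print(result["score"], "company_data" in result)  # 80 True
```

Note how Agent B reads Agent A's score to decide whether to do any work at all—that dependency is why the supervisor calls them in order rather than in parallel.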

Test routing accuracy: did the supervisor send the task to the right agent? Run evals on routing decisions before scaling.
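A routing eval can be as simple as comparing logged decisions to expected labels; the pairs below are made-up data:

```python
# Score the supervisor's routing decisions against hand-labeled examples.
# Each pair is (supervisor's choice, expected agent) — illustrative data.
decisions = [
    ("qualification", "qualification"),
    ("research", "research"),
    ("handoff", "research"),   # a mis-route
    ("qualification", "qualification"),
]
correct = sum(got == want for got, want in decisions)
print(f"routing accuracy: {correct}/{len(decisions)}")  # routing accuracy: 3/4
```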

When NOT to Split

Too early: If your single agent's prompt is still under 1,000 tokens, splitting is premature. Wait for clear task separation.

Premature optimization: If you're guessing that splitting will help, you're probably wrong. Measure first.

Distributed complexity: Spreading tasks across many agents can hide problems. A single well-tuned agent is often easier to debug than a broken supervisor.

The rule: split when specialization is clear and your metrics show friction.

FAQ

Q: How do I know if my supervisor is routing correctly?
A: Log every routing decision and the agent's result. Compare the supervisor's choice to a ground-truth label (manual review or expected agent). Track precision and recall per route over time.
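Per-route precision and recall fall out of the same logged pairs; the route names and data here are illustrative:

```python
from collections import defaultdict

# Per-route precision and recall from logged (chosen, expected) pairs.
pairs = [
    ("qualification", "qualification"),
    ("research", "research"),
    ("handoff", "research"),     # handoff chosen when research was correct
    ("qualification", "qualification"),
]

tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
for got, want in pairs:
    if got == want:
        tp[got] += 1
    else:
        fp[got] += 1   # this route was chosen wrongly
        fn[want] += 1  # this route was missed

for route in ["qualification", "research", "handoff"]:
    chosen = tp[route] + fp[route]
    actual = tp[route] + fn[route]
    p = tp[route] / chosen if chosen else 0.0
    r = tp[route] / actual if actual else 0.0
    print(f"{route}: precision={p:.2f} recall={r:.2f}")
```

Here "research" shows perfect precision but 0.50 recall—the supervisor never wrongly picks it, but misses half the tasks that belong to it.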

Q: Can a supervisor agent also do work, or should it only route?
A: It can do both. A supervisor might handle simple tasks directly and route complex ones to specialists. Just keep the supervisor's own logic lightweight so it doesn't become a bottleneck.
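A triage rule for "handle vs. route" can be very simple; the word-count threshold below is an arbitrary stand-in for a real complexity check:

```python
# A supervisor that answers trivial requests itself and routes the rest.
# The word-count threshold is a stand-in for a real triage heuristic.
def supervise(task: str) -> str:
    if len(task.split()) <= 3:
        return "direct"        # cheap enough to handle in place
    return "specialist"        # route anything substantial

print(supervise("status?"))                                  # direct
print(supervise("enrich this lead with firmographic data"))  # specialist
```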

Q: What if two agents have overlapping tools?
A: That's a signal to either merge them or split the tool set more clearly. Overlapping tools mean overlapping responsibilities, which defeats the purpose of specialization.

Q: How do I pass context between agents without bloating the prompt?
A: Pass a structured summary (JSON) of prior agent results: what was attempted, what succeeded, what failed. Don't copy the full conversation; extract the essential facts.

Next step: If you're running a single complex agent in production, pick one of your tasks (qualification, research, handoff) and create a standalone agent for it. Test routing accuracy against 50 examples before scaling.