Case Study Thursday

Why Your Rule-Based Workflow Migration Is Harder Than It Looks

Published May 14, 2026 — 5 min read

TL;DR: Swapping an if/then workflow for an AI agent sounds like a straightforward upgrade — and that assumption is exactly what kills the project. The teams that succeed migrate incrementally, encode their exception logic explicitly, and prove single-agent value before layering in orchestration.

Key Insight

The enterprise AI narrative in 2026 is dominated by success stories about agents replacing legacy RPA and rules engines. What you don't see: the teams that rewrote their entire approval workflow as a multi-agent system, burned 10x their token budget on circular reasoning loops, and quietly rolled it back three sprints later.

The contrarian take: most rule-based workflows don't need an agent. They need a well-prompted LLM with a strict output schema and a human escalation path. "Agent" is the wrong abstraction for 80% of migration targets. Agents earn their complexity when your workflow has genuine branching uncertainty — not just a long list of rules.

Why Teams Miss This

The common mistake is conflating volume with complexity. A workflow that processes 10,000 invoices per day sounds like a perfect agent target. But if 95% of those invoices follow three deterministic rules, you don't need an agent — you need a classifier feeding a rules engine, with an agent handling only the 5% that are genuinely ambiguous.

Teams skip this triage step and design for the exception case as if it were the norm. The result: a multi-agent system burning $2–$5 per task for work that should cost $0.10.
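To make the gap concrete, here is a back-of-the-envelope sketch using the figures above: 10,000 invoices a day, 5% genuinely ambiguous, $0.10 per item on the triaged path and $2 (the low end of the range) per agent task. The function name and the assumption that the triaged path costs a flat $0.10 per item are illustrative, not measured.

```python
def daily_cost(volume: int, judgment_share: float,
               triaged_cost: float, agent_cost: float) -> float:
    """Blended daily spend when only the judgment share reaches the agent.

    All per-item dollar costs here are illustrative assumptions.
    """
    judgment_items = volume * judgment_share
    routine_items = volume - judgment_items
    return routine_items * triaged_cost + judgment_items * agent_cost


# Everything through the multi-agent system vs. triage first.
all_agent = daily_cost(10_000, 1.0, 0.10, 2.00)
triaged = daily_cost(10_000, 0.05, 0.10, 2.00)
print(f"all-agent: ${all_agent:,.0f}/day, triaged: ${triaged:,.0f}/day")
```

Under these assumptions the triaged design costs roughly a tenth as much ($1,950 vs. $20,000 per day), and the gap widens at the $5 end of the range.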

In production deployments using multi-agent frameworks like AutoGen and LangGraph, one of the most common failure modes is circular decision-making — agents debating the same state without converging. A $0.10 document classification task becomes a $10 token spiral. That's not an edge case; it's the default behavior when you give agents too much conversational latitude on structured problems.
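One cheap guard against circular decision-making is to fingerprint the shared state after every turn and stop when the same state keeps coming back. This sketch assumes your orchestration loop can expose that state as a JSON-serializable dict; `LoopDetector` and `max_repeats` are hypothetical names, not part of AutoGen or LangGraph. Fingerprinting only catches exact repeats, not debates that reword the same positions.

```python
import hashlib
import json


class LoopDetector:
    """Flag an agent run that keeps revisiting a state it has already seen."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.visits: dict[str, int] = {}

    @staticmethod
    def fingerprint(state: dict) -> str:
        # sort_keys makes logically identical states serialize identically
        blob = json.dumps(state, sort_keys=True, default=str)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def record(self, state: dict) -> bool:
        """Count a visit; return True once the state exceeds max_repeats visits."""
        key = self.fingerprint(state)
        self.visits[key] = self.visits.get(key, 0) + 1
        return self.visits[key] > self.max_repeats


detector = LoopDetector(max_repeats=2)
state = {"doc_id": "inv-001", "proposed_label": "invoice", "open_questions": 1}
for turn in range(1, 6):
    # In a real run the state would come from the orchestration loop each turn.
    if detector.record(state):
        print(f"turn {turn}: same state seen too often, escalating")
        break
```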

The second miss is context degradation. Long-running agent sessions (30–45 minutes of work) see behavior drift as context windows fill. Teams building rules-to-agent migrations often assume the agent will "remember" the workflow state cleanly — but without explicit state checkpointing, the agent's effective memory of early decisions degrades under token pressure.

How to Actually Do It

Step 1 — Triage before you migrate.

Map every branch in your existing workflow. Label each: deterministic (same input always produces same output) vs. judgment (requires contextual reasoning). Only the judgment branches belong in an agent.

Workflow audit categories:

Deterministic (>95% predictable) → keep in rules engine or simple classifier
Structured ambiguity (well-defined edge cases, typically <5% of volume) → LLM with constrained output schema
Genuine uncertainty (open-ended, context-dependent) → agent with tool use
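One way to put numbers on these labels is to replay each branch's historical logs and measure how often the same input led to the same output. The sketch below assumes you can reduce each logged case to a hashable input key plus the output the branch produced, and that someone has judged whether the branch's edge cases can be enumerated up front. `Branch`, `predictability()`, and `route()` are hypothetical names; the 0.95 cutoff mirrors the >95% figure above.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Branch:
    name: str
    history: list[tuple[str, str]]  # (input_key, output) pairs replayed from logs
    edge_cases_enumerable: bool     # can the ambiguous cases be listed up front?


def predictability(history: list[tuple[str, str]]) -> float:
    """Share of cases whose output matches the most common output for that input."""
    if not history:
        return 0.0
    outputs_by_input: dict[str, Counter] = defaultdict(Counter)
    for input_key, output in history:
        outputs_by_input[input_key][output] += 1
    agreeing = sum(c.most_common(1)[0][1] for c in outputs_by_input.values())
    return agreeing / len(history)


def route(branch: Branch) -> str:
    """Map a branch to one of the three audit categories."""
    if predictability(branch.history) > 0.95:
        return "rules_engine_or_classifier"
    if branch.edge_cases_enumerable:
        return "llm_with_constrained_schema"
    return "agent_with_tools"


po_match = Branch("po_match", [("match", "approve")] * 98 + [("mismatch", "hold")] * 2, True)
new_vendor = Branch("new_vendor", [("new", "approve"), ("new", "hold"), ("new", "reject")], False)
print(route(po_match), route(new_vendor))
```

A fully deterministic branch scores 1.0 because every input key maps to exactly one output; branches that score lower are where the judgment work lives.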

Step 2 — Prove the single-agent path first.

Before you build an orchestrator, prove that a single agent can handle one judgment branch end-to-end. Measure latency, cost per call, error rate, and hallucination frequency on tool calls. Multi-agent architectures multiply these metrics, so if the single-agent path isn't clean, orchestration will only make it worse.
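A minimal way to collect those four numbers for a single-agent pilot, assuming you log one record per end-to-end agent call. The field names, and the rule that a tool call counts as hallucinated when it names a tool that doesn't exist or passes arguments that fail validation, are assumptions rather than a standard.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CallRecord:
    latency_s: float
    cost_usd: float
    error: bool                   # wrong or failed final outcome
    tool_calls: int
    hallucinated_tool_calls: int  # unknown tool name or invalid arguments


def summarize(records: list[CallRecord]) -> dict[str, float]:
    """Aggregate a single-agent pilot into the four Step 2 metrics."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    total_tool_calls = sum(r.tool_calls for r in records)
    return {
        "median_latency_s": median(r.latency_s for r in records),
        "cost_per_call_usd": sum(r.cost_usd for r in records) / n,
        "error_rate": sum(r.error for r in records) / n,
        "tool_hallucination_rate": (
            sum(r.hallucinated_tool_calls for r in records) / total_tool_calls
            if total_tool_calls else 0.0
        ),
    }


pilot = [
    CallRecord(1.8, 0.03, False, 2, 0),
    CallRecord(2.6, 0.05, False, 3, 0),
    CallRecord(7.9, 0.21, True, 6, 1),
]
print(summarize(pilot))
```

Hallucination rate is computed per tool call rather than per task, so one long run with many bad calls isn't hidden by many clean short ones.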

Step 3 — Encode exception thresholds explicitly.

Don't let the agent infer when to escalate. Hard-code it: "If confidence score < 0.80, route to human review." "If tool call count exceeds 5 without resolution, abort and log." Agents that lack explicit circuit breakers will keep trying — and keep spending.

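MAX_TOOL_CALLS = 5  # circuit breaker: escalate after this many unresolved steps

def run_agent_with_guard(agent, task, tools, escalate_to_human):
    # agent.step(), task.resolved, and escalate_to_human() stand in for
    # whatever your agent framework and review queue actually provide.
    call_count = 0
    result = None
    while not task.resolved:
        if call_count >= MAX_TOOL_CALLS:
            # Budget spent without resolution: stop paying and hand off.
            return escalate_to_human(task)
        result = agent.step(task, tools)
        call_count += 1
    return result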

Step 4 — Checkpoint state, not conversation.

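Rather than feeding the full conversation history into each agent turn, checkpoint structured state (current step, decisions made, flags raised) and summarize completed phases. This attacks context degradation at the root: each turn sees a compact, explicit record of what has been decided instead of a transcript that grows until early decisions get crowded out.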
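A minimal sketch of the idea, assuming a JSON-serializable checkpoint that your orchestration code passes to the model in place of the transcript. `WorkflowState` and its fields are hypothetical; the point is that the context grows with the number of phases, not the number of turns.

```python
import json
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Structured checkpoint carried between agent turns (hypothetical schema)."""
    current_step: str
    decisions: dict[str, str] = field(default_factory=dict)   # step -> decision
    flags: list[str] = field(default_factory=list)            # issues raised so far
    phase_summaries: list[str] = field(default_factory=list)  # one line per finished phase

    def complete_phase(self, summary: str, next_step: str) -> None:
        """Collapse a finished phase into a single summary line and move on."""
        self.phase_summaries.append(summary)
        self.current_step = next_step

    def to_context(self) -> str:
        """Serialize the checkpoint as the context for the next turn."""
        return json.dumps(
            {
                "current_step": self.current_step,
                "decisions": self.decisions,
                "flags": self.flags,
                "completed_phases": self.phase_summaries,
            },
            sort_keys=True,
        )


state = WorkflowState(current_step="extract_fields")
state.decisions["extract_fields"] = "vendor=ACME, total=1240.00"
state.complete_phase("Extracted vendor and total from invoice", "validate_po")
state.flags.append("PO number missing")
print(state.to_context())
```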

Step 5 — Shadow run before cutover.

Run the agent in shadow mode alongside the existing workflow for two weeks. Compare outputs on the same inputs. Flag every divergence. This builds the test suite you'll need for monitoring in production — and it catches tool hallucination before it hits real data.
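The core of a shadow run is small: feed both paths the same inputs, let only the legacy output go live, and keep every divergence. This is a sketch under assumptions; `shadow_compare()`, `Divergence`, and the two toy decision functions are hypothetical, and in practice the agent call would be asynchronous and its results written to durable storage.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class Divergence:
    input_id: str
    legacy_output: Any
    agent_output: Any


def shadow_compare(
    inputs: Iterable[tuple[str, Any]],
    legacy_fn: Callable[[Any], Any],
    agent_fn: Callable[[Any], Any],
) -> tuple[dict[str, Any], list[Divergence]]:
    """Run both paths on identical inputs; only legacy outputs go live."""
    live_outputs: dict[str, Any] = {}
    divergences: list[Divergence] = []
    for input_id, payload in inputs:
        legacy = legacy_fn(payload)
        live_outputs[input_id] = legacy  # the old system still decides
        try:
            shadow = agent_fn(payload)
        except Exception as exc:  # an agent failure must never touch the live path
            shadow = f"error: {exc!r}"
        if shadow != legacy:
            divergences.append(Divergence(input_id, legacy, shadow))
    return live_outputs, divergences


def legacy_rule(amount: float) -> str:
    return "approve" if amount < 10_000 else "hold"


def shadow_agent(amount: float) -> str:  # stand-in for a real agent call
    return "approve" if amount < 500 else "hold"


live, diffs = shadow_compare(
    [("inv-1", 120.0), ("inv-2", 980.0), ("inv-3", 15_000.0)], legacy_rule, shadow_agent
)
print(diffs)
```

Agent exceptions are recorded as divergences rather than raised, so a crashing agent shows up in the comparison log instead of interrupting the live workflow.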

What We've Learned

The teams winning this migration in 2026 aren't the ones who moved the fastest — they're the ones who were most ruthless about what doesn't need an agent. Governance-first deployments (audit trails, permission scopes, exception routing baked in at day one) are the ones expanding scope in H2. Capability-first deployments are the ones doing post-mortems.

Next experiment: if you have a rules-based workflow in production, run the triage audit above. Count your deterministic branches vs. judgment branches. If the judgment branches are under 10%, you probably don't need an agent yet — you need a better classifier. If they're over 30%, that's a legitimate agent migration candidate. Share what you find.
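The rule of thumb above reduces to a few lines once you have the audit counts. The 10% and 30% cutoffs come from the text; the function name is hypothetical, and the wording for the 10 to 30% band is an assumption, since the rule of thumb doesn't cover it.

```python
def migration_verdict(deterministic: int, judgment: int) -> str:
    """Apply the 10% / 30% rule of thumb to a triage audit's branch counts."""
    total = deterministic + judgment
    if total == 0:
        raise ValueError("the audit found no branches")
    judgment_share = judgment / total
    if judgment_share < 0.10:
        return "no agent yet: invest in a better classifier"
    if judgment_share > 0.30:
        return "legitimate agent migration candidate"
    # 10-30% is not covered by the rule of thumb; decide case by case.
    return "between thresholds: no clear call from this rule of thumb"


print(migration_verdict(deterministic=46, judgment=4))   # 8% judgment
print(migration_verdict(deterministic=28, judgment=14))  # 33% judgment
```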

FAQ

Q: What's the biggest mistake teams make when migrating from RPA to AI agents?

Treating the migration as a 1:1 swap instead of a redesign. RPA encodes rules; agents reason. If you just re-implement your rules as agent prompts, you get an expensive, slower RPA. The migration requires identifying which steps genuinely benefit from reasoning and rebuilding only those.

Q: How do you prevent runaway token costs in an agent workflow?

Set hard limits at the infrastructure level: maximum tool calls per task, maximum turns before escalation, and structured state checkpointing instead of full conversation history. Token costs in multi-agent systems are multiplicative — a 2x increase in agent count can produce a 5–10x cost increase without these guardrails.

Q: When should a workflow stay as a rules engine instead of becoming an agent?

When over 90% of cases are deterministic. If your workflow consistently produces the same output for the same input, an LLM adds latency and cost without improving outcomes. Reserve agents for tasks with genuine contextual ambiguity — approval decisions with policy exceptions, document routing that depends on content intent, or triage that requires weighing competing signals.

Q: What does "shadow mode" mean for agent deployments and why does it matter?

Shadow mode runs the new agent in parallel with the existing workflow without affecting live outputs. The agent processes the same inputs and logs its decisions, but the old system still controls actual outcomes. This generates a real-world test suite before cutover and exposes failure modes — particularly tool hallucination and edge-case misclassification — that synthetic test data misses.

Q: How long should a shadow run last before going live?

Two weeks is the practical minimum for high-volume workflows. You want to capture the full range of input variation, including end-of-period batches, exception spikes, and any weekly patterns in your data. Shadow runs shorter than a week frequently miss the edge cases that cause production incidents.
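A quick pre-flight check on a planned shadow window can catch the obvious gaps. This sketch assumes your "end of period" is calendar month-end; if your batches close weekly or quarterly, swap that check. `shadow_window_gaps()` is a hypothetical helper, and it can't detect exception spikes, which you only see in the data itself.

```python
import calendar
from datetime import date, timedelta


def shadow_window_gaps(start: date, end: date) -> list[str]:
    """List coverage gaps for a shadow run spanning start..end inclusive."""
    days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    gaps = []
    if len(days) < 14:
        gaps.append(f"only {len(days)} days observed; aim for at least two weeks")
    # A window of 14+ consecutive days sees every weekday at least twice.
    has_month_end = any(
        d.day == calendar.monthrange(d.year, d.month)[1] for d in days
    )
    if not has_month_end:
        gaps.append("no month-end date in window; end-of-period batches unobserved")
    return gaps


print(shadow_window_gaps(date(2026, 5, 1), date(2026, 5, 14)))
print(shadow_window_gaps(date(2026, 5, 20), date(2026, 6, 2)))  # []
```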

Q: What's the right human-in-the-loop model for migrated workflows?

Humans set thresholds and review exceptions — they don't approve every transaction. Define explicit escalation triggers (low confidence, rule conflict, high-value threshold) and build the escalation path before go-live. Human-in-the-loop that requires approval on every decision defeats the purpose of automation entirely.
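The three escalation triggers named above can be encoded as one explicit check that runs before any outcome is committed. A sketch under assumptions: the 0.80 confidence floor is Step 3's example, the high-value limit is a made-up placeholder, `Decision` and `escalation_reasons()` are hypothetical, and how you obtain a confidence score depends on your model setup.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.80       # Step 3's example threshold
HIGH_VALUE_LIMIT = 10_000.00  # hypothetical; take this from your approval policy


@dataclass
class Decision:
    confidence: float        # confidence score, 0.0 to 1.0 (source is an assumption)
    amount: float            # transaction value in your base currency
    rule_outcomes: set[str]  # outcomes proposed by every rule that matched


def escalation_reasons(decision: Decision) -> list[str]:
    """Return the triggers that fired; an empty list means no human review needed."""
    reasons = []
    if decision.confidence < CONFIDENCE_FLOOR:
        reasons.append("low_confidence")
    if len(decision.rule_outcomes) > 1:
        reasons.append("rule_conflict")  # matching rules disagree
    if decision.amount >= HIGH_VALUE_LIMIT:
        reasons.append("high_value")
    return reasons


print(escalation_reasons(Decision(0.93, 480.0, {"approve"})))  # []
print(escalation_reasons(Decision(0.71, 25_000.0, {"approve", "hold"})))
# ['low_confidence', 'rule_conflict', 'high_value']
```

Returning the list of reasons, rather than a bare yes/no, gives the reviewer the context for why the case landed in their queue.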
