AI IN PRODUCTION

Stop Building Agents. Build Pipelines with AI Steps.

Published May 23, 2026 — 4 min read

TL;DR: Enterprise teams are failing in production because they're treating every AI workflow as an agent problem. Most of what you're trying to do doesn't need autonomous reasoning; it needs a reliable pipeline with AI at the right steps. Ignoring that distinction is costing teams months of debugging and real money.

Key Insight

"Agent" has become the default architecture for any AI workflow, and it's wrong for most use cases.

The definition matters: a true agent perceives state, reasons about what to do next, and takes actions in sequences that can't be fully predicted in advance. A pipeline is a fixed sequence of steps, some of which happen to call an LLM.

The industry has collapsed these two things into one bucket and called everything an "agent." The result: 88% of AI agent initiatives fail to reach production at scale. Multi-agent architectures introduce orchestration complexity that grows nearly exponentially — and cascading failures that are genuinely hard to reproduce in staging because they depend on runtime state you can't replicate.

Most enterprise AI use cases — document processing, support routing, data extraction, report generation — don't need agents. They need deterministic pipelines where AI handles the ambiguous parts.

Why Teams Miss This

The failure is conceptual before it's technical.

"Agent" sounds sophisticated. "Pipeline" sounds like IT infrastructure from 2012. Teams pitch to stakeholders, and "we built an autonomous AI agent" gets funded. "We built a processing pipeline with LLM steps" doesn't make the all-hands slide.

So teams architect for the pitch, not for production. They add dynamic tool selection where a fixed function call would work. They introduce multi-agent delegation where a single LLM call with more context would suffice. Every layer of autonomy adds a new surface for failure.

The second miss: agents are hard to test. A deterministic pipeline can be unit-tested at each step. An agent that dynamically decides its path through a workflow can behave correctly in 95% of cases and catastrophically in the remaining 5% — and you won't catch that 5% in staging because it only appears in rare input combinations at production scale.
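To make the testability point concrete, here is a minimal sketch of unit-testing one pipeline step in isolation. Everything named here (`route`, the category labels, the queue names) is a hypothetical stand-in rather than a real API. The LLM call is injected as a plain function, so the test can swap in a fake that returns whatever the model might plausibly return, including garbage.

```python
from typing import Callable

# Hypothetical names throughout: `route`, the labels, and the queue names
# are illustrative assumptions, not part of any real library.
ALLOWED_CATEGORIES = {"refund", "billing", "other"}


def route(complaint: str, classify: Callable[[str], str]) -> str:
    """Deterministic routing step; the LLM call is injected as `classify`."""
    category = classify(complaint)
    if category not in ALLOWED_CATEGORIES:
        # Predictable failure mode: unknown labels never reach downstream steps.
        return "human_review"
    return "refund_queue" if category == "refund" else "general_queue"


def test_refund_goes_to_refund_queue() -> None:
    # The fake classifier stands in for the model's happy-path answer.
    assert route("I want my money back", lambda text: "refund") == "refund_queue"


def test_unknown_label_falls_back_to_human() -> None:
    # Simulates the model returning a label outside the allowed set.
    assert route("???", lambda text: "definitely_a_refund") == "human_review"
```

Because the model is just an argument, each branch can be exercised without a network call. The same tests run under a runner such as pytest or by calling the functions directly.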

The third miss is the one that doesn't show up in postmortems: the most dangerous production failure mode isn't an obvious crash. It's confident, well-formatted, operationally wrong output that humans approve because it looks right. That failure mode is worse in agentic systems because the chain of reasoning that produced it is harder to audit.

How to Actually Do It

The decision heuristic: Ask "Can I write pseudocode for this workflow with fixed steps?" If yes — build a pipeline. Reserve true agents for workflows where the branching logic is genuinely unknowable in advance.

Pipeline-first architecture:

# Agent-style: one opaque call, with the path decided at runtime
result = agent.run("Process this customer complaint and take appropriate action")

# Pipeline-style: fixed steps, with LLM calls only where the input is ambiguous
def process_complaint(complaint: str) -> dict:
    category = classify(complaint)                    # LLM step 1: fixed output schema
    sentiment = analyze_sentiment(complaint)          # LLM step 2: bounded classification
    if category == "refund" and sentiment == "high_anger":
        return escalate_to_human(complaint)           # deterministic routing, no LLM
    response = draft_response(complaint, category)    # LLM step 3: generation
    return queue_for_review(response)                 # deterministic action

Each LLM step has a defined input schema, a defined output schema, and a predictable failure mode. The pipeline is observable, testable, and debuggable — because it's just a pipeline that happens to call an LLM.
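One way to read "defined output schema" and "predictable failure mode" is sketched below. This is an assumption about how such a step could look, not a prescribed implementation: `Category`, `Classification`, `StepError`, and the `call_llm` parameter are all hypothetical names, and `call_llm` stands in for whatever client you actually use. The step either returns a typed value or raises a single, known exception.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Category(Enum):
    REFUND = "refund"
    BILLING = "billing"
    OTHER = "other"


@dataclass(frozen=True)
class Classification:
    category: Category
    confidence: float


class StepError(ValueError):
    """Raised when an LLM step returns output that violates its schema."""


def classify(complaint: str, call_llm: Callable[[str], str]) -> Classification:
    """LLM step with a fixed output schema; `call_llm` is a hypothetical client."""
    raw = call_llm(f"Classify this complaint as JSON: {complaint}")
    try:
        data = json.loads(raw)
        result = Classification(
            category=Category(data["category"]),      # rejects unknown labels
            confidence=float(data["confidence"]),
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        raise StepError(f"classify returned invalid output: {raw!r}") from exc
    if not 0.0 <= result.confidence <= 1.0:
        raise StepError(f"confidence out of range: {result.confidence}")
    return result


# Usage with a fake model response (no real LLM involved).
fake_llm = lambda prompt: '{"category": "refund", "confidence": 0.93}'
print(classify("I was charged twice", fake_llm))
```

The design choice is that malformed model output fails loudly at the step boundary instead of flowing downstream as well-formatted nonsense.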

When agents are actually the right call: Tasks where the path is unknowable until the agent encounters it. Code debugging — the agent can't know in advance what errors it'll hit. Deep research requiring iterative exploration. Any task with genuinely unbounded tool use where the human equivalent would also be improvising.

Migration pattern for existing "agents":

  1. Add structured logging to every tool call and decision point in your current agent
  2. After a week, visualize the 20 most common traces
  3. For each trace, ask: "Was this path predictable in advance from the initial input?"
  4. Any path traversed by >50% of runs is pipeline territory — extract it
  5. Keep the agent only for the tail cases that genuinely require dynamic routing
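Steps 1 and 2 of the migration pattern could be sketched roughly as follows. This assumes one JSON record per tool call or decision point, written in order, with a `run_id` that ties records to a run. The function names, the record fields, and the in-memory `sink` list are illustrative assumptions; in practice the records would go to your existing logging stack.

```python
import json
import time
from collections import Counter
from typing import Iterable


def log_event(run_id: str, step: str, sink: list[str]) -> None:
    """Append one structured record per tool call or decision point."""
    sink.append(json.dumps({"run_id": run_id, "step": step, "ts": time.time()}))


def traces_by_run(lines: Iterable[str]) -> dict[str, tuple[str, ...]]:
    """Rebuild each run's ordered path of steps from the log lines."""
    paths: dict[str, list[str]] = {}
    for line in lines:
        event = json.loads(line)
        paths.setdefault(event["run_id"], []).append(event["step"])
    return {run_id: tuple(steps) for run_id, steps in paths.items()}


def most_common_traces(
    lines: Iterable[str], n: int = 20
) -> list[tuple[tuple[str, ...], int]]:
    """Return the n most frequent paths with their run counts."""
    return Counter(traces_by_run(lines).values()).most_common(n)


# Usage with three made-up runs.
log: list[str] = []
demo_runs = [
    ("r1", ["classify", "draft"]),
    ("r2", ["classify", "draft"]),
    ("r3", ["classify", "search", "draft"]),
]
for run_id, steps in demo_runs:
    for step in steps:
        log_event(run_id, step, log)
for path, count in most_common_traces(log):
    print(count, " -> ".join(path))
```

Each printed line is one candidate for step 3: a concrete path you can check for predictability against the initial input.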

The staffing analogy: Most teams would never hire a senior consultant to do data entry just because that consultant could also handle ambiguous edge cases. The same logic applies to agents. Use the expensive, autonomous capability for genuinely ambiguous work. Route everything else through cheaper, predictable infrastructure.

What We've Learned

The best AI deployments in production right now aren't the ones with the most autonomous agents — they're the ones that are ruthlessly boring underneath. A three-step LLM pipeline running 10,000 times a day without incident beats a sophisticated agent that handles edge cases brilliantly but fails on routine inputs 2% of the time at scale.

At 10,000 runs a day, that 2% is 200 failures a day. That's a support queue, not an edge case.

Build the pipeline. Add agent logic only where you genuinely can't predict the path. Measure production success rate, not architectural elegance.

Next experiment worth running: if you have an existing agent in production, add step-level logging this week. After seven days, count how many distinct paths it actually took. If the top three paths cover 70%+ of volume, you've got a pipeline masquerading as an agent, and a straightforward refactor that should cut your incident rate.
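The check in that experiment is a few lines once each run's path is available as a tuple of step names. The sketch below assumes that shape; `top_k_coverage` and the sample data are hypothetical, and the 70% threshold is the one from the paragraph above.

```python
from collections import Counter
from typing import Sequence


def top_k_coverage(paths: Sequence[tuple[str, ...]], k: int = 3) -> float:
    """Fraction of runs accounted for by the k most common paths."""
    if not paths:
        return 0.0
    counts = Counter(paths)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(paths)


# Made-up week of runs: four distinct paths across ten runs.
runs = (
    [("classify", "draft")] * 6
    + [("classify", "escalate")] * 2
    + [("classify", "search", "draft")]
    + [("classify", "search", "search", "draft")]
)
coverage = top_k_coverage(runs, k=3)
print(f"distinct paths: {len(set(runs))}, top-3 coverage: {coverage:.0%}")
if coverage >= 0.70:
    print("Looks like a pipeline masquerading as an agent.")
```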

