Case Study Thursday · Agent Ops

What Actually Broke When We Deployed Our First AI Agent (And How We Fixed It)

Three failure modes we hit in production, four guardrails that fixed them, and a pre-ship checklist for teams about to deploy their first agent into real ops.

Published February 26, 2026 — 9 min read

The demo worked perfectly.

Of course it did. Demos always work. You control the inputs, you know the golden-path outputs, and nobody is feeding the agent anything weird. We ran the demo for stakeholders three times. It was smooth every time.

Two weeks into production, the agent had confidently written to a field it wasn't supposed to touch, hallucinated a vendor contact that didn't exist, and — in one memorable afternoon — gotten itself stuck in a loop that burned through more API tokens in four hours than we'd used in the entire testing phase.

This post is the debrief. What broke, why it broke, what we built to stop it from happening again, and the minimum viable guardrail layer your team should have in place before you ship.

Note: the details below are drawn from a composite of real client engagements. Names and specifics are generalized.

The setup: an AI agent for content ops

The client was a mid-size B2B SaaS company with a lean marketing ops team. They had a growing content library — product pages, case studies, help articles — that needed periodic enrichment: updating outdated stats, tagging content by persona and funnel stage, flagging gaps, and writing summary briefs for the sales team.

Doing this manually meant one person spending about 20 hours a week on a task that was mostly mechanical. A well-scoped AI agent felt like the right call.

The agent's job, in plain English:

Pull a content record from the CMS queue
Analyze it against a set of criteria (accuracy, completeness, persona alignment)
Enrich it — update the metadata fields, write a brief, flag anything that needed human review
Write the result back to the CMS and move to the next record

Straightforward. Tool calls to a CMS API. Structured outputs. A prompt with clear criteria. We tested it against 50 records, reviewed the outputs manually, and felt good about the accuracy.

Then we turned it loose on the real queue.

Failure Mode #1: It edited things it wasn't supposed to touch

Failure 01 / Scope Creep

The agent rewrote fields it had read access to but shouldn't have written

Our CMS API didn't have granular field-level write permissions at the time. The agent had a write token scoped to the content record — meaning it could technically update any field on that record, including the canonical URL slug, the original author attribution, and the publication date.

The prompt said "update metadata fields." The agent interpreted "metadata" broadly. It helpfully "corrected" a publication date it thought was wrong. It overwrote an author field because the original author's name didn't match a pattern it had been trained to associate with the company. Three content records got their slugs silently changed, breaking inbound links.

This is the blast radius problem. The agent wasn't doing anything malicious — it was doing exactly what it thought it was supposed to do. The failure was ours: we gave it write access to an entire record when it only needed write access to specific fields.

The NIST AC-6 principle — least privilege — applies directly here. An agent should have the minimum permissions needed to complete its task, not the maximum permissions its API token allows.

Fix 01

Field-level allowlists, not record-level write tokens

We rebuilt the write tool to accept an explicit allowlist of writable fields — the agent could only call update_field(field_name, value) for fields on the approved list. Any attempt to write to a non-listed field threw a hard error, logged the attempt, and flagged it for human review.

We also added a dry-run mode: before any write, the agent produces a structured diff of what it intends to change. A lightweight validator checks that diff against the allowlist before execution. If anything looks off, it stops and queues for human review instead of proceeding.

Failure Mode #2: Hallucinated data got written as fact

Failure 02 / Hallucination at Write Time

The agent invented a vendor contact and wrote it to the CRM

One task involved enriching a case study with the client contact's name and title. The record had a company name but no contact info. The agent — instructed to "complete the record where possible" — searched its context, found nothing, and then fabricated a plausible-sounding person: a "Director of Marketing" with a name that sounded right for the company.

That invented contact got written to the record. Two weeks later, a sales rep tried to reach out to a person who didn't exist.

This one is sneaky because it only happens on edge cases — records where the expected data isn't present. In testing, we'd used records that all had complete contact info. The "no data" case never surfaced.

Fix 02

Structured outputs with explicit uncertainty handling

We updated the output schema to include a confidence field and a source field for every populated value. The agent was re-prompted: if you don't have a reliable source for a value, return null and set confidence: "low". Do not guess.

Any field with confidence: "low" gets written as blank, not with the hallucinated value
A separate needs_human_review flag gets set to true on the record
The record goes into a human review queue instead of the completed queue

The agent is still useful on these records — it can do the analysis, write the brief, update the tags — it just doesn't make things up to fill gaps it can't actually fill.

Failure Mode #3: A retry loop that couldn't stop itself

Failure 03 / Runaway Retry Loop

A transient API error triggered 4 hours of retries and $340 in unexpected API costs

The CMS API had a brief outage — about 40 minutes. During that window, the agent's write calls were returning 503s. The retry logic we'd built was simple: if a write fails, wait 5 seconds and try again. Reasonable for a momentary blip.

What we hadn't accounted for: the agent was also re-reading the record before each retry to "refresh its context." Each re-read was a separate API call. Each retry involved re-running the analysis step. The LLM calls added up fast. Over four hours — until someone noticed and manually killed the process — it had retried the same three records hundreds of times, burning through tokens at a rate we hadn't budgeted for.

Fix 03

Circuit breakers, max retry caps, and cost budget enforcement

Hard retry cap: Maximum 3 retries per record, with exponential backoff (5s, 30s, 2min). After 3 failures, the record gets flagged as "blocked" and the agent moves on.
Circuit breaker: If more than 3 consecutive records fail on the same step, the agent pauses, sends an alert, and waits for human acknowledgment before resuming.
Per-run cost budget: We set a token budget per run. When the budget is 80% consumed, the agent logs a warning. At 100%, it hard-stops and queues remaining records for the next run.
Observability: We added OpenTelemetry traces to every agent step. Every tool call, every LLM invocation, every retry is logged with timestamps and token counts. The runaway loop would have been visible in the dashboard within minutes.

The four guardrails that actually stuck

After the smoke cleared, we built these four layers into every agent we deploy now — regardless of the use case.

Guardrail 01

Least-privilege tool design

Every tool the agent can call is scoped to the minimum action needed. Read tools are separate from write tools. Write tools have field-level allowlists, not record-level access. Destructive actions (delete, overwrite) require a separate confirmation step — a form of interrupt pattern applied at the tool boundary. The agent's permission envelope is the first thing we design — not an afterthought.

Guardrail 02

Structured outputs with explicit confidence and uncertainty

Every output schema includes confidence levels and source fields. The agent never writes a null-handling guess as a real value. Low-confidence outputs get routed to a human review queue. This alone has prevented more embarrassing mistakes than any other single change.

Guardrail 03

Circuit breakers and cost caps

Retry logic is not enough. You need a maximum retry count, a circuit breaker that pauses on consecutive failures, and a hard per-run cost budget. If your agent can run indefinitely on a bad input, it will — eventually. Set the boundaries before you deploy.

Guardrail 04

A second model for output validation

We added a lightweight validation step after each write: a separate (cheaper, smaller) model reviews the agent's output against a rubric before it's committed. This is a pattern that teams with production experience keep recommending — as the Digits ML team put it: "Use a different LLM to evaluate responses. Never trust a single model to police itself." It catches schema drift, hallucinations, and out-of-scope writes before they become problems.

What this cost to build (and what it saved)

~3 days

to retrofit guardrails after the incidents

18 hrs/wk

saved in ongoing manual content ops

production incidents in 6 weeks post-fix

The agent is genuinely useful now. It processes 200–300 content records per week, flags about 15% for human review, and the team's manual time on content ops is down from ~20 hours to about 2 hours — the human review queue plus occasional spot checks.

But here's the honest accounting: if we'd built the guardrails before deployment instead of after, we'd have saved ourselves three incidents, one panicked stakeholder call, and $340 in wasted API spend. The retrofit cost more — in time and trust — than the original build would have.

The pre-ship checklist

Before you deploy an agent into any production workflow, run through this list:

Agent Production Readiness Checklist

Every write tool has an explicit field-level allowlist — the agent cannot write to anything not on the list
Destructive actions (delete, overwrite existing data) require a separate human-approval or confirmation step
Output schema includes a confidence field and a source field for every AI-populated value
Low-confidence outputs are routed to a review queue, not written as fact
Retry logic has a hard maximum (no more than 3–5 retries per task)
Circuit breaker pauses the agent after N consecutive failures and sends an alert
Per-run cost budget is set — agent hard-stops when the budget is hit
Every tool call and LLM invocation is logged with timestamps and token counts (OpenTelemetry or equivalent)
A second model or rule-based validator reviews outputs before they're committed
The agent has been tested on edge cases — empty inputs, malformed data, API timeouts
An incident response path exists: who gets alerted, how to pause the agent, how to roll back a bad write

This isn't a comprehensive security posture — for that, the OWASP LLM Top 10 is worth a full read, particularly around prompt injection (the #1 risk) and insecure output handling. But the checklist above catches the failure modes that are most likely to bite a team on their first real deployment. For a broader framework covering all four runtime guardrail layers, see our post on agent guardrails in production.

The uncomfortable truth about AI agents: the demo environment is not the production environment. The demo uses clean inputs, known edge cases, and a sympathetic reviewer. Production has weird data, users who phrase things oddly, APIs that go down at 2pm on a Tuesday, and a CMS that doesn't quite behave the way the docs say it does. Your guardrails need to be designed for that world, not the demo.

One more thing: prompt injection is real

We didn't run into this in the content ops case, but it's worth flagging: if your agent reads external content and then acts on it, prompt injection is a live threat.

Indirect prompt injection — where malicious instructions are embedded in content the agent reads — has moved from theoretical to documented in the wild. Real incidents include the Perplexity Comet data leak and zero-click RCE exploits in MCP-connected IDEs. If your agent reads emails, web pages, CMS records, or any other user-controlled content before taking actions, you need to treat that content as untrusted input.

Practical mitigation: separate the reading step from the action step. Have the agent summarize or extract structured data from external content first (in a sandboxed step with no tool access), then pass only that structured data to the action step. Don't let the agent read a web page and immediately act on whatever it finds there.

The bottom line

AI agents are genuinely useful for operations work. The content ops agent we deployed is saving 18 hours a week for a team that was drowning in manual tasks — and it's doing it reliably now.

But the "reliably" part didn't come for free. It came from three incidents, a few days of retrofit work, and a set of guardrails that should have been there from day one.

If you're about to ship your first agent into a production workflow: run the checklist, build the guardrails before you deploy, and treat the demo environment as what it is — a controlled simulation, not a proof of production-readiness.

Your future self will thank you.

Sources:

Deploying an AI agent into your ops stack and want to gut-check your guardrails before you go live? That's exactly what we do at Supergood Solutions — reach out and let's talk through your setup before production finds the edge cases for you.