AI Agent Ops · Infrastructure

The AI Agent Control Plane: Why Governing Agents at Scale Is the Next Infrastructure Problem

Most AI agents in production today have no centralized governance — safety rules are hard-coded per agent, making oversight brittle and slow. A new category of infrastructure is emerging: the agent control plane. If you're running agents in production, this is the layer you're probably missing.

Published March 12, 2026 — 11 min read

The Problem: Agents Ship Without a Safety Net

MIT CSAIL's 2025 AI Agent Index dropped a sobering finding: of 30 major AI agents studied — spanning ChatGPT Agent, Claude Code, Perplexity Comet, Microsoft 365 Copilot, and more — only half have published safety or trust frameworks. One in three has zero safety documentation. Five out of 30 claim no compliance standards at all.

Meanwhile, 13 of those 30 systems operate at "frontier agency" — meaning they can execute extended task sequences with minimal human oversight. Browser agents in particular run with high autonomy, navigating sites, logging in on behalf of users, and making decisions across multi-step workflows.

The kicker: 21 out of 30 agents provide no disclosure to websites or third parties that they're bots. Some actively disguise themselves with Chrome-like user-agent strings and residential IPs to bypass anti-bot protections.

This is the governance gap. The models are good enough. The frameworks are mature enough. The missing piece is operational control — the ability to enforce behavior policies across every agent, in real time, without taking systems offline.

What Is an Agent Control Plane?

Think of it like a network control plane, but for AI agent behavior. Instead of routing packets, you're routing decisions — defining what agents can and can't do, how they identify themselves, which tools they can call, and what triggers human escalation.

A control plane sits between your agents and their operating environment, providing:

Centralized policy management — Write a rule once ("never expose PII," "always require human approval for transactions over $10K"), enforce it everywhere.
Runtime guardrails — Block or redirect agent behavior in real time without redeployment. If an agent starts hallucinating or calling tools it shouldn't, the control plane intervenes before the action completes.
Observability hooks — Trace not just what an agent did, but why it made each decision. Traditional logs capture code execution; agent traces capture reasoning chains.
Policy portability — As your agent stack evolves (new frameworks, new models, new vendors), your governance policies move with you.

This isn't theoretical. It's shipping now.

Galileo's Agent Control: Open Source, Vendor-Neutral

On March 11, Galileo released Agent Control as an open-source project under Apache 2.0. The first integrations include Strands Agents, CrewAI, Glean, and Cisco AI Defense.

What makes it notable:

Write once, deploy anywhere — Policies are portable across agent frameworks. You're not locked into one vendor's guardrail format.
Hot-swap enforcement — Update policies at runtime without pulling agents offline. This matters when a production agent starts exhibiting new, unexpected behavior and you need to respond in minutes, not sprint cycles.
Custom evaluators — Bring your own guardrail evaluators or use third-party ones. The control plane is evaluator-agnostic.
Use cases that actually matter — Preventing hallucinations, blocking PII leakage, steering LLM selection for cost optimization, enforcing brand tone, requiring human approval on sensitive actions, and falling back to alternative tools on error.

Dev Rishi, GM of AI at Rubrik: "The number one blocker for enterprise agents is no longer the models. To graduate agents to production, the industry needs transparent, community-driven guardrails."

The Broader Observability Stack Is Growing Up

Agent Control is one piece of a maturing ecosystem. Here's what else is converging:

Arize Phoenix

Arize Phoenix has become a go-to for agent tracing, with native support for MCP (Model Context Protocol) tracing across client-server hierarchies. It integrates with OpenAI Agents SDK, CrewAI, PydanticAI, LangGraph, and Google ADK. Their thesis: "You cannot fix AI failures with standard logs because the error lives in the reasoning, not the code execution."

Amazon Bedrock AgentCore Evaluations

Amazon published a detailed evaluation framework based on lessons from thousands of agents built across Amazon organizations since 2025. The key shift: evaluating not just model output quality, but tool selection accuracy, multi-step reasoning coherence, and memory retrieval efficiency. They've baked this into Bedrock AgentCore as a reusable evaluation library.

Langfuse and Braintrust

Langfuse continues to grow as an open-source LLM observability platform, while Braintrust takes an evaluation-first approach — treating prompts as versioned objects and merging testing directly with production monitoring via their purpose-built OLAP database (Brainstore).

OpenTelemetry for AI

The OpenTelemetry project is emerging as the vendor-neutral instrumentation layer for agent telemetry. New Relic recently launched dedicated AI agent observability tooling built on OTel, and Datadog has added GPU monitoring and autonomous SRE agents to their stack.

Lessons from ConFoo 2026: Guardrails Where the Wheels Touch the Road

At ConFoo 2026 in Montreal, a recurring theme emerged across sessions: the shift from human access models to agentic access models.

Nick Taylor from Pomerium made the case that Zero Trust principles now apply to AI agents, not just human users. His practical recommendation: put MCP servers behind an identity-aware proxy, enforce per-request authentication, validate token scopes, prevent token passthrough, and audit every access. The metaphor — a venue wristband vs. a one-time gate check — captures the difference between static authentication and continuous enforcement.

GitGuardian's Ben Dechrai extended this to prompt security, arguing that prompt hygiene is the new input validation. When agents can call tools, a prompt injection isn't just a bad output — it's an unauthorized action.

A Practical Checklist for Agent Governance

Before your agents touch production:

Instrument first, ship second — Wire in observability (traces, not just logs) before your agent touches production traffic. Tools like Arize Phoenix, Langfuse, or Datadog's AI monitoring make this straightforward. For a practical breakdown of what an AgentOps observability stack looks like, see our earlier post.
Centralize your policies — Stop hard-coding safety rules inside individual agent codebases. Use a control plane to manage policies centrally. This is the only way to enforce consistency at scale.
Enforce at runtime, not just deploy time — Static rules break when agents encounter novel inputs. You need runtime guardrails that can block, redirect, or escalate in real time.
Treat agent identity seriously — Your agents should identify themselves. Use stable user-agent strings, publish IP ranges, respect robots.txt. Don't be the team that disguises bots as humans.
Evaluate continuously — Amazon's framework makes the case for ongoing evaluation across tool selection, reasoning coherence, and task completion — not just a one-time benchmark.
Require human checkpoints for high-stakes actions — Financial transactions, data deletions, external communications — build human approval into your control plane as policy, not ad-hoc code.

What's Next

The agent control plane is following the same trajectory as API gateways did a decade ago. First, every team builds their own. Then open standards emerge. Then it becomes infrastructure everyone expects to exist.

We're in the "open standards emerging" phase right now. Galileo's Agent Control, OpenTelemetry's AI instrumentation work, and Amazon's evaluation library are all pulling in the same direction: making agent governance a composable, vendor-neutral infrastructure layer.

If you're betting on agents as a core part of your product or operations, start treating governance as infrastructure — not an afterthought. Our agent ops runbook provides a concrete checklist for teams standing up this infrastructure for the first time.

FAQ

What is an AI agent control plane, and why does it matter?

An AI agent control plane is a centralized infrastructure layer that lets you define and enforce behavior policies across all your AI agents. It matters because without one, safety rules get hard-coded into individual agents, making governance slow, inconsistent, and fragile as you scale. Think of it like a network control plane, but for agent decisions instead of packet routing.

How is agent observability different from traditional application monitoring?

Traditional monitoring tracks code execution — server health, latency, error rates. Agent observability tracks reasoning — why an agent chose a specific tool, how it chained multiple steps together, and whether its decision was correct even if it didn't throw an error. Tools like Arize Phoenix and Langfuse capture these reasoning traces, which is essential because agents often fail in ways that look like success.

What are the biggest risks of running AI agents without guardrails?

The MIT CSAIL 2025 AI Agent Index found that most agents have no published safety frameworks, actively disguise themselves as human traffic, and operate with minimal oversight on extended task sequences. Practical risks include PII leakage, unauthorized actions via prompt injection, hallucinated outputs that look confident, and uncontrolled cost escalation from excessive tool calls or token usage.

Is Galileo Agent Control the only open-source option?

No, but it's the first purpose-built open-source control plane focused specifically on centralized agent policy management. Other open-source tools address pieces of the governance puzzle: Langfuse for observability, Arize Phoenix for tracing and evaluation, and OpenTelemetry for vendor-neutral instrumentation. Agent Control is designed to sit on top of these, providing a policy layer that integrates with any framework.

How do I get started with agent governance if I only have a few agents?

Start small: (1) add observability instrumentation to your existing agents using Langfuse or Arize Phoenix, (2) document your implicit safety rules as explicit policies, (3) deploy those policies through a control plane as you scale beyond 2-3 agents. Even with one agent, having traceable reasoning and runtime guardrails prevents the most common production failures.

What role does OpenTelemetry play in AI agent monitoring?

OpenTelemetry provides a vendor-neutral framework for collecting traces, metrics, and logs from AI systems. It's becoming the standard instrumentation layer so that agent telemetry data is portable across observability platforms — you instrument once and can send data to Datadog, New Relic, Arize, or any OTel-compatible backend.

Sources:

Building AI agents into your ops stack and need governance that scales? We help teams design agent control planes with the guardrails, observability, and security posture to run agents in production. supergood.solutions