Future Friday

Event-Driven Agent Architectures: How to Scale Agentic Workflows Without Polling

Published April 18, 2026 — 4 min read

TL;DR: As agent deployments grow, request-response patterns become bottlenecks. Event-driven architectures decouple agents from synchronous calls, letting them listen for events and respond asynchronously. This post covers the three core patterns—fire-and-forget, event streams, and choreography—with concrete tradeoffs and when to use each.

---

The Polling Problem

Most early-stage agent systems work like this: app calls agent, waits for response, moves on. It's simple, it's blocking, and it breaks at scale.

When you have multiple agents, conditional workflows, or long-running tasks, synchronous request-response gets expensive. You're tying up compute while you wait, you're holding database connections open, and you can't easily fan out parallel work. You're essentially polling for completion, a pattern we learned to hate in the 1990s.

Event-driven flips this: agents don't wait for calls. They subscribe to events relevant to their scope, process them independently, and emit new events when done. No blocking. No wasted compute. More parallelism.

Pattern 1: Fire-and-Forget with Message Queues

The simplest event-driven pattern. An event lands in a queue (SQS, Kafka, RabbitMQ, or a cloud pubsub service). An agent consumer picks it up, processes it, optionally emits a completion event.

How it works:

  1. Your app publishes an event: `{"type": "lead_enrichment_requested", "lead_id": 123}`
  2. The lead enrichment agent consumes it, does its work, publishes: `{"type": "lead_enrichment_completed", "lead_id": 123, "data": {...}}`
  3. A separate consumer (maybe a webhook handler, maybe another agent) subscribes to the completion event.
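
The three steps above can be sketched in a few lines. This is a minimal stand-in, assuming an in-memory `queue.Queue` in place of a real broker (SQS, RabbitMQ, etc.) so it runs without infrastructure; the enrichment payload is a placeholder.

```python
import json
import queue

# In-memory stand-in for a real message broker.
events = queue.Queue()

def publish(event: dict) -> None:
    """Serialize and enqueue an event, as the app or an agent would."""
    events.put(json.dumps(event))

def enrichment_agent() -> dict:
    """Consume one event, do the work, publish a completion event."""
    event = json.loads(events.get())
    if event["type"] == "lead_enrichment_requested":
        enriched = {"company": "Acme Corp"}  # placeholder for real enrichment
        done = {"type": "lead_enrichment_completed",
                "lead_id": event["lead_id"], "data": enriched}
        publish(done)
        return done

# Step 1: the app fires the request and moves on.
publish({"type": "lead_enrichment_requested", "lead_id": 123})
# Steps 2-3: a consumer picks it up and emits the completion event.
result = enrichment_agent()
print(result["type"])  # lead_enrichment_completed
```

The publisher never blocks on the agent; it only blocks on the enqueue, which is effectively instant.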

Best for: Independent tasks that don't need real-time response. Data pipeline jobs. Batch-like workflows. Anything where eventual consistency is acceptable.

Tradeoff: You lose the ability to return a response immediately. Good for background work; bad for user-facing requests.

Pattern 2: Event Streams with State

This is fire-and-forget plus coordination. Each agent publishes events to an immutable log (Kafka, cloud event sourcing). Other agents listen, build state, trigger downstream actions.

A marketing automation example:

  1. CRM publishes: `contact_created`
  2. Lead scoring agent listens, enriches, publishes: `lead_scored`
  3. Segmentation agent listens to `lead_scored`, publishes: `contact_segmented`
  4. Campaign agent listens to `contact_segmented`, publishes: `campaign_started`

Each agent is loosely coupled. If the lead scoring agent fails or is slow, it doesn't block the pipeline—events queue up. The immutable log lets you replay or audit exactly what happened.
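
A toy version of that pipeline, assuming the immutable log is just an append-only Python list and the agent functions are illustrative, shows the chaining mechanic: each agent reacts only to the event types it cares about and appends new events for the next stage.

```python
# Append-only stand-in for an immutable event log (a Kafka topic, in practice).
log: list[dict] = []

def emit(event_type: str, **payload) -> None:
    log.append({"type": event_type, **payload})

def scoring_agent(event: dict) -> None:
    if event["type"] == "contact_created":
        emit("lead_scored", contact_id=event["contact_id"], score=87)

def segmentation_agent(event: dict) -> None:
    if event["type"] == "lead_scored":
        segment = "hot" if event["score"] >= 80 else "nurture"
        emit("contact_segmented", contact_id=event["contact_id"], segment=segment)

def run_pipeline() -> None:
    """Deliver every log entry to every agent; newly appended events get
    processed too, so downstream stages fire in order."""
    i = 0
    while i < len(log):
        for agent in (scoring_agent, segmentation_agent):
            agent(log[i])
        i += 1

emit("contact_created", contact_id=1)
run_pipeline()
print([e["type"] for e in log])  # ['contact_created', 'lead_scored', 'contact_segmented']
```

Because the log is never mutated in place, replaying it against a fresh set of consumers reconstructs the same state, which is what makes the audit and replay story work.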

Best for: Multi-step workflows with clear phases. Audit trails. Workflows where replay or reconstruction matters.

Tradeoff: More infrastructure. Harder to debug than synchronous code. Latency can compound if you have many sequential stages.

Pattern 3: Choreography with Context

The most flexible pattern: agents don't have a central orchestrator. Instead, each agent knows what events it cares about and what to do when they arrive. Events carry context (request ID, prior results), so agents can make decisions.

Example: `order_placed` event arrives. The payment agent listens, charges the card, publishes `payment_processed`. The fulfillment agent listens to `payment_processed`, picks the order, publishes `order_ready_to_ship`. The notification agent listens to `order_ready_to_ship` and emails the customer. No controller. Each agent acts independently.
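
One way to sketch that choreography, with no orchestrator, is a registry mapping event types to handlers, where every event carries the same `request_id` as context. The handler bodies and the registry itself are illustrative, not a real framework.

```python
from collections import defaultdict

# Registry: event type -> list of agent handlers. This replaces a central
# orchestrator; each agent declares what it listens to.
handlers = defaultdict(list)
processed = []

def on(event_type: str):
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register

def dispatch(event: dict) -> None:
    processed.append(event["type"])
    for fn in handlers[event["type"]]:
        fn(event)

@on("order_placed")
def payment_agent(event):
    # ...charge the card, then announce the result with the same request_id.
    dispatch({"type": "payment_processed", "request_id": event["request_id"]})

@on("payment_processed")
def fulfillment_agent(event):
    dispatch({"type": "order_ready_to_ship", "request_id": event["request_id"]})

@on("order_ready_to_ship")
def notification_agent(event):
    pass  # would email the customer here

dispatch({"type": "order_placed", "request_id": "req-42"})
print(processed)  # ['order_placed', 'payment_processed', 'order_ready_to_ship']
```

Adding a new agent is just another `@on(...)` registration; no existing agent changes, which is the flexibility the pattern buys you.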

Best for: Complex, non-linear workflows. High-scale systems where you can't afford a central bottleneck. Systems that need to change quickly (add a new agent, update logic).

Tradeoff: Harder to reason about the overall flow. Deadlocks possible if you have circular dependencies. Requires good observability.

Practical Setup: What to Actually Use

For most teams: Start with SQS + Lambda (AWS) or Cloud Tasks + Cloud Functions (GCP). Fire-and-forget with minimal ops overhead. You get built-in retries, dead-letter queues (DLQs), and visibility into failures.

For more complex workflows: Temporal (open source workflow engine) or Inngest (serverless orchestration). Both handle retries, timeouts, and state. You define agents as discrete steps and the orchestrator wires them together.

For truly event-driven: Kafka. Overkill for most use cases, but necessary if you need true event streaming, audit trails, and replay. Requires ops investment.

Watch Out For

Exactly-once delivery: Most queues don't guarantee it; expect at-least-once delivery instead. Use idempotency keys: every event should carry a unique ID so your agent can detect a redelivery and safely skip the duplicate.
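
A minimal idempotent consumer, assuming an in-memory set of seen event IDs (in production this would be a durable store like Redis or a database, checked atomically):

```python
import uuid

seen: set[str] = set()
charges: list[int] = []

def process_payment(event: dict) -> bool:
    """Do the work only if this event ID hasn't been handled before.
    Returns True if work ran, False if the event was a duplicate."""
    if event["event_id"] in seen:
        return False  # redelivery: drop it, the charge already happened
    seen.add(event["event_id"])
    charges.append(event["amount"])
    return True

event = {"event_id": str(uuid.uuid4()),
         "type": "payment_requested", "amount": 50}
process_payment(event)
process_payment(event)  # the queue redelivered the same event
print(len(charges))  # 1
```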

Event ordering: If order matters, don't use fire-and-forget with multiple consumers. Use an ordered queue (SQS FIFO, Kafka partition) or Temporal/Inngest.
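
The mechanic behind ordered queues can be sketched by routing events with the same key to the same consumer, the way a Kafka partition or an SQS FIFO message group does. The worker count and routing function here are illustrative; per-key order is preserved while different keys can still run in parallel.

```python
from collections import defaultdict

NUM_WORKERS = 4
worker_queues: dict[int, list] = defaultdict(list)

def route(event: dict) -> int:
    """A stable hash of the partition key picks the worker, so all events
    for one lead land on the same queue, in publish order."""
    worker = hash(event["lead_id"]) % NUM_WORKERS
    worker_queues[worker].append(event)
    return worker

for step in ("created", "scored", "segmented"):
    route({"lead_id": 123, "step": step})

steps = [e["step"] for e in worker_queues[hash(123) % NUM_WORKERS]]
print(steps)  # ['created', 'scored', 'segmented']
```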

Observable failures: Agents fail silently by default. Wire up DLQs, alerting, and structured logging. You need to know when an event has sat unprocessed for 10 minutes.

Token bleed: Each agent call consumes tokens. In a long event chain, token usage can explode. Add circuit breakers or per-event-type token budgets.
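
A per-event-type budget can be as simple as a counter that refuses work once the allowance is spent. The budget figures and token counts below are made-up numbers for illustration.

```python
# Hypothetical per-event-type token budgets and running spend.
budgets = {"lead_scored": 10_000}
spent = {"lead_scored": 0}

class BudgetExceeded(Exception):
    pass

def charge_tokens(event_type: str, tokens: int) -> None:
    """Record token spend for this event type, or refuse if it would
    blow the budget (a crude circuit breaker)."""
    if spent[event_type] + tokens > budgets[event_type]:
        raise BudgetExceeded(f"{event_type} budget exhausted")
    spent[event_type] += tokens

overspent = False
charge_tokens("lead_scored", 6_000)      # within budget: fine
try:
    charge_tokens("lead_scored", 6_000)  # would exceed 10,000: refused
except BudgetExceeded:
    overspent = True
print(overspent)  # True
```

In a real deployment the breaker would also stop consuming from the queue for that event type, rather than just raising.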

FAQ

Should I use events for every agent call?

No. If the work is synchronous and must complete before the caller proceeds, keep it request-response. Events are for work that can happen in the background or in parallel. Start synchronous, move to events only when you hit scale or latency problems.

What if I need to wait for the result of an agent's work?

Use a correlation ID. The triggering app publishes an event with a unique request ID, then polls or subscribes to a results topic with that ID. Or use a workflow engine like Temporal that handles the waiting for you.
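
The correlation-ID handshake looks roughly like this. This sketch assumes a shared dict in place of a results topic or Redis, and runs the agent inline where a real system would run it asynchronously; the function names are illustrative.

```python
import uuid

# Stand-in for a results topic keyed by request ID.
results: dict[str, dict] = {}

def agent_worker(event: dict) -> None:
    """The agent does its work and posts the result under the caller's ID."""
    results[event["request_id"]] = {"status": "done", "score": 91}

def call_and_wait(payload: dict, max_polls: int = 10):
    """Publish with a fresh correlation ID, then poll for the result."""
    request_id = str(uuid.uuid4())
    agent_worker({**payload, "request_id": request_id})  # async in real life
    for _ in range(max_polls):
        if request_id in results:
            return results[request_id]
    return None  # timed out

outcome = call_and_wait({"type": "lead_scoring_requested", "lead_id": 7})
print(outcome["status"])  # done
```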

How do I debug event-driven workflows?

Structured logging + request IDs on every event. Track which events touched which agents. Use a tracing tool (Datadog, New Relic, open source Jaeger) to follow a single request through the system.
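
Attaching the request ID to every log line is straightforward with the stdlib `logging` module's `extra` parameter; the handler below emits JSON to an in-memory list so the sketch is self-contained, and the field names are illustrative.

```python
import json
import logging

records: list[str] = []

class JsonHandler(logging.Handler):
    """Emit each log record as a JSON line carrying the request_id."""
    def emit(self, record):
        records.append(json.dumps({
            "msg": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
        }))

logger = logging.getLogger("agents")
logger.setLevel(logging.INFO)
logger.addHandler(JsonHandler())

def handle(event: dict) -> None:
    # `extra` attaches request_id as an attribute on the LogRecord.
    logger.info("event received", extra={"request_id": event["request_id"]})

handle({"type": "lead_scored", "request_id": "req-7"})
print(records[0])
```

Grepping or querying on `request_id` then reassembles everything one request touched, which is the same join a tracing tool does for you.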

Can I mix synchronous and asynchronous agents?

Yes. A synchronous agent can publish events. An async agent can make synchronous calls. The pattern is about your architecture, not about individual agents.

---

Next Step

Pick one: run a POC with SQS + your agents this week, or evaluate Temporal if your workflows have complex retry/timeout logic. Measure time-to-completion and token usage before and after. You'll see if events actually help your specific setup.