Framework Friday

Stop Writing Better Prompts. Start Engineering Better Context.

Published May 08, 2026 — 5 min read

TL;DR: Teams spending months tuning prompts while their agents fail in production are solving the wrong problem. Some 82% of IT and data leaders now say prompt engineering alone can't power agents at scale, and the teams winning in production have moved to a fundamentally different discipline: context engineering.


Key Insight

Prompt engineering is about how you talk to the model. Context engineering is about what the model knows when it answers you.

That distinction sounds subtle. It isn't. It's the difference between coaching a doctor on bedside manner and handing them the patient's actual chart.

Here's the contrarian take most teams don't want to hear: if your agents are underperforming, the problem is almost certainly not your prompts. It's the information pipeline that feeds the model. You can wordsmith system instructions for weeks and gain nothing, while a team that engineers its retrieval pipeline, manages its token budget deliberately, and controls what enters the context window, and when, will run circles around you with a generic "be helpful" prompt.

Context engineering emerged in mid-2025 as the successor to prompt engineering. The 2026 State of Context Management Report found 82% of IT and data leaders have already concluded that prompting alone won't cut it. Production teams aren't tuning prompts anymore — they're building context pipelines.


Why Teams Miss This

The prompt-first instinct is understandable. It's tactile — you can see the prompt, change a word, observe a result. The feedback loop is tight.

Context problems are harder to see. The pipeline looks healthy: tokens flow, outputs appear. But the model is reasoning from the wrong information, whether that's stale retrieval, truncated history, or bloated context that dilutes the signal it actually needs.

Three failure modes that look like prompt problems but aren't:

1. Context rot. Research across 18 frontier models in 2026 found accuracy degradation of 30%+ for information in mid-window positions — not at the edges, in the middle. If your most important context is buried in a 50k-token window, the model may not be weighting it the way you think. The fix isn't a better prompt; it's context placement strategy.

2. Tool overload. Give an agent 30 tools "just in case" and performance drops. Anthropic's production guidance is explicit: if a human engineer can't immediately say which tool to use in a given situation, the agent can't either. Trim the tool set to what's needed for the current task scope. A bloated tool list is a context problem that masquerades as a reasoning problem.

3. Pre-stuffed context. Teams dump entire knowledge bases or documents into context upfront. This burns tokens on information the agent may never need and pushes relevant content into mid-window dead zones. The winning pattern is just-in-time retrieval: maintain lightweight identifiers and pull data via tools when the task actually requires it, as in the sketch after this list.
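
To make just-in-time retrieval concrete, here's a minimal sketch. The names (DOC_INDEX, list_documents, fetch_document) and the paths are invented for illustration, not any framework's API; the point is that only identifiers live in context while content is fetched on demand.

# Just-in-time retrieval: context holds identifiers, tools fetch content.
# All names and paths here are hypothetical.
DOC_INDEX = {
    "refund-policy": "kb/policies/refunds.md",
    "api-reference": "kb/docs/api.md",
}

def list_documents() -> list[str]:
    # Cheap: a handful of tokens, safe to keep in context permanently.
    return sorted(DOC_INDEX)

def fetch_document(doc_id: str) -> str:
    # Expensive: full content enters context only when a task needs it.
    with open(DOC_INDEX[doc_id], encoding="utf-8") as f:
        return f.read()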


How to Actually Do It

Context engineering has four concrete levers:

1. Treat context like a token budget, not a dump site

Know your model's effective context window (not the advertised limit — in practice, most models degrade well before the ceiling). Allocate that budget intentionally:

CONTEXT_BUDGET = 100_000  # effective limit, not advertised max

budget_allocation = {
    "system_instructions": 0.08,   # 8k tokens — keep it lean
    "tool_definitions": 0.05,      # 5k — only tools this task needs
    "working_memory": 0.15,        # 15k — current task state
    "retrieved_context": 0.55,     # 55k — the actual content
    "conversation_history": 0.12,  # 12k — compressed, not raw
    "output_buffer": 0.05,         # 5k — room to respond
}
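
A rough way to enforce that allocation when assembling a request, assuming the definitions above; count_tokens here is a crude stand-in for your model's real tokenizer:

def count_tokens(text: str) -> int:
    # Whitespace split is a rough proxy; swap in your tokenizer.
    return len(text.split())

def fits_budget(section: str, text: str) -> bool:
    # Uses CONTEXT_BUDGET and budget_allocation defined above.
    allowance = int(CONTEXT_BUDGET * budget_allocation[section])
    return count_tokens(text) <= allowance

Drop or compress any section that fails its check before the request ships, instead of letting the provider truncate for you.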

2. Compress conversation history, don't truncate it

Most frameworks truncate history when context fills — dropping the oldest messages. That loses architectural decisions and prior reasoning. Instead, compact: summarize the conversation so far, preserving unresolved issues and key decisions. Sub-agents can receive a condensed brief instead of full history.
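
A minimal sketch of the compaction pattern, assuming OpenAI-style message dicts; summarize is a placeholder for a real model call:

def summarize(messages: list[dict]) -> str:
    # Placeholder: in practice, call your model with an instruction like
    # "compress this history, preserving key decisions and open issues".
    return "\n".join(m["content"] for m in messages)[:500]

def compact_history(messages: list[dict], keep_recent: int = 5) -> list[dict]:
    # Fold older turns into one brief instead of dropping them.
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    brief = {"role": "system",
             "content": "Summary of earlier conversation:\n" + summarize(old)}
    return [brief] + recent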

3. Sub-agent isolation for clean context windows

For multi-step workflows, each sub-agent should have its own clean context scoped to its task. The orchestrating agent doesn't need to carry the full retrieval payload for step 6 when it's executing step 2. Sub-agents return condensed summaries to the parent — results, not raw working data.
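
Roughly, in code; call_model stands in for your provider's API, and the brief-in, summary-out shape is the assumption being illustrated:

def call_model(context: list[dict], tools: list[str]) -> str:
    # Placeholder for the actual model API call.
    return f"[condensed result for: {context[-1]['content'][:40]}]"

def run_subagent(task_brief: str, tools: list[str]) -> str:
    # Fresh context scoped to one task: a condensed brief in,
    # a condensed summary out. No shared history, no raw payloads.
    context = [
        {"role": "system", "content": "You handle exactly one task."},
        {"role": "user", "content": task_brief},
    ]
    return call_model(context, tools)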

4. Audit tool sets per task, not globally

Stop defining one master tool list and loading it for every agent run. Define task-scoped tool sets. A customer support agent routing a refund request doesn't need your analytics query tool. Narrower tool sets mean the model makes faster, more accurate tool-selection decisions and the tool definitions consume less of your context budget.
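
A sketch of what that looks like; the scope names and tool names are made up for illustration:

# One registry per task scope instead of a master list for every run.
TOOLSETS: dict[str, list[str]] = {
    "refund_routing": ["lookup_order", "issue_refund", "escalate_to_human"],
    "analytics": ["run_query", "export_report"],
}

def tools_for(task_scope: str) -> list[str]:
    # Fewer definitions in context, faster and more accurate selection.
    return TOOLSETS[task_scope]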


What We've Learned

The teams consistently shipping reliable agents in production have a shared characteristic: they treat the model's information environment as a first-class engineering problem, not an afterthought.

Immediate experiment: Pick one underperforming agent and do a context audit before touching the prompt. Print the full context at inference time. Ask: What's in here that shouldn't be? What's missing that should be? Where is the most important information relative to the context window? What tools are defined that this task never actually uses?
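
If you can intercept the assembled request before it ships, the audit can be mechanical. A sketch, again with whitespace counting standing in for a real tokenizer:

def audit_context(messages: list[dict], tools: list[str]) -> None:
    # Show what the model actually sees, and where in the window it sits.
    total = sum(len(m["content"].split()) for m in messages)
    print(f"~{total} tokens across {len(messages)} messages")
    print(f"{len(tools)} tools defined: {tools}")
    for i, m in enumerate(messages):
        depth = i / max(len(messages) - 1, 1)  # 0% = window start, 100% = end
        print(f"[{depth:.0%} deep] {m['role']}: {m['content'][:80]!r}")

The depth column makes mid-window burial visible at a glance: anything critical sitting near 50% deserves a second look, given the context-rot findings above.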

In most cases, that audit will surface more actionable improvements than another round of prompt iteration.

The model is not your problem. What you're feeding it is.


Sources