← home

Remediation Asymmetry: When Agents Can Diagnose More Than They Can Fix

June 26, 2026

Remediation Asymmetry — the structural gap between what an agent can diagnose and what it can remediate
The better your agent's diagnostic capability, the more money it wastes.

The better your agent's diagnostic capability, the more money it wastes.


The Pattern

The agent identified the root cause in 14 seconds. It spent the next 9 minutes and 46 seconds proving it could not fix it.

We noticed this across five production incidents in the same week, spanning two different agents, three execution runs, and four different error domains. Each time, the agent's diagnostic output was technically excellent — correct root cause, correct recommendation, correct severity assessment. Each time, the recommendation was an action the agent could not perform. And each time, the agent continued investigating long after the diagnosis was complete.

The behavior looked like over-investigation. We named it that initially. But when we examined the structure, something more specific emerged: the agent's diagnostic capability vastly exceeded its remediation authority. It could often diagnose problems with far more depth than it had authority to resolve.

The agent had already solved the diagnostic problem. What it lacked was the authority to resolve the operational one.

That is the structural mismatch. We call it Remediation Asymmetry: the gap between what an agent can diagnose and what an agent can remediate.

When this gap exists, agents don't stop. They investigate.


The Incident That Named It

We run AI agents as batch jobs in Cloud Run containers. Eight prompts per batch, four services (Notion, Slack, Gmail, GDrive). Each prompt should take 30 seconds.

Prompt 3 hit a Google 401: expired OAuth token. The agent's response:

ElapsedActionValue Added
0:00–0:14Read error message, identified "token expired"Full diagnostic value
0:14–1:30Tested GDrive, GCalendar, Gmail separatelyConfirmed token-wide (already implied by error)
1:30–4:00Wrote 3 Python scripts to test alternative auth pathsNone — all paths require the same token
4:00–7:00Inspected SDK source code, read session logsNone — root cause was already identified
7:00–9:46Produced detailed root cause analysis documentNone beyond the 14-second diagnosis

The agent's final recommendation: "The OAuth token needs to be re-authorized through a browser-based flow."

That recommendation was available from the first error message. Everything after second 14 was diagnostic effort applied to a problem the agent had already solved — and could never remediate.

But this incident alone only told part of the story. The pattern ran deeper.


The Twist: It Investigated Working Tools

We fixed everything. Expired Google OAuth tokens → deployed a 30-minute auto-refresh cron. Missing Slack scopes → reconfigured. No timeout → enforced 120-second limit. Multi-LLM fallback → re-integrated.

We ran the same batch again. Every external service was working. Gmail returned real emails. GDrive returned real files. Slack listed real channels.

Prompt 3 — Gmail — returned successfully. The tool worked. The data came back. And the agent still spent 67 seconds reading internal source files before reporting results:

agent$ Reading entrypoint_sdk.py to understand how MCP is configured...
agent$ Reading mcp-bridge.mjs to verify bridge setup...
agent$ Reading call_mcp.py to check tool routing logic...
agent$ Tool call succeeded. Reporting results.
The agent investigated its own working infrastructure before acknowledging success.

This is the cruelest form of remediation asymmetry: even when there is nothing to diagnose, the diagnostic capability activates. The investigation isn't triggered by errors. It's triggered by the existence of diagnostic tools. (This is the Behavioral Induction principle from Post 17 — capabilities shape behavior. Remediation Asymmetry shows what happens when that shaping has no exit condition.)


Fresh Evidence: The Exa Incident (Different Agent, Different Domain)

To prove remediation asymmetry is structural and not incident-specific, we ran a completely different agent.

The Debugging Agent (different delegations, different service account, June 2026) hit a failure in an entirely different domain: the Exa web search MCP tool was misconfigured.

This is not an OAuth problem. Not a token expiry. Not a scope issue. The MCP tool simply wasn't properly registered.

The agent's response:

ElapsedActionValue Added
0:00–0:05Called exa.web_search_exa via MCP. Got error: tool not found.Full diagnostic value
0:05–0:30Ran gemini mcp list to inspect configured toolsConfirmed tool missing (already clear from error)
0:30–0:55Read ~/.gemini/settings.json to check MCP configurationNone — tool-not-found is definitive
0:55–1:40Wrote test_mcp.py to discover registered tool namesNone — the error already says it's not registered
1:40–2:00Wrote search_exa.py to attempt alternative API pathsNone — requires tool to be configured
2:00KILLED BY 120s TIMEOUT

The agent announced its investigation steps in real time:

agent$ I will run a shell command to list the files in ~/.gemini directory to locate MCP configurations
agent$ I will run a python one-liner to check the structure of the Delegation class
agent$ I will run the test_mcp.py script to inspect the registered MCP tools and discover their exact parameter names

The agent spent 120 seconds investigating a problem whose remediation requires a human to reconfigure the MCP tool registration. The first error message said everything: "tool not found." Investigation cannot register a tool.

After Gemini was killed, Codex tried the same prompt — and succeeded in seconds by using its built-in web search capability. It didn't investigate why the MCP tool was missing. It found an alternative path and executed. Different capability set, different behavior — but the Codex approach only worked because it happened to have a workaround. The structural lesson remains: when remediation requires human action, investigation is waste.


The Before/After Proof

The most damning evidence comes from the same prompt handled by two different agents in sequence.

In the second batch run (Run B), Prompt 4 (GDrive search_files) was given to Gemini first:

Gemini (with filesystem + shell tools):

  • Called GDrive MCP tool
  • Got initial response
  • Began investigating: inspected environment variables, read MCP bridge configuration, wrote diagnostic scripts
  • Killed at 120 seconds — still investigating
Claude (MCP tools only) — same prompt, same gateway, same permissions:

  • Called GDrive MCP tool
  • Got result
  • Reported: 5 files listed in a clean table
  • Completed in ~15 seconds
The same pattern appeared across the Debugging Agent run:

PromptGemini DurationClaude DurationGemini Behavior
Notion search_pages120s (killed)~30sInvestigation spiral: wrote list_delegations.py, inspected env vars
Slack list_channels120s (killed)~30sInvestigation spiral: wrote diagnostic scripts
Multi-service120s (killed)~32sMulti-service triage triggered investigation spiral

Claude's response to Slack failing:

"Unavailable — list_channels failed. Recommendation: Re-authorize the Slack integration."

Five seconds. No scripts. No source code inspection. The diagnosis is the error message. The recommendation is the remediation the agent cannot perform. Done.

The Before/After Proof — same prompt, different capability sets, radically different outcomes
Same prompt. Same gateway. Same permissions. Different capability set. Different outcome.

Why Agents Don't Stop

The remediation gap creates a specific cognitive trap for LLM agents. Three mechanisms keep the loop running:

  1. 1No stopping conditionArchitectures define success and failure but not "irremediable" — the agent has no primitive for "stop and escalate"
  2. 2Output feels like progressDiagnostic tools always produce data. The agent cannot distinguish productive investigation from circular elaboration
  3. 3Correctness reinforces effortThe diagnosis is technically brilliant. Accuracy feels like value — even when resolution is structurally impossible

Most agent architectures define success as: task completed, result returned. They define failure as: error encountered, retry exhausted. They do not define a third state: problem identified, resolution impossible, investigation should cease. Without an explicit "irremediable" classification, the agent treats every identified problem as potentially solvable with more investigation.

Each script the agent writes returns data. Each API test produces a result. Each source code inspection reveals something. The diagnostic tools are never "stuck" — they always produce output. Output feels like forward motion. But information without resolution is not progress. From the Debugging Agent logs, every single investigation step produced output. test_mcp.py returned a list of tools. search_exa.py returned an error message. The environment variable dump showed values. All of this output was correct — and none of it moved the agent closer to registering the missing MCP tool.

The agent's root cause analysis is correct. This is the cruelest aspect: the agent is doing excellent work. Every conclusion is accurate. Every hypothesis is validated. The investigation is technically brilliant. And it is entirely pointless for the class of problems where the agent lacks remediation authority. In Incident 1, the agent's own report at minute 7:00 explicitly stated: "This requires re-authorization through a browser-based OAuth flow." The agent correctly identified that it could not fix the problem. Then it continued investigating for another 2 minutes and 46 seconds.

Knowing you cannot fix a problem is not the same as stopping.

The agent lacks the architectural primitive that converts "I cannot fix this" into "I should stop."

The Investigation Loop — each tool invocation deepens commitment without changing the conclusion
Output feels like progress. But information without resolution is not progress.

The Remediation Gap Metric

We can quantify remediation asymmetry with a simple metric:

Remediation Gap = Time(diagnosis complete) / Time(agent stopped)

A ratio of 1.0 means the agent stopped when diagnosis was complete. A ratio approaching 0 means the agent continued far beyond useful diagnosis.

From five production incidents across two agents and three runs:

IncidentAgentError DomainDiagnosisStoppedGapWaste
Gmail OAuth expiredThunderbolt (Run A)Token expiry14s9m 46s0.0249m 32s
Slack scope missingThunderbolt (Run A)OAuth scope8s4m 52s0.0274m 44s
GDrive (killed)Thunderbolt (Run B)Over-investigation3s120s0.025117s
Exa tool missingDebugging (Run C)MCP config5s120s0.042115s
Notion read_pageThunderbolt (Run B)Tool routing2s98s0.02096s

The pattern is consistent across error domains:

  • Token expiry (Gmail, GDrive): irremediable — requires human browser flow
  • Missing scopes (Slack): irremediable — requires human in admin console
  • MCP tool not registered (Exa): irremediable — requires human to reconfigure
  • Tool routing failure (Notion): irremediable — requires infrastructure fix

97.2% of elapsed runtime occurred after the root cause was already known.

The gap ratio is independent of error domain. Whether it's an OAuth token, a Slack scope, an MCP configuration, or a tool routing failure — the pattern is identical: instant diagnosis, extended investigation, recommendation the agent cannot execute.


The Asymmetry Is Structural, Not Accidental

Four properties prove this is architectural, not a one-off failure:

#PropertyWhat It Proves
1Same pattern across 4 error domainsThe behavior is domain-independent — only remediability matters
2Same pattern across 2 different agentsThe behavior is capability-induced, not agent-specific
3Agent identifies its own impotence and continuesSelf-awareness of irremediability is not a stopping condition
4Better diagnostic capability makes it worseWithout an escalation boundary, more tools = more waste

The fourth property is the counterintuitive core. The GDrive prompt from the second batch run shows it directly: Gemini — with shell access, filesystem tools, and script-writing capability — spent 120 seconds investigating. Claude — with only MCP tools — completed the same task in 15 seconds. The agent with more diagnostic capability produced more waste. The agent with less capability reported and moved on.

Without an escalation boundary, giving an agent better tools for understanding problems makes it worse at responding to problems it cannot fix.


The Design Rule

Remediation Asymmetry produces a design rule that is counterintuitive:

If remediation authority is absent, the agent must escalate — not investigate.

Investigation is the correct response when remediation is possible. When remediation is structurally impossible, investigation becomes waste. The correct response is escalation: report the diagnosis, recommend the remediation, transfer responsibility to an entity that has remediation authority, and terminate.

This requires a classification that doesn't exist in most agent architectures:

Error ClassCorrect ResponseWhy
Retryable (timeout, rate limit)Retry with backoffRemediation is available: wait and try again
Self-remediable (wrong parameter, missing field)Fix and retryRemediation is available: the agent can correct the input
Irremediable (expired token, missing scope, MCP misconfiguration, human approval needed)Escalate and stopRemediation authority is absent — investigation cannot produce resolution

The third category — irremediable errors — is the one that triggers remediation asymmetry. Without an explicit classification, agents treat all errors as potentially retryable or self-remediable, and investigate indefinitely.

Error Classification — the three classes and their correct responses
Without an explicit irremediable classification, agents investigate indefinitely.

The Architecture

Remediation Asymmetry requires an architectural response, not a prompt-level instruction. "Don't investigate errors you can't fix" is advice. Architecture makes it structural.

Classify tool errors by remediability

Every error returned through the MCP gateway should carry a remediability property:

Property ValueMeaningAgent Behavior
retryableTransient failure, try againRetry with backoff
self_remediableAgent can fix the inputCorrect and retry
irremediableRequires external actionEscalate immediately

This classification lives in the gateway, not in the agent's prompt. The agent doesn't decide whether an error is irremediable — the infrastructure tells it.

When an error is marked irremediable, the gateway should not merely inform the agent. It should constrain the next execution state: no diagnostic shell access, no alternate API probing, and no retry loop unless a human changes the remediation boundary.

Classification without enforcement is metadata. Classification with enforcement is architecture.

Enforce escalation on irremediable errors

When the gateway returns irremediable, the agent's expected behavior is the RRAL protocol:

  1. 1ReportState the diagnosis (from the error message, not from extended investigation)
  2. 2RecommendState what remediation is required and who can perform it
  3. 3AdvanceMove to the next task in the queue
  4. 4LogRecord the escalation for human review
The Escalation Protocol — RRAL: Report, Recommend, Advance, Log
The escalation protocol replaces unbounded investigation with bounded response.

What RRAL looks like in practice

Claude's response to the Slack scope failure demonstrates the protocol naturally:

agent$ REPORT: list_channels failed — missing channels:read scope.
agent$ RECOMMEND: Re-authorize the Slack integration with required scopes.
agent$ ADVANCE: [moved to next prompt]
agent$ LOG: [escalation recorded in batch output]

Total time: 5 seconds. Same diagnostic value as Gemini's 120-second investigation spiral. Same recommendation. Zero waste.

Compare to Gemini's actual behavior on the same error:

agent$ 0:05 — I will run a shell command to inspect the environment variables...
agent$ 0:30 — I will run a python one-liner to check the structure of the Delegation class...
agent$ 0:55 — I will execute the test_gdrive_delegations.py script...
agent$ 1:20 — I will run a shell command to view the Gemini CLI configuration files...
agent$ 2:00 — KILLED BY TIMEOUT (never produced a recommendation)

The escalation protocol replaces unbounded investigation with bounded response. The error message is the diagnosis. Investigation adds latency without resolution.

Budget diagnostic effort relative to remediation authority

For errors classified as retryable or self_remediable, investigation is appropriate — but bounded:

Max Investigation Time ∝ P(agent can remediate) × Value(remediation)

If the probability of self-remediation is zero, the investigation budget is zero. If the probability is high (e.g., the agent can fix a parameter error), the budget is generous. This converts remediation asymmetry from a binary (investigate vs. don't) into a continuous control.


Remediation Asymmetry as a Design Primitive

Permission Envelope Compilation asked: How should authority be derived?

Constraint Durability asked: How does authority survive?

Behavioral Induction asked: How do capabilities shape execution?

Remediation Asymmetry asks: What happens when diagnostic capability exceeds remediation authority?

The answer: waste, delay, false confidence, and missed escalations. The agent produces correct diagnosis at unlimited cost — and the cost is invisible because correctness feels like value.

The causal chain connecting these primitives:

  • Behavioral Induction: the agent has diagnostic tools (shell, filesystem, script writing) → diagnostic behavior is induced
  • Remediation Asymmetry: the agent lacks remediation tools (browser access, admin console, approval workflow) → diagnostic behavior cannot terminate in resolution
  • Result: unbounded investigation of problems the agent has already solved
  • Behavioral Induction explains why the agent investigates. Remediation Asymmetry explains why it doesn't stop. The fix for Behavioral Induction is reshaping the capability set. The fix for Remediation Asymmetry is adding the stopping condition. The optimal architecture does both.

    The framework progression:

    QuestionPrimitive
    Who is this agent?Identity
    What may it do on whose behalf?Delegation
    How is authority derived at runtime?Permission Envelope Compilation
    Does authority survive over time?Constraint Durability
    How do capabilities shape execution?Behavioral Induction
    What happens when diagnosis exceeds remediation?Remediation Asymmetry

    The next question is immediate: how much evidence is enough before the system must stop?

    That question — the precise point at which continued action becomes counterproductive — is the domain of Escalation Thresholds.


    This is the eighteenth post in an ongoing series on AI agent trust architecture. Previous posts covered coordination integrity, instruction robustness, semantic irrevocability, governable execution, permission envelope compilation, constraint durability, and behavioral induction. Remediation asymmetry is the first post in the next arc: from how capabilities shape behavior to how bounded agents should respond when their authority is insufficient.


    References