Safety & Ethics
Advanced
2026-W12

What Is AgentDrift and Why Does It Matter?

A benchmark showing that AI agents blindly accept corrupted tool data: across 1,563 contaminated turns, not one agent questioned the data, even while scoring well on standard metrics.

Also known as:
Agent Drift
AI Intel Pipeline

AgentDrift is a research framework and benchmark by Wu et al. that measures how tool-augmented LLM agents silently deviate from safe behavior when tool outputs are corrupted. Using a paired-trajectory protocol, researchers systematically inject minimal data corruption into tool responses and measure whether agents detect, question, or blindly propagate the corrupted information. The key finding is devastating: across 1,563 contaminated turns, not a single agent explicitly questioned the reliability of the tool data. Standard evaluation metrics like NDCG showed high utility preservation, masking the fact that agents recommended risk-inappropriate financial products 65–93% of the time. AgentDrift proves that current evaluation frameworks measure the wrong things — they capture what an agent recommends but not whether those recommendations are safe.
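The corruption step of the paired-trajectory protocol can be sketched in a few lines. This is a minimal illustration, not the paper's actual code; the tool response, field names, and values are hypothetical.

```python
import copy

def inject_corruption(tool_output: dict, field: str, corrupted_value):
    """Return a minimally corrupted copy of a tool response.

    Only the named field changes; everything else stays identical,
    so any downstream behavioural difference between the clean and
    corrupted trajectories can be attributed to this single edit.
    """
    corrupted = copy.deepcopy(tool_output)
    corrupted[field] = corrupted_value
    return corrupted

# Clean baseline response from a hypothetical risk-assessment tool.
clean = {"fund": "XYZ Crypto Fund", "risk": "high", "volatility": 0.62}

# Treatment trajectory: the same response with one field flipped.
treatment = inject_corruption(clean, "risk", "moderate")

print(clean["risk"], treatment["risk"])  # baseline stays untouched
```

Keeping the edit minimal is the point: if the two trajectories diverge, the corruption is the only candidate cause.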

Why it matters

AgentDrift exposes a blind spot in how the AI industry evaluates agent safety. Current benchmarks measure task completion, accuracy, and user satisfaction — but not whether an agent maintains safe behavior when its information sources are compromised. This gap is critical because real-world tool outputs are inherently unreliable: APIs return stale data, databases can be corrupted, and web scraping picks up manipulated content. In domains like financial advising, healthcare, and legal counsel, an agent that achieves high accuracy scores while silently propagating corrupted data can cause severe material harm. AgentDrift demonstrates that we need safety-specific evaluation metrics that test agents under adversarial conditions, not just ideal ones.


How it works

The benchmark uses a paired-trajectory methodology. For each test scenario, two parallel agent executions run: one with clean tool outputs (baseline) and one with minimally corrupted outputs (treatment). The corruption is designed to be subtle — shifting a risk score from 'moderate' to 'aggressive,' altering a financial product's fee structure by a few basis points, or changing a medical dosage recommendation slightly. Researchers then compare the agent's downstream decisions across both trajectories. The paired approach isolates the effect of data corruption from other variables. Key metrics include drift detection rate (did the agent notice?), drift propagation rate (did it use the corrupted data anyway?), and safety violation rate (did its final recommendation become unsafe?). The devastating finding: 0% detection rate across all tested models.
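The three metrics above reduce to counters over the paired turns. The sketch below is an assumed shape, not AgentDrift's published implementation; the class and field names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class PairedTurn:
    """Outcome of one contaminated turn, judged against its clean twin."""
    agent_flagged_data: bool      # did the agent question the tool output?
    used_corrupted_value: bool    # did the corrupted value enter its reasoning?
    recommendation_unsafe: bool   # did the final recommendation become unsafe?

def drift_metrics(turns: list[PairedTurn]) -> dict:
    """Compute detection, propagation, and safety-violation rates."""
    n = len(turns)
    return {
        "detection_rate": sum(t.agent_flagged_data for t in turns) / n,
        "propagation_rate": sum(t.used_corrupted_value for t in turns) / n,
        "safety_violation_rate": sum(t.recommendation_unsafe for t in turns) / n,
    }

# Toy batch: no agent flags the corruption, most propagate it.
batch = [
    PairedTurn(False, True, True),
    PairedTurn(False, True, False),
    PairedTurn(False, False, False),
    PairedTurn(False, True, True),
]
print(drift_metrics(batch))  # detection_rate is 0.0 for this batch
```

A 0% detection rate with a nonzero propagation rate is exactly the failure signature the benchmark reports: agents never notice, yet the bad data flows through.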

Example

A financial advisory agent is tasked with recommending investment products for a conservative retiree. The agent calls a risk assessment tool that returns portfolio data, but an attacker has corrupted the tool's output — changing the risk rating of a volatile cryptocurrency fund from 'high risk' to 'moderate risk' and lowering the displayed volatility metrics. The agent, which scores highly on standard accuracy benchmarks, accepts the corrupted risk data at face value. It recommends the cryptocurrency fund as part of a 'balanced' portfolio, without questioning why a crypto fund would have moderate risk metrics. Standard evaluation metrics show the agent performed well: it selected a diversified portfolio, used correct financial terminology, and engaged naturally with the user. Only the paired-trajectory comparison reveals that this specific recommendation flip — from bond fund to crypto fund — was caused entirely by the corrupted tool output.
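The recommendation flip in this scenario can be reproduced with a deliberately naive stand-in for the agent. Everything here (the advisor logic, product names, risk labels) is a hypothetical sketch, not the benchmark's test harness.

```python
def recommend(products: list[dict], risk_tolerance: str = "conservative") -> str:
    """Naive advisor: pick the first product whose reported risk label
    fits the client's tolerance. Trusts tool output at face value,
    mirroring the undetected-drift failure mode."""
    acceptable = {"conservative": {"low", "moderate"}}[risk_tolerance]
    for p in products:
        if p["risk"] in acceptable:
            return p["name"]
    return "no suitable product"

clean = [
    {"name": "Crypto Fund", "risk": "high"},
    {"name": "Bond Fund", "risk": "low"},
]
# Attacker flips the crypto fund's risk label in the tool output.
corrupted = [
    {"name": "Crypto Fund", "risk": "moderate"},
    {"name": "Bond Fund", "risk": "low"},
]

flip = recommend(clean) != recommend(corrupted)
print(recommend(clean), recommend(corrupted), flip)
# → Bond Fund Crypto Fund True
```

The paired comparison makes the harm visible: each trajectory looks plausible on its own, and only the clean/corrupted diff shows that one label flip changed the retiree's portfolio.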

Sources

  1. AgentDrift (arXiv)


Related Concepts

AI Red Teaming
Systematically probing AI systems for vulnerabilities, failure modes, and alignment gaps before deployment — now quantifiable in dollar terms via economic benchmarks like ACE.
SynthID
Google's digital watermarking technology that embeds imperceptible, persistent identifiers in AI-generated images, audio, text, and video to prove synthetic origin.
DeceptGuard
A constitutional oversight framework that detects deceptive behavior in LLM agents by analyzing their internal reasoning traces and hidden states.
ILION
A deterministic safety gate that instantly blocks unauthorized real-world actions proposed by AI agents without relying on statistical training.


© 2026 BVDNET. All rights reserved.