
A hallucination occurs when a Large Language Model confidently generates information that is factually incorrect, fabricated, or nonsensical, while presenting it as authoritative truth. LLMs have no concept of truth; they predict statistically likely next tokens based on patterns in their training data. This means they can produce plausible-sounding but entirely false citations, statistics, historical events, code libraries, and technical specifications. Hallucination is widely considered the single most critical reliability challenge facing LLM deployments, with reported rates varying from 5% to over 30% depending on the task domain, the model, and whether mitigation techniques such as RAG are applied.
Why it matters
Hallucination represents the fundamental trust barrier to AI adoption in professional contexts. A legal AI that fabricates case citations, a medical AI that invents drug interactions, or a financial AI that produces fake statistics can cause serious real-world harm. Unlike traditional software bugs, which produce consistent errors, LLM hallucinations are unpredictable: the model might answer 99 questions correctly and then fabricate the 100th with equal conviction. For businesses deploying AI, hallucination management is not optional; it requires retrieval-augmented generation, citation verification, human-in-the-loop review, or domain-specific guardrails. The cost of an undetected hallucination in a high-stakes domain far exceeds the cost of the AI system itself.
How it works
Hallucinations arise from how LLMs generate text. The model produces each token based on statistical probability: which token is most likely to follow given the preceding context. When asked about topics at the edge of its training data, or when the statistically likely continuation happens to be factually wrong, the model has no built-in mechanism to say "I don't know." It generates the most plausible-seeming continuation regardless of factual accuracy. Contributing factors include gaps in the training data, conflicting information within it, the model's tendency to pattern-match rather than reason, and prompts that implicitly assume the model has knowledge it lacks. Extended reasoning models and RAG systems significantly reduce, but do not eliminate, hallucinations.
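The mechanism above can be illustrated with a toy sketch. This is not a real language model: the probability table is invented for illustration, and the point is only that greedy next-token selection has no step that checks whether the resulting claim is true.

```python
# Toy next-token model. The probability table is fabricated for
# illustration; a real LLM learns billions of such statistics.
PROBS = {
    ("The", "capital"): {"of": 0.9, "city": 0.1},
    ("capital", "of"): {"Atlantis": 0.6, "France": 0.4},  # conflicting "training data"
    ("of", "Atlantis"): {"is": 1.0},
    ("Atlantis", "is"): {"Poseidonia": 0.7, "unknown": 0.3},
}

def generate(prompt, max_tokens=6):
    tokens = prompt.split()
    for _ in range(max_tokens):
        context = tuple(tokens[-2:])
        if context not in PROBS:
            break
        # Greedy decoding: always emit the most probable continuation.
        # Nothing here evaluates factual accuracy.
        next_token = max(PROBS[context], key=PROBS[context].get)
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("The capital"))
# Confidently produces "The capital of Atlantis is Poseidonia":
# a fluent, high-probability, entirely fictional statement.
```

The model never "decides" to lie; the fabricated claim is simply the highest-probability path through its statistics, which is exactly why hallucinations sound as confident as correct answers.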
Example
A product team deploys a customer support bot trained to answer questions about their software platform. A customer asks about a feature that was deprecated two years ago. The bot, trained on documentation that includes both old and new versions, generates a confident step-by-step guide for using the deprecated feature, complete with menu paths that no longer exist and API endpoints that were removed. The customer follows the instructions, fails, contacts human support, and loses trust in the AI assistant entirely. The fix: implementing RAG that retrieves only from current documentation, adding a knowledge cutoff disclaimer, and configuring the system to explicitly say "I don't have information about that" when retrieval confidence is low.
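The confidence-gating part of that fix can be sketched as follows. This is a minimal illustration, not a production RAG pipeline: the document corpus, the word-overlap similarity score, and the 0.5 threshold are all assumptions standing in for a real vector store, embedding model, and tuned cutoff.

```python
# Illustrative current-docs-only corpus (assumption: a real system
# would query a vector store of up-to-date documentation).
CURRENT_DOCS = {
    "export data": "Go to Settings > Data > Export and choose a format.",
    "reset password": "Use the 'Forgot password' link on the login page.",
}

CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; tuned empirically in practice

def score(query, doc_key):
    # Stand-in for embedding similarity: fraction of the doc key's
    # words that also appear in the query.
    q = set(query.lower().rstrip("?").split())
    d = set(doc_key.split())
    return len(q & d) / len(d)

def answer(query):
    best_key = max(CURRENT_DOCS, key=lambda k: score(query, k))
    if score(query, best_key) < CONFIDENCE_THRESHOLD:
        # Low retrieval confidence: refuse rather than fabricate.
        return "I don't have information about that."
    return CURRENT_DOCS[best_key]

print(answer("How do I export data?"))         # grounded in current docs
print(answer("How do I use the legacy API?"))  # refusal instead of a guess
```

The key design choice is the explicit refusal branch: when nothing in the current documentation matches well, the system says so instead of letting the model improvise an answer about a deprecated feature.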