
In-Context Learning (ICL) is the emergent ability of Large Language Models to learn new tasks from examples provided directly in the prompt, without any modification to the model's weights. Rather than requiring expensive fine-tuning or retraining, ICL allows users to demonstrate a task through a handful of input-output pairs — and the model infers the pattern and applies it to new inputs within the same forward pass. This capability emerged at scale: models below roughly 10 billion parameters show minimal ICL ability, while larger models display increasingly robust task adaptation from context alone. ICL is the mechanism that few-shot prompting exploits directly (zero-shot prompting relies on the related ability to follow instructions without examples), and it represents one of the most surprising capabilities of modern LLMs.
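The idea can be sketched in a few lines: the "training data" is just text assembled into the prompt, and the model's weights never change. The helper below is a minimal illustration, assuming a hypothetical sentiment task; the examples and the Input/Output format are invented for demonstration, not tied to any particular model or API.

```python
# Minimal sketch of few-shot in-context learning: demonstrations plus a
# new query are concatenated into one prompt. The model infers the task
# from the pattern; nothing is trained. Task and examples are invented.

def build_few_shot_prompt(examples, new_input):
    """Assemble labeled demonstrations and a new query into one prompt."""
    lines = []
    for text, label in examples:
        lines.append(f"Input: {text}\nOutput: {label}")
    # The trailing "Output:" cues the model to complete the pattern.
    lines.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(lines)

examples = [
    ("The delivery arrived two days early", "positive"),
    ("My package was crushed and the contents broken", "negative"),
    ("Order confirmed, awaiting shipment", "neutral"),
]
prompt = build_few_shot_prompt(examples, "Fantastic service, thank you!")
print(prompt)
```

Sent to a sufficiently large model, this prompt typically elicits the label "positive" as the completion; changing the task is just a matter of swapping the examples.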
Why it matters
In-context learning fundamentally changed how organizations deploy AI. Before ICL, adapting a language model to a new task required collecting labeled training data, running fine-tuning jobs costing thousands of euros, and maintaining separate model versions per task. With ICL, a single general-purpose model handles classification, extraction, translation, summarization, and dozens of other tasks — switching between them simply by changing the examples in the prompt. This collapses the cost of task adaptation from weeks and thousands of euros to minutes and a few cents in API tokens. For businesses, ICL means that product teams and domain experts — not just machine learning engineers — can build and iterate on AI-powered features by crafting prompts rather than training models. The trade-off is that ICL performance depends heavily on example quality and selection, and it cannot match fine-tuned models on tasks requiring deep domain specialization.
How it works
The exact mechanism behind in-context learning remains an active area of research, but one leading hypothesis is that large transformers develop an implicit ability to perform gradient-descent-like updates within their forward pass. When the model processes examples in the prompt, the attention mechanism identifies shared patterns across input-output pairs — input format, output structure, transformation logic — and activates internal representations that encode the inferred task. Recent research suggests that specific attention heads specialize in "induction": copying patterns from earlier in the context to later positions. The model effectively builds a temporary task representation in its activations (not its weights) that persists only for the current context. Key factors affecting ICL performance include the number and diversity of examples, the similarity between examples and the target input, the ordering of examples within the prompt, and whether the model encountered similar task structures during pre-training.
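One of the factors listed above — similarity between examples and the target input — is often operationalized by retrieving the most similar demonstrations from a larger pool. The sketch below illustrates the idea using simple token-overlap (Jaccard) similarity as a stand-in for the embedding-based retrieval a production system would more likely use; the example pool and labels are invented.

```python
# Sketch of similarity-based example selection: pick the k demonstrations
# from a labeled pool that most resemble the target input. Jaccard token
# overlap is a deliberately simple proxy for semantic similarity.

def jaccard(a, b):
    """Token-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_examples(pool, target, k=2):
    """Return the k (text, label) pairs most similar to the target."""
    return sorted(pool, key=lambda ex: jaccard(ex[0], target), reverse=True)[:k]

pool = [
    ("Where is my package right now?", "shipping status"),
    ("I want to send this item back", "returns"),
    ("My invoice shows the wrong amount", "billing"),
    ("The box arrived dented and torn", "damage claims"),
]
target = "My package arrived with a torn box"
chosen = select_examples(pool, target, k=2)
print([label for _, label in chosen])  # → ['damage claims', 'shipping status']
```

The retrieved examples would then be placed in the prompt ahead of the target input, giving the model demonstrations close to the case it must handle.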
Example
A logistics company needs to classify customer inquiries into eight categories (shipping status, returns, billing, damage claims, address changes, delivery scheduling, international customs, and general questions). Instead of fine-tuning a model — which would require labeling 10,000+ examples and cost €5,000 in compute — they provide eight examples in the prompt, one per category, each showing a real customer message and the correct label. The model achieves 82% accuracy immediately. They iterate three times over two hours, improving example selection by choosing messages that represent common edge cases — when a customer asks about both shipping status and a return in the same message, the example demonstrates the primary-category rule. Accuracy reaches 91%, sufficient for automated routing with human review on the lowest-confidence 9%. When a ninth category is added months later (subscription management), they simply add one example to the prompt and the system handles it immediately — no retraining, no deployment, no downtime.
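The setup in this example can be sketched as a prompt-only task definition: one demonstration per category, so adding the ninth category later really is a one-line change rather than a retraining run. The category names follow the example above; the customer messages are invented placeholders, and the prompt format is one plausible choice, not a prescribed one.

```python
# Sketch of the logistics classifier as a prompt template: the whole
# "model" of the task is this dict plus a format string. Messages invented.

CATEGORY_EXAMPLES = {
    "shipping status": "Where is order #4412? It shipped a week ago.",
    "returns": "I'd like to send back the boots I ordered.",
    "billing": "I was charged twice for the same order.",
    "damage claims": "The parcel arrived crushed and the vase is broken.",
    "address changes": "Please deliver to my office instead of my home.",
    "delivery scheduling": "Can the courier come after 5 pm on Friday?",
    "international customs": "My shipment is stuck at customs in Hamburg.",
    "general questions": "Do you sell gift cards?",
}

def build_classifier_prompt(message):
    """One demonstration per category, then the message to classify."""
    parts = ["Classify the customer message into exactly one category.\n"]
    for label, example in CATEGORY_EXAMPLES.items():
        parts.append(f"Message: {example}\nCategory: {label}\n")
    parts.append(f"Message: {message}\nCategory:")
    return "\n".join(parts)

# Adding the ninth category months later is a single new entry:
CATEGORY_EXAMPLES["subscription management"] = "How do I pause my monthly box?"

prompt = build_classifier_prompt("My package arrived with a torn box")
print(prompt)
```

Iterating on accuracy, as the team does above, means editing which message illustrates each category — for instance swapping in an edge case that demonstrates the primary-category rule — and re-running, with no deployment step in between.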