
Chain-of-Thought (CoT) is a prompting technique that instructs a Large Language Model to show its reasoning process step-by-step before arriving at a final answer, rather than generating an answer directly. By explicitly working through intermediate reasoning steps, CoT dramatically improves LLM accuracy on tasks requiring logic, math, multi-step analysis, or complex decision-making. The technique gained prominence when researchers found that simply appending "Let's think step by step" to math prompts raised accuracy from roughly 17% to 78% on the MultiArith arithmetic benchmark. CoT works because it forces the model to decompose complex problems into manageable sub-steps, generating the intermediate tokens that bridge the gap between question and answer — tokens that would be skipped in direct answering.
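At the prompt level, the difference is just a few added words. A minimal sketch, with an illustrative question of my own (any chat-completion API would consume these strings unchanged):

```python
# Contrast a direct prompt with a zero-shot chain-of-thought prompt.
# The question is a made-up example; only the trailing instruction differs.

QUESTION = (
    "A store sells pens in packs of 12. I buy 7 packs and give away "
    "15 pens. How many pens do I have left?"
)

# Direct prompting: the model must jump straight to the answer token.
direct_prompt = f"{QUESTION}\nAnswer with a single number."

# CoT prompting: the model is asked to generate reasoning tokens first.
cot_prompt = (
    f"{QUESTION}\n"
    "Let's think step by step, then state the final answer on its own line."
)
```

With the direct prompt the model's first output tokens are the answer; with the CoT prompt they are intermediate steps (7 × 12 = 84, then 84 − 15 = 69) that the final answer can attend to.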
Why it matters
CoT prompting is one of the most impactful prompt engineering techniques available, often delivering the largest accuracy gains for the lowest implementation cost. Without CoT, LLMs attempt to jump directly from question to answer — a process analogous to solving a complex math problem in your head without writing anything down. For tasks involving multi-step reasoning, comparison, or analysis, this direct approach fails a large fraction of the time. With CoT, the model externalizes its reasoning process, catching errors at each step before they compound. This is particularly valuable for business-critical applications (financial analysis, legal reasoning, diagnostic systems) where the reasoning trace also serves as an auditable explanation of how the model reached its conclusion — essential for building trust and enabling human oversight.
How it works
CoT works by changing what the model generates before its final answer. In standard prompting, the model produces answer tokens directly. In CoT, the model first generates reasoning tokens — explicit intermediate steps, calculations, or logical deductions — and then produces the final answer based on that reasoning chain. This can be triggered by simple instructions ("Think step by step"), few-shot examples that demonstrate reasoning chains, or structured reasoning frameworks ("First, identify the key variables. Then, determine the relationships. Next, calculate step by step. Finally, verify the result."). The technique is effective because LLMs process text sequentially — each generated token can attend to all previous tokens, including the reasoning steps. By generating intermediate reasoning, the model effectively gives itself a scratchpad. Advanced variants include self-consistency (generating multiple reasoning chains and selecting the majority answer) and tree-of-thought (exploring branching reasoning paths).
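The self-consistency variant mentioned above is simple enough to sketch. Assuming a `sample_fn` that calls an LLM with a CoT prompt at nonzero temperature and returns the final answer it extracted (the function name and the stubbed answers below are illustrative):

```python
from collections import Counter

def self_consistency(question, sample_fn, n_samples=5):
    """Generate several independent reasoning chains for the same
    question and return the majority final answer. `sample_fn` is
    assumed to issue one CoT-prompted LLM call per invocation, with
    temperature > 0 so the chains diverge."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer

# Stub standing in for real LLM calls: four chains reach the correct
# answer, one goes astray.
fake_chains = iter(["84", "84", "71", "84", "84"])
result = self_consistency("...", lambda q: next(fake_chains))
# result == "84": the single stray chain is outvoted
```

The majority vote works because independent reasoning chains tend to make *different* mistakes but converge on the same correct answer, so errors rarely agree with each other.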
Example
An insurance company uses an LLM to analyze claims and recommend approval, denial, or escalation. Without CoT, the model reads a claim description and directly outputs a recommendation — achieving 62% agreement with expert adjusters. With a CoT prompt structure, the model is instructed to: "1) Identify the claim type and applicable policy. 2) List the covered events and exclusions relevant to this claim. 3) Assess whether the described incident matches covered events. 4) Check for any policy limits, deductibles, or waiting periods. 5) Note any ambiguities requiring human review. 6) Provide your recommendation with justification." Agreement with expert adjusters rises to 87%, and the reasoning trace becomes a valuable artifact — when the model recommends escalation, it explains exactly which ambiguity triggered that conservative decision, saving the human adjuster significant analysis time.
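The six-step structure above lends itself to a reusable prompt template. A sketch, with the step wording taken from the example; the function name and surrounding instructions are assumptions:

```python
# Assemble the six-step claims-review CoT prompt from the example.
CLAIM_REVIEW_STEPS = [
    "Identify the claim type and applicable policy.",
    "List the covered events and exclusions relevant to this claim.",
    "Assess whether the described incident matches covered events.",
    "Check for any policy limits, deductibles, or waiting periods.",
    "Note any ambiguities requiring human review.",
    "Provide your recommendation with justification.",
]

def build_claim_prompt(claim_text: str) -> str:
    """Return a prompt that forces the model to reason through each
    review step, in order, before giving a recommendation."""
    steps = "\n".join(f"{i}) {s}" for i, s in enumerate(CLAIM_REVIEW_STEPS, 1))
    return (
        "You are reviewing an insurance claim. Work through each step "
        "in order, showing your reasoning, before stating your "
        "recommendation (approve, deny, or escalate).\n\n"
        f"Claim:\n{claim_text}\n\nSteps:\n{steps}"
    )
```

Keeping the steps in a list rather than a hard-coded string makes it easy to version the reasoning framework separately from the claim text, so the audit trail records exactly which step structure produced each recommendation.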