
Programmatic Tool Calling is a paradigm shift in Anthropic's API that replaces traditional JSON-based function calling with direct code execution for tool invocation. Instead of generating a JSON object that describes a single function call and waiting for a round-trip response, Claude emits a block of code that runs in an execution environment, where it can dynamically invoke multiple MCP tools, parse their results, and use control flow such as loops and conditionals before returning a final answer. This addresses three critical problems with standard function calling: token bloat from serializing every tool interaction as JSON, high latency from sequential round-trips in multi-tool tasks, and context exhaustion from accumulating verbose tool call/response pairs. Programmatic tool calling enables agents to handle complex multi-step workflows in a single execution pass.
Why it matters
Standard JSON-based function calling has become a bottleneck for agent performance. Each tool call requires a full round-trip: the model generates a JSON request, the runtime executes it, and the response is fed back into the model's context. A task that requires ten tool calls therefore means ten round-trips, each adding latency and consuming context tokens. The accumulated JSON payloads can exhaust the model's context window before the task is complete. Programmatic tool calling eliminates this overhead by letting the model write and execute code that handles the entire tool interaction sequence locally. This is not just an optimization; it unlocks agent capabilities that were previously impractical, such as iterating over large datasets, implementing retry logic, and orchestrating complex conditional workflows that depend on intermediate tool results.

How it works
When an agent using programmatic tool calling encounters a task requiring external tools, it generates a code block (typically Python) instead of a JSON tool call. This code block runs in a sandboxed execution environment with access to MCP tool bindings. The code can call multiple tools in sequence, branch based on results, loop over collections, aggregate data, and format the final output — all within a single execution. The model writes the orchestration logic, the runtime executes it, and only the final result is returned to the conversation context. This collapses what would be multiple round-trips into a single code execution, dramatically reducing latency and token usage. The approach also gives agents native programming constructs like error handling, data transformation, and mathematical operations without needing specialized tools for each.
Example
A user asks an AI assistant to find all overdue invoices from the accounting system, calculate late fees based on contractual terms stored in the CRM, and generate a summary email draft. With traditional function calling, this requires: (1) call accounting API to list invoices, (2) wait for response, (3) filter overdue ones, (4) for each overdue invoice, call CRM to get contract terms, (5) wait for each response, (6) calculate fees, (7) call email API to create draft. That is at least seven round-trips. With programmatic tool calling, Claude writes a Python script that fetches invoices, filters them in a loop, batches the CRM lookups, computes fees with native arithmetic, and calls the email API once with the complete summary — all in a single code execution that takes seconds instead of multiple round-trip cycles.