AI Agent Series — Ran Wei

Module 6: Tool Use & Function Calling

Giving your agent superpowers by defining callable tools.

1. Why Tools Matter

Without tools, an LLM can only generate text. It cannot look up today's weather, query a database, send an email, or run code. Tools transform a chatbot into an agent by giving it the ability to take actions in the real world and retrieve information beyond its training data.

The core insight is that LLMs are excellent at deciding what to do and interpreting results, but they need external functions to actually do things. Function calling is the bridge between language understanding and real-world action.

Consider a customer-support agent. Without tools, it can only say "I'd be happy to check your order status." With tools, it can actually call get_order_status(order_id="12345"), retrieve the real data, and present it to the user. This is the difference between a helpful-sounding chatbot and a genuinely useful agent.

ANALOGY

Think of an LLM as a brilliant person sitting in a sealed room. They can reason and communicate through a slot in the door, but they cannot see or touch the outside world. Tools are like giving them a phone, a computer, and access to filing cabinets — suddenly they can accomplish real tasks.

Information Retrieval

Search engines, databases, APIs — fetch live data the model was never trained on.

Computation

Calculators, code interpreters — perform precise math and logic that LLMs struggle with.

Side Effects

Send emails, create tickets, update records — take actions that change the world.

Perception

Read files, parse images, process audio — extend the agent's senses beyond text.

2. The Function Calling Protocol

Function calling follows a specific protocol between your application and the LLM. The model never executes tools directly — it outputs a structured request describing which tool to call and with what arguments. Your application then executes the function and feeds the result back to the model.

The Four-Step Dance

  1. Define — You describe available tools (name, description, parameters) when calling the API.
  2. Decide — The model analyses the user's request and decides whether to call a tool, and if so, which one and with what arguments.
  3. Execute — Your code receives the tool-call request, runs the actual function, and collects the result.
  4. Respond — You send the tool result back to the model, which incorporates it into a natural-language response.
# Conceptual flow of function calling
# Step 1: User asks a question
user_msg = "What's the weather in London?"

# Step 2: Model decides to call a tool (returns structured JSON)
# {tool_name: "get_weather", arguments: {city: "London"}}

# Step 3: Your code executes the real function
result = get_weather(city="London")  # returns {"temp": 12, "condition": "cloudy"}

# Step 4: You send the result back; model writes a natural response
# "It's currently 12°C and cloudy in London."
NOTE

The model never has direct access to your functions. It only sees the descriptions you provide. This means your tool descriptions are critically important — they are the model's only guide to understanding what each tool does, when to use it, and what arguments to provide.

PITFALL

A common mistake is assuming the model executes tools. It does not. If you forget to actually call the function and send results back, the agent loop stalls. Always implement the execution step in your code.

3. Tool Definitions — OpenAI vs Anthropic

Both major providers use JSON Schema to describe tool parameters, but the envelope format differs. Understanding both lets you build agents that work across providers.

OpenAI Format

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Returns temperature in Celsius and conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'London' or 'New York'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit (default: celsius)"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

import openai
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools
)

Anthropic Format

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit (default: celsius)"
                }
            },
            "required": ["city"]
        }
    }
]

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Weather in Paris?"}]
)
| Aspect | OpenAI | Anthropic |
| --- | --- | --- |
| Wrapper key | "function" object inside "type": "function" | Flat — name, description, input_schema at top level |
| Schema key | "parameters" | "input_schema" |
| Tool call in response | tool_calls[].function.arguments (JSON string) | content[] block with type: "tool_use" |
| Result message role | "tool" | "user" message containing a tool_result block |
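Because the table above differs only in shape, the two response formats can be normalised into one internal representation. The sketch below works over dict-shaped payloads for illustration; the real SDKs return typed objects, so attribute access would replace the key lookups:

```python
import json

def normalize_tool_calls(provider: str, response: dict) -> list[dict]:
    """Return a provider-agnostic list of {"id", "name", "arguments"}.
    A sketch over dict-shaped payloads; real SDK responses are objects."""
    calls = []
    if provider == "openai":
        # OpenAI: tool calls sit on the assistant message; arguments are a JSON string
        message = response["choices"][0]["message"]
        for tc in message.get("tool_calls") or []:
            calls.append({"id": tc["id"],
                          "name": tc["function"]["name"],
                          "arguments": json.loads(tc["function"]["arguments"])})
    elif provider == "anthropic":
        # Anthropic: tool calls are content blocks of type "tool_use"; input is already a dict
        for block in response["content"]:
            if block.get("type") == "tool_use":
                calls.append({"id": block["id"],
                              "name": block["name"],
                              "arguments": block["input"]})
    return calls
```

A normalisation layer like this is the usual first step toward an agent that can swap providers without touching the tool-execution code.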
TIP

Write detailed, specific descriptions for every tool and every parameter. Include examples, edge cases, and expected formats. The model relies entirely on these descriptions to decide when and how to call each tool. Vague descriptions like "Search the web" lead to poor tool selection; "Search the web using Google and return the top 5 results as title+snippet pairs" is much better.
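To make the contrast concrete, here is a vague definition next to a specific one for a hypothetical web_search tool (both names and wording are illustrative, not from any provider's docs):

```python
# Vague — the model has almost nothing to decide with (hypothetical tool)
vague = {
    "name": "search",
    "description": "Search the web",
    "input_schema": {"type": "object",
                     "properties": {"query": {"type": "string"}},
                     "required": ["query"]},
}

# Specific — names the source, result count, return shape, and when to use it
specific = {
    "name": "web_search",
    "description": ("Search the web using Google and return the top 5 results "
                    "as {title, snippet, url} objects. Use for current events "
                    "or facts that may postdate the model's training data."),
    "input_schema": {"type": "object",
                     "properties": {"query": {
                         "type": "string",
                         "description": "Search terms, e.g. 'EU AI Act latest amendments'"}},
                     "required": ["query"]},
}
```

The second version tells the model what comes back and when the tool is appropriate, which is exactly the information it needs to choose between several available tools.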

4. The Tool Execution Loop

In a real agent, tool calling happens inside a loop. The model may call one tool, inspect the result, then call another — or it may call multiple tools in parallel. Your agent loop must handle all these cases gracefully.

Complete Anthropic Tool Loop

import anthropic
import json

client = anthropic.Anthropic()

# Define your actual tool implementations
def get_weather(city: str, units: str = "celsius") -> dict:
    """Simulate a weather API call."""
    data = {"London": {"temp": 12, "condition": "cloudy"},
            "Paris": {"temp": 18, "condition": "sunny"}}
    return data.get(city, {"temp": 0, "condition": "unknown"})

def search_news(query: str, max_results: int = 5) -> list:
    """Simulate a news search."""
    return [{"title": f"News about {query}", "source": "Reuters"}]

# Map tool names to functions
TOOL_REGISTRY = {
    "get_weather": get_weather,
    "search_news": search_news,
}

# Tool definitions for the API
tools = [
    {"name": "get_weather",
     "description": "Get current weather for a city.",
     "input_schema": {"type": "object",
                      "properties": {"city": {"type": "string"},
                                     "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}},
                      "required": ["city"]}},
    {"name": "search_news",
     "description": "Search recent news articles.",
     "input_schema": {"type": "object",
                      "properties": {"query": {"type": "string"},
                                     "max_results": {"type": "integer"}},
                      "required": ["query"]}}
]

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Check if the model wants to use tools
        if response.stop_reason == "tool_use":
            # Collect all tool calls and results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    func = TOOL_REGISTRY[block.name]
                    result = func(**block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

            # Add assistant response and tool results to messages
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            # Model is done — extract text response
            return "".join(b.text for b in response.content if hasattr(b, "text"))

# Usage
answer = run_agent("What's the weather in London and any AI news today?")
NOTE

The loop continues until stop_reason is "end_turn" (not "tool_use"). This allows the model to chain multiple tool calls — for example, searching for a city name first, then getting its weather.
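One practical detail the loop above omits: a model stuck repeatedly requesting tools would iterate forever. A common safeguard is an iteration cap. This is a generic sketch, not part of either SDK; call_model and execute_tools are hypothetical interfaces that would wrap the provider-specific code shown above:

```python
MAX_TOOL_TURNS = 10  # hypothetical cap; tune per agent

def run_with_cap(call_model, execute_tools, user_message: str) -> str:
    """Bounded tool loop. call_model(messages) -> (wants_tools, content);
    execute_tools(content) -> list of tool results. Both are assumed interfaces."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_TOOL_TURNS):
        wants_tools, content = call_model(messages)
        if not wants_tools:
            return content  # model produced a final text answer
        # Record the assistant's tool request, then the results, and iterate
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user", "content": execute_tools(content)})
    return "Error: tool-call limit reached without a final answer."
```

Returning an explicit error string (or raising) when the cap is hit makes runaway loops visible instead of silently burning tokens.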

OpenAI Tool Loop

import openai
import json

client = openai.OpenAI()

def run_agent_openai(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools  # same JSON Schema definitions
        )
        msg = response.choices[0].message

        if msg.tool_calls:
            messages.append(msg)  # add assistant message with tool calls
            for tc in msg.tool_calls:
                func = TOOL_REGISTRY[tc.function.name]
                args = json.loads(tc.function.arguments)
                result = func(**args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": json.dumps(result)
                })
        else:
            return msg.content

5. Building a Tool Registry

As your agent grows, you will have dozens of tools. A tool registry pattern keeps them organised, validated, and easy to extend. The registry maps tool names to their implementations and auto-generates API definitions from Python type hints.

import inspect
import json
from typing import Callable, Any, get_type_hints

class ToolRegistry:
    """Registry that auto-generates tool schemas from type hints."""

    def __init__(self):
        self._tools: dict[str, Callable] = {}
        self._schemas: list[dict] = []

    def tool(self, func: Callable) -> Callable:
        """Decorator to register a tool function."""
        name = func.__name__
        hints = get_type_hints(func)
        doc = inspect.getdoc(func) or ""

        # Build JSON Schema from type hints
        properties = {}
        required = []
        sig = inspect.signature(func)
        for param_name, param in sig.parameters.items():
            ptype = hints.get(param_name, str)
            json_type = {"str": "string", "int": "integer",
                         "float": "number", "bool": "boolean"}.get(ptype.__name__, "string")
            properties[param_name] = {"type": json_type}
            if param.default is inspect.Parameter.empty:
                required.append(param_name)

        schema = {
            "name": name,
            "description": doc,
            "input_schema": {
                "type": "object",
                "properties": properties,
                "required": required
            }
        }
        self._tools[name] = func
        self._schemas.append(schema)
        return func

    def execute(self, name: str, arguments: dict) -> Any:
        """Execute a registered tool by name."""
        if name not in self._tools:
            return {"error": f"Unknown tool: {name}"}
        try:
            return self._tools[name](**arguments)
        except Exception as e:
            return {"error": str(e)}

    @property
    def definitions(self) -> list[dict]:
        return self._schemas

# Usage
registry = ToolRegistry()

@registry.tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely. Example: '2 + 3 * 4'"""
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return "Error: invalid characters"
    return str(eval(expression))  # use a safe parser in production

@registry.tool
def read_file(path: str) -> str:
    """Read the contents of a text file given its path."""
    with open(path, "r") as f:
        return f.read()

# Pass registry.definitions to the API, use registry.execute() in your loop
TIP

In production, use pydantic models for tool input validation. Libraries like instructor or Anthropic's own tool-use helpers can auto-generate schemas from Pydantic models, giving you both validation and schema generation in one place.
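Even without Pydantic, a minimal stdlib check of the model's arguments against the schema catches the most common failures before execution. This sketch covers only required fields and primitive types, not full JSON Schema:

```python
def validate_args(input_schema: dict, arguments: dict) -> list[str]:
    """Minimal validation of tool arguments against an input_schema.
    Checks required fields and primitive types only — a sketch, not JSON Schema."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    props = input_schema.get("properties", {})
    errors = []
    for name in input_schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in props:
            errors.append(f"unexpected argument: {name}")
            continue
        expected = props[name].get("type")
        if expected in type_map and not isinstance(value, type_map[expected]):
            errors.append(f"{name}: expected {expected}")
    return errors
```

Returning the error list to the model as a tool result often lets it correct its own call on the next turn.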

PITFALL

Never use bare eval() in production. Use a safe math parser like asteval or simpleeval. The example above is simplified for clarity.
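If adding a dependency is undesirable, a small ast-based evaluator restricted to arithmetic is a stdlib alternative. A sketch supporting +, -, *, / and unary minus only:

```python
import ast
import operator

# Whitelisted operators — anything outside this map is rejected
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.USub: operator.neg}

def safe_eval(expression: str) -> float:
    """Evaluate arithmetic by walking the AST instead of calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return walk(ast.parse(expression, mode="eval").body)

safe_eval("2 + 3 * 4")  # 14
```

Anything that is not a number or a whitelisted operation, such as a function call or attribute access, raises ValueError instead of executing.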

6. Practical Tool Examples

Here is a collection of commonly used tools in real-world agents. Each addresses a different class of capability that LLMs lack on their own.

Web Search

Query Google, Bing, or Brave APIs to get live information. Essential for any agent that answers questions about current events.

Calculator

Safe mathematical evaluation. LLMs frequently make arithmetic errors — always offload computation to a tool.

File Reader

Read and parse local files (CSV, JSON, PDF). Enables document-processing agents.

Database Query

Execute SQL queries against PostgreSQL, MySQL, or SQLite. The model generates SQL; your tool executes it safely.

Code Executor

Run Python in a sandboxed environment (Docker, E2B, or subprocess). Powers data-analysis and coding agents.

Email / Messaging

Send emails via SMTP or API (SendGrid, SES). Allows agents to communicate with humans and other systems.

Example: Database Query Tool

import sqlite3

@registry.tool
def query_database(sql: str) -> str:
    """Execute a read-only SQL query against the app database.
    Only SELECT statements are allowed. Returns results as JSON."""
    if not sql.strip().upper().startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries are allowed"})

    conn = sqlite3.connect("app.db")
    conn.row_factory = sqlite3.Row
    try:
        rows = conn.execute(sql).fetchall()
        return json.dumps([dict(row) for row in rows])
    except Exception as e:
        return json.dumps({"error": str(e)})
    finally:
        conn.close()
PITFALL

SQL injection is a real risk. Even with read-only restrictions, a model-generated query could access sensitive tables. In production, use parameterised queries, restrict accessible tables via a view layer, and run the database user with minimal permissions.
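One cheap extra layer is a table allowlist checked before execution. This is a crude regex sketch with hypothetical table names; a real deployment should rely on database-level permissions rather than string matching:

```python
import re

ALLOWED_TABLES = {"orders", "products"}  # hypothetical allowlist

def tables_allowed(sql: str) -> bool:
    """Reject queries touching tables outside the allowlist.
    Crude: extracts identifiers that follow FROM or JOIN."""
    tables = re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)",
                        sql, flags=re.IGNORECASE)
    # Require at least one recognised table, and every table must be allowed
    return bool(tables) and all(t.lower() in ALLOWED_TABLES for t in tables)
```

Regex checks like this can be bypassed by sufficiently creative SQL, which is why they belong in front of, not instead of, a minimal-permission database user.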

Best Practices Checklist

  1. Write detailed, specific descriptions for every tool and parameter, with examples and expected formats.
  2. Remember that the model only requests tool calls; your code must execute them and return the results.
  3. Loop until the model stops requesting tools (stop_reason "end_turn", or no tool_calls) so it can chain calls.
  4. Return errors as data (e.g. {"error": ...}) rather than raising, so the model can see and recover from failures.
  5. Never use bare eval(); run model-generated code in a sandbox and use safe expression parsers.
  6. Treat model-generated SQL as untrusted: allow SELECT only, restrict accessible tables, and use a minimal-permission database user.

Up Next

Module 7 — Memory & Context Management