AI Agent Series — Ran Wei

Module 4: Your First API Call

Making your first API calls to OpenAI and Anthropic.

1. OpenAI — Your First Call (GPT-4o)

Let us start with the most basic API call possible. The OpenAI Chat Completions API takes a list of messages (each with a role and content) and returns a model-generated response. This is the foundation of every agent you will build.

Minimal Example

from openai import OpenAI

# Client automatically reads OPENAI_API_KEY from environment
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(response.choices[0].message.content)

The messages array is the core concept. It represents a conversation history with three possible roles:

Role      | Purpose                                          | When to Use
system    | Sets the agent's identity, rules, and behaviour  | Once, at the start of the messages array
user      | Messages from the human user                     | Every user input (questions, instructions, follow-ups)
assistant | Previous model responses (for context)           | When building multi-turn conversations
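Put together, a short history that uses all three roles looks like this (a plain data structure; no API call is needed to build it):

```python
# A conversation history using all three roles.
# The system message comes first; user and assistant messages then alternate.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is an AI agent?"},
    {"role": "assistant", "content": "An AI agent is a program that uses an LLM to decide which actions to take."},
    {"role": "user", "content": "How is that different from a chatbot?"},
]

# Sanity check: exactly one system message, and it comes first
assert messages[0]["role"] == "system"
assert sum(1 for m in messages if m["role"] == "system") == 1
```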

Understanding the Response Object

The response from OpenAI contains much more than just the text. Here is how to inspect it:

# Full response inspection
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain AI agents in one sentence."}
    ]
)

# The response text
print(response.choices[0].message.content)

# Why the model stopped generating
print(response.choices[0].finish_reason)  # "stop", "length", or "tool_calls"

# Token usage (important for cost tracking!)
print(f"Prompt tokens:     {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens:      {response.usage.total_tokens}")

# Model used
print(f"Model: {response.model}")

NOTE

The finish_reason field is critical for agents. A value of "stop" means the model finished naturally. "length" means it hit the token limit (your response is truncated!). "tool_calls" means the model wants to call a tool — this is the foundation of agent behaviour covered in Module 6.
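A defensive pattern is to classify finish_reason before trusting the output. The helper below is our own sketch (the function name and return values are not part of the SDK); in practice you would call it with response.choices[0].finish_reason:

```python
def check_finish_reason(finish_reason: str) -> str:
    """Classify an OpenAI finish_reason into an action for the calling code."""
    if finish_reason == "stop":
        return "complete"      # model finished naturally
    if finish_reason == "length":
        return "truncated"     # hit max_tokens: the response is cut off
    if finish_reason == "tool_calls":
        return "needs_tool"    # model wants to call a tool (Module 6)
    return "unknown"           # anything else, e.g. "content_filter"

assert check_finish_reason("length") == "truncated"
```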

Controlling Output with Temperature and Max Tokens

Temperature controls randomness. For agents, you almost always want low temperature (deterministic, reliable output). Max tokens caps the response length.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a precise data analyst."},
        {"role": "user", "content": "What is 15% of 847?"}
    ],
    temperature=0.0,    # Deterministic output (best for agents)
    max_tokens=100,     # Limit response length
)
print(response.choices[0].message.content)
# Output: "15% of 847 is 127.05."

Temperature | Behaviour                                           | Best For
0.0         | Deterministic — same input always gives same output | Agents, data extraction, calculations
0.3–0.5     | Slightly creative but mostly consistent             | Customer support, structured writing
0.7–1.0     | Creative and varied                                 | Brainstorming, creative writing, exploration
1.5–2.0     | Highly random, often incoherent                     | Rarely useful in practice

TIP

For agent applications, set temperature=0 during development and testing so results are reproducible. You can increase it later for specific creative tasks.

Multi-Turn Conversations

To have a back-and-forth conversation, you append both the assistant's response and the user's follow-up to the messages array:

messages = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is a derivative?"},
]

# First turn
response = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

# Append assistant reply and user follow-up
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

# Second turn - model has full context of the conversation
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(f"Assistant: {response.choices[0].message.content}")

ANALOGY

The messages array is like a transcript of a conversation. The model has no memory between API calls — you must send the entire transcript every time. This is why the agent loop (Module 5) continuously appends to this array.
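The append pattern can be wrapped in a tiny transcript helper (a hypothetical class of our own, not part of the OpenAI SDK) so loop code stays readable:

```python
class Transcript:
    """Accumulates the full message history that is resent on every API call."""

    def __init__(self, system: str):
        self.messages = [{"role": "system", "content": system}]

    def add_user(self, content: str) -> list:
        self.messages.append({"role": "user", "content": content})
        return self.messages  # pass this directly as messages=...

    def add_assistant(self, content: str) -> list:
        self.messages.append({"role": "assistant", "content": content})
        return self.messages

t = Transcript("You are a helpful maths tutor.")
t.add_user("What is a derivative?")
# ...call the API here, then record the model's reply:
t.add_assistant("A derivative measures the rate of change of a function.")
t.add_user("Can you give me a simple example?")
print(len(t.messages))  # 4 -- the whole transcript is resent each turn
```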

2. Anthropic — Your First Call (Claude Sonnet 4)

Anthropic's Messages API follows similar concepts but with a different syntax. The biggest difference is that the system prompt is a separate parameter, not part of the messages array.

Minimal Example

import anthropic

# Client automatically reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,                              # Required (not optional!)
    system="You are a helpful assistant.",         # Separate from messages
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(message.content[0].text)

Understanding the Anthropic Response

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain AI agents in one sentence."}]
)

# The response text (Anthropic returns a list of content blocks)
print(message.content[0].text)

# Stop reason
print(message.stop_reason)  # "end_turn", "max_tokens", or "tool_use"

# Token usage
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Model used
print(f"Model: {message.model}")

NOTE

Anthropic's stop_reason uses different values than OpenAI. The equivalent of OpenAI's "stop" is "end_turn". The equivalent of "tool_calls" is "tool_use". Always check the provider's documentation for exact values.
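When supporting both providers, it helps to normalise stop reasons into one internal vocabulary. The mapping below is our own convention, covering only the values listed above:

```python
# Map each provider's stop/finish values onto one internal vocabulary.
STOP_REASON_MAP = {
    # OpenAI finish_reason values
    "stop": "complete",
    "length": "truncated",
    "tool_calls": "tool_requested",
    # Anthropic stop_reason values
    "end_turn": "complete",
    "max_tokens": "truncated",
    "tool_use": "tool_requested",
}

def normalise_stop_reason(raw: str) -> str:
    return STOP_REASON_MAP.get(raw, "unknown")

# Equivalent values from either provider normalise to the same result
assert normalise_stop_reason("end_turn") == normalise_stop_reason("stop")
```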

Multi-Turn Conversations with Anthropic

messages = [
    {"role": "user", "content": "What is a derivative in calculus?"},
]

# First turn
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
assistant_reply = response.content[0].text

# Append and continue
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
print(response.content[0].text)

Structured Output with Anthropic

For agent use cases, you often need the model to return structured data (JSON) rather than free-form text. Here is how to request and parse structured output:

import json

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="""You are a data extraction assistant.
Always respond with valid JSON. No other text.""",
    messages=[{
        "role": "user",
        "content": "Extract the key details: 'Meeting with Sarah on Tuesday at 3pm to discuss Q4 budget. Location: Room 204.'"
    }]
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
# {
#   "event": "Meeting",
#   "attendee": "Sarah",
#   "day": "Tuesday",
#   "time": "3:00 PM",
#   "topic": "Q4 budget",
#   "location": "Room 204"
# }
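Models sometimes wrap their JSON in a Markdown code fence despite the instructions, so a defensive parser is worth having. This helper is our own sketch, not part of either SDK (the fence string is built with "`" * 3 so it displays cleanly here):

```python
import json

FENCE = "`" * 3  # a literal triple-backtick Markdown fence

def parse_json_response(text: str) -> dict:
    """Parse model output as JSON, stripping a Markdown code fence if present."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)...
        cleaned = cleaned.split("\n", 1)[1]
        # ...and the closing fence
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)

# Works on bare JSON and fenced JSON alike
assert parse_json_response('{"event": "Meeting"}') == {"event": "Meeting"}
assert parse_json_response(FENCE + 'json\n{"event": "Meeting"}\n' + FENCE) == {"event": "Meeting"}
```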

3. Key API Differences

When building agents, you will often want to support multiple LLM providers (for cost optimisation, fallback, or capability differences). Here is a detailed comparison of the two APIs:

Feature       | OpenAI                                        | Anthropic
System prompt | Inside messages array as role: "system"       | Separate system parameter
Response text | response.choices[0].message.content           | message.content[0].text
Max tokens    | Optional (defaults to model max)              | Required — must always specify
Stop reason   | "stop", "length", "tool_calls"                | "end_turn", "max_tokens", "tool_use"
Token usage   | usage.prompt_tokens / usage.completion_tokens | usage.input_tokens / usage.output_tokens
Tool format   | Wrapped in type: "function"                   | Direct input_schema at top level
Streaming     | stream=True with SSE events                   | stream=True or client.messages.stream()

Writing a Provider-Agnostic Wrapper

Since the concepts are identical but syntax differs, a common pattern is to write a thin wrapper that normalises the two APIs:

from openai import OpenAI
import anthropic

class LLMClient:
    """Unified wrapper for OpenAI and Anthropic APIs."""

    def __init__(self, provider: str = "openai"):
        self.provider = provider
        if provider == "openai":
            self.client = OpenAI()
            self.model = "gpt-4o"
        elif provider == "anthropic":
            self.client = anthropic.Anthropic()
            self.model = "claude-sonnet-4-20250514"
        else:
            raise ValueError(f"Unknown provider: {provider}")

    def chat(self, system: str, user_message: str,
             temperature: float = 0.0, max_tokens: int = 1024) -> str:
        """Send a message and return the response text."""
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": user_message},
                ],
            )
            return response.choices[0].message.content

        elif self.provider == "anthropic":
            message = self.client.messages.create(
                model=self.model,
                max_tokens=max_tokens,
                temperature=temperature,
                system=system,
                messages=[{"role": "user", "content": user_message}],
            )
            return message.content[0].text

# Usage - switch providers with one parameter
llm = LLMClient(provider="anthropic")
answer = llm.chat(
    system="You are a helpful assistant.",
    user_message="What is an AI agent?"
)
print(answer)

TIP

Both APIs support streaming, tool use, vision (image input), and multi-turn conversation. The concepts are identical — only the syntax differs. Learning one makes it easy to use the other. In this tutorial series, we show both wherever possible so you can choose your preferred provider.

Cost Awareness

API calls cost money. Understanding token pricing is essential for building agents that do not break the bank:

Model           | Input (per 1M tokens) | Output (per 1M tokens) | Best For
GPT-4o mini     | $0.15                 | $0.60                  | Simple tasks, classification, cheap testing
GPT-4o          | $2.50                 | $10.00                 | Complex reasoning, agent core
Claude Haiku    | $0.25                 | $1.25                  | Fast, simple tasks, high-volume
Claude Sonnet 4 | $3.00                 | $15.00                 | Complex reasoning, agent core

PITFALL

An agent loop that runs 10 steps, each sending the full conversation history, can quickly consume thousands of tokens. Always monitor usage.total_tokens (or usage.input_tokens + usage.output_tokens) and set budget limits during development. A runaway agent loop can cost tens of dollars in minutes.
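Token prices make cost easy to estimate up front. The helper below uses the prices from the table above; they change over time, so treat them as illustrative:

```python
# Prices in USD per 1M tokens, taken from the pricing table (subject to change)
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-haiku": (0.25, 1.25),
    "claude-sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one call from its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 10-step agent loop resending ~5,000 input tokens per step adds up fast
cost = estimate_cost("gpt-4o", input_tokens=50_000, output_tokens=5_000)
print(f"${cost:.4f}")  # $0.1750
```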

Up Next

Module 5 — Building the Agent Loop