Module 4: Your First API Call
Making your first API calls to OpenAI and Anthropic.
OpenAI — Your First Call (GPT-4o)
Let us start with the most basic API call possible. The OpenAI Chat Completions API takes a list of messages (each with a role and content) and returns a model-generated response. This is the foundation of every agent you will build.
Minimal Example
```python
from openai import OpenAI

# Client automatically reads OPENAI_API_KEY from environment
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(response.choices[0].message.content)
```
The messages array is the core concept. It represents a conversation history with three possible roles:
| Role | Purpose | When to Use |
|---|---|---|
| system | Sets the agent's identity, rules, and behaviour | Once, at the start of the messages array |
| user | Messages from the human user | Every user input (questions, instructions, follow-ups) |
| assistant | Previous model responses (for context) | When building multi-turn conversations |
Understanding the Response Object
The response from OpenAI contains much more than just the text. Here is how to inspect it:
```python
# Full response inspection
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain AI agents in one sentence."}
    ]
)

# The response text
print(response.choices[0].message.content)

# Why the model stopped generating
print(response.choices[0].finish_reason)  # "stop", "length", or "tool_calls"

# Token usage (important for cost tracking!)
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

# Model used
print(f"Model: {response.model}")
```
The finish_reason field is critical for agents. A value of "stop" means the model finished naturally. "length" means it hit the token limit (your response is truncated!). "tool_calls" means the model wants to call a tool — this is the foundation of agent behaviour covered in Module 6.
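Because a truncated reply can silently corrupt an agent's behaviour, it is worth turning finish_reason into an explicit decision rather than ignoring it. A minimal sketch — the helper name and the action labels are our own convention, not part of the SDK:

```python
def check_finish_reason(finish_reason: str) -> str:
    """Map an OpenAI finish_reason to the action an agent should take.

    Illustrative helper: the reason strings are OpenAI's, but the
    action labels are our own convention.
    """
    actions = {
        "stop": "use_response",      # model finished naturally
        "length": "retry_or_raise",  # output truncated at max_tokens
        "tool_calls": "run_tools",   # model wants to call a tool (Module 6)
    }
    return actions.get(finish_reason, "unknown")

# Example: refuse to use a truncated response
reason = "length"
if check_finish_reason(reason) == "retry_or_raise":
    print("Response was truncated - increase max_tokens and retry")
```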
Controlling Output with Temperature and Max Tokens
Temperature controls randomness. For agents, you almost always want low temperature (deterministic, reliable output). Max tokens caps the response length.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a precise data analyst."},
        {"role": "user", "content": "What is 15% of 847?"}
    ],
    temperature=0.0,  # Deterministic output (best for agents)
    max_tokens=100,   # Limit response length
)

print(response.choices[0].message.content)
# Output: "15% of 847 is 127.05."
```
| Temperature | Behaviour | Best For |
|---|---|---|
| 0.0 | Deterministic — same input always gives same output | Agents, data extraction, calculations |
| 0.3–0.5 | Slightly creative but mostly consistent | Customer support, structured writing |
| 0.7–1.0 | Creative and varied | Brainstorming, creative writing, exploration |
| 1.5–2.0 | Highly random, often incoherent | Rarely useful in practice |
For agent applications, set temperature=0 during development and testing so results are reproducible. You can increase it later for specific creative tasks.
Multi-Turn Conversations
To have a back-and-forth conversation, you append both the assistant's response and the user's follow-up to the messages array:
```python
messages = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is a derivative?"},
]

# First turn
response = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

# Append assistant reply and user follow-up
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

# Second turn - model has full context of the conversation
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(f"Assistant: {response.choices[0].message.content}")
```
The messages array is like a transcript of a conversation. The model has no memory between API calls — you must send the entire transcript every time. This is why the agent loop (Module 5) continuously appends to this array.
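That append-call-append rhythm is easy to get wrong in a larger program, so it is worth wrapping once. A minimal illustrative sketch, not part of either SDK — the `send` callable is injected so the same transcript manager works with any provider (here it is faked with a lambda so the example runs without an API key):

```python
from typing import Callable

class Conversation:
    """Minimal transcript manager (illustrative, our own class).

    `send` is any callable that takes the full messages list and
    returns the assistant's reply text - e.g. a thin wrapper around
    client.chat.completions.create.
    """
    def __init__(self, system: str, send: Callable[[list], str]):
        self.messages = [{"role": "system", "content": system}]
        self.send = send

    def ask(self, user_text: str) -> str:
        # Append the user turn, call the model with the FULL transcript,
        # then append the assistant turn so the next call has context.
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

# Demonstration with a fake `send` that echoes the last user message
convo = Conversation("You are a tutor.",
                     send=lambda msgs: f"echo:{msgs[-1]['content']}")
print(convo.ask("What is a derivative?"))  # echo:What is a derivative?
print(len(convo.messages))                 # 3: system + user + assistant
```

Because the model is stateless, this class is the entire "memory" of the conversation; lose the list and you lose the context.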
Anthropic — Your First Call (Claude Sonnet 4)
Anthropic's Messages API follows similar concepts but with a different syntax. The biggest difference is that the system prompt is a separate parameter, not part of the messages array.
Minimal Example
```python
import anthropic

# Client automatically reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # Required (not optional!)
    system="You are a helpful assistant.",  # Separate from messages
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(message.content[0].text)
```
Understanding the Anthropic Response
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain AI agents in one sentence."}]
)

# The response text (Anthropic returns a list of content blocks)
print(message.content[0].text)

# Stop reason
print(message.stop_reason)  # "end_turn", "max_tokens", or "tool_use"

# Token usage
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Model used
print(f"Model: {message.model}")
```
Anthropic's stop_reason uses different values than OpenAI. The equivalent of OpenAI's "stop" is "end_turn". The equivalent of "tool_calls" is "tool_use". Always check the provider's documentation for exact values.
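If you plan to support both providers, it helps to map both vocabularies onto a single one of your own. A small sketch — the canonical labels on the right are our own convention, not from either SDK:

```python
# Normalise provider-specific stop reasons to one shared vocabulary.
# Keys are the documented OpenAI / Anthropic values; the values on the
# right are labels we chose for this tutorial.
STOP_REASON_MAP = {
    # OpenAI finish_reason values
    "stop": "complete",
    "length": "truncated",
    "tool_calls": "wants_tool",
    # Anthropic stop_reason values
    "end_turn": "complete",
    "max_tokens": "truncated",
    "tool_use": "wants_tool",
}

def normalise_stop_reason(reason: str) -> str:
    return STOP_REASON_MAP.get(reason, "unknown")

print(normalise_stop_reason("end_turn"))    # complete
print(normalise_stop_reason("tool_calls"))  # wants_tool
```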
Multi-Turn Conversations with Anthropic
```python
messages = [
    {"role": "user", "content": "What is a derivative in calculus?"},
]

# First turn
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
assistant_reply = response.content[0].text

# Append and continue
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
print(response.content[0].text)
```
Structured Output with Anthropic
For agent use cases, you often need the model to return structured data (JSON) rather than free-form text. Here is how to request and parse structured output:
```python
import json

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="""You are a data extraction assistant.
Always respond with valid JSON. No other text.""",
    messages=[{
        "role": "user",
        "content": "Extract the key details: 'Meeting with Sarah on Tuesday at 3pm to discuss Q4 budget. Location: Room 204.'"
    }]
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
# {
#   "event": "Meeting",
#   "attendee": "Sarah",
#   "day": "Tuesday",
#   "time": "3:00 PM",
#   "topic": "Q4 budget",
#   "location": "Room 204"
# }
```
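In practice, models occasionally wrap their JSON in markdown code fences even when instructed not to, which makes a bare `json.loads` raise an exception. A defensive parsing helper — illustrative, with a function name of our own:

```python
import json

def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, tolerating markdown code fences.

    Illustrative helper: models sometimes return ```json ... ```
    despite the system prompt, so strip the fences before parsing.
    """
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.split("\n")
        lines = lines[1:]  # drop the opening fence line (``` or ```json)
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]  # drop the closing fence line
        cleaned = "\n".join(lines)
    return json.loads(cleaned)

print(parse_json_reply('```json\n{"event": "Meeting"}\n```'))
# {'event': 'Meeting'}
```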
Key API Differences
When building agents, you will often want to support multiple LLM providers (for cost optimisation, fallback, or capability differences). Here is a detailed comparison of the two APIs:
| Feature | OpenAI | Anthropic |
|---|---|---|
| System prompt | Inside messages array as role: "system" | Separate system parameter |
| Response text | response.choices[0].message.content | message.content[0].text |
| Max tokens | Optional (defaults to model max) | Required — must always specify |
| Stop reason | "stop", "length", "tool_calls" | "end_turn", "max_tokens", "tool_use" |
| Token usage | usage.prompt_tokens / usage.completion_tokens | usage.input_tokens / usage.output_tokens |
| Tool format | Wrapped in type: "function" | Direct input_schema at top level |
| Streaming | stream=True with SSE events | stream=True or client.messages.stream() |
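One concrete place this table matters is cost tracking: the two SDKs name their usage fields differently. Below is a sketch of a normaliser that detects the response shape by attribute name rather than tracking which provider was called — demonstrated on stand-in objects, since no API call is needed:

```python
from types import SimpleNamespace

def extract_usage(response) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from either provider's response.

    Illustrative helper: detects the shape by checking which attribute
    names are present on the usage object.
    """
    usage = response.usage
    if hasattr(usage, "prompt_tokens"):  # OpenAI shape
        return usage.prompt_tokens, usage.completion_tokens
    return usage.input_tokens, usage.output_tokens  # Anthropic shape

# Stand-in objects shaped like each SDK's usage field
openai_like = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=34))
anthropic_like = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=56, output_tokens=78))

print(extract_usage(openai_like))     # (12, 34)
print(extract_usage(anthropic_like))  # (56, 78)
```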
Writing a Provider-Agnostic Wrapper
Since the concepts are identical but syntax differs, a common pattern is to write a thin wrapper that normalises the two APIs:
```python
from openai import OpenAI
import anthropic

class LLMClient:
    """Unified wrapper for OpenAI and Anthropic APIs."""

    def __init__(self, provider: str = "openai"):
        self.provider = provider
        if provider == "openai":
            self.client = OpenAI()
            self.model = "gpt-4o"
        elif provider == "anthropic":
            self.client = anthropic.Anthropic()
            self.model = "claude-sonnet-4-20250514"
        else:
            raise ValueError(f"Unknown provider: {provider}")

    def chat(self, system: str, user_message: str,
             temperature: float = 0.0, max_tokens: int = 1024) -> str:
        """Send a message and return the response text."""
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": user_message},
                ],
            )
            return response.choices[0].message.content
        elif self.provider == "anthropic":
            message = self.client.messages.create(
                model=self.model,
                max_tokens=max_tokens,
                temperature=temperature,  # passed here too, so both providers honour it
                system=system,
                messages=[{"role": "user", "content": user_message}],
            )
            return message.content[0].text

# Usage - switch providers with one parameter
llm = LLMClient(provider="anthropic")
answer = llm.chat(
    system="You are a helpful assistant.",
    user_message="What is an AI agent?"
)
print(answer)
```
Both APIs support streaming, tool use, vision (image input), and multi-turn conversation. The concepts are identical — only the syntax differs. Learning one makes it easy to use the other. In this tutorial series, we show both wherever possible so you can choose your preferred provider.
Cost Awareness
API calls cost money. Understanding token pricing is essential for building agents that do not break the bank:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-4o mini | $0.15 | $0.60 | Simple tasks, classification, cheap testing |
| GPT-4o | $2.50 | $10.00 | Complex reasoning, agent core |
| Claude Haiku | $0.25 | $1.25 | Fast, simple tasks, high-volume |
| Claude Sonnet 4 | $3.00 | $15.00 | Complex reasoning, agent core |
An agent loop that runs 10 steps, each sending the full conversation history, can quickly consume thousands of tokens. Always monitor usage.total_tokens (or usage.input_tokens + usage.output_tokens) and set budget limits during development. A runaway agent loop can cost tens of dollars in minutes.
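To make that concrete, here is a back-of-the-envelope cost estimator using the per-million-token prices from the table above (the model keys are our own shorthand, and prices go stale — always check the providers' current pricing pages before relying on them):

```python
# USD per 1M tokens: (input_price, output_price), from the table above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-haiku": (0.25, 1.25),
    "claude-sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one call."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 10-step agent loop on gpt-4o, each step sending ~4,000 input tokens
# (the growing transcript) and producing ~500 output tokens:
total = sum(estimate_cost("gpt-4o", 4_000, 500) for _ in range(10))
print(f"${total:.2f}")  # $0.15
```

Note that input tokens grow with every step because the full history is resent, so real loops cost more than this flat estimate suggests.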