Module 4: The First API Call
Making your first API calls to OpenAI and Anthropic.
OpenAI: First Call (GPT-4o)
Let's start with the most basic API call. The OpenAI Chat Completions API takes a list of messages (each with a role and content) and returns the model's generated response. This is the foundation of every agent you will build.
Minimal Example

```python
from openai import OpenAI

# Client automatically reads OPENAI_API_KEY from environment
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(response.choices[0].message.content)
```
The messages array is the core concept. It represents a conversation history and uses three possible roles:

| Role | Purpose | When to Use |
|---|---|---|
| system | Sets the agent's identity, rules, and behavior | Once, at the start of the messages array |
| user | Messages from the human user | Every user input (questions, instructions, follow-ups) |
| assistant | Previous model responses (for context) | When building multi-turn conversations |
Understanding the Response Object
OpenAI's response contains far more than just text. Here is how to inspect it:
```python
# Full response inspection
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain AI agents in one sentence."}
    ]
)

# The response text
print(response.choices[0].message.content)

# Why the model stopped generating
print(response.choices[0].finish_reason)  # "stop", "length", or "tool_calls"

# Token usage (important for cost tracking!)
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

# Model used
print(f"Model: {response.model}")
```
The finish_reason field is critical for agents. A value of "stop" means the model finished naturally. "length" means the token limit was reached (your response was truncated!). "tool_calls" means the model wants to invoke a tool, which is the foundation of the agent behavior introduced in Module 6.
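As a defensive pattern, you can check finish_reason before trusting the response text. The helper below is a sketch of our own (the name check_finish_reason is not part of the SDK):

```python
def check_finish_reason(finish_reason: str) -> None:
    """Raise if the completion cannot be used as plain text yet."""
    if finish_reason == "length":
        raise RuntimeError("Response truncated: raise max_tokens or shorten the prompt.")
    if finish_reason == "tool_calls":
        raise RuntimeError("Model requested a tool call; handle it before reading content.")
    # "stop" means the model finished naturally, so there is nothing to do
```

Call it right after the API returns, e.g. check_finish_reason(response.choices[0].finish_reason), so truncated output fails loudly instead of silently corrupting an agent's state.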
Controlling Output with Temperature and Max Tokens
Temperature controls randomness. For agents, you almost always want a low temperature (deterministic, reliable output). Max tokens caps the response length.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a precise data analyst."},
        {"role": "user", "content": "What is 15% of 847?"}
    ],
    temperature=0.0,  # Deterministic output (best for agents)
    max_tokens=100,   # Limit response length
)

print(response.choices[0].message.content)
# Output: "15% of 847 is 127.05."
```
| Temperature | Behavior | Best For |
|---|---|---|
| 0.0 | Deterministic: same input always gives the same output | Agents, data extraction, calculations |
| 0.3–0.5 | Slightly creative but broadly consistent | Customer support, structured writing |
| 0.7–1.0 | Creative and varied | Brainstorming, creative writing, exploration |
| 1.5–2.0 | Highly random, often incoherent | Rarely useful in practice |
For agent applications, set temperature=0 during development and testing so results are reproducible. You can raise it later for specific creative tasks.
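One way to apply the table is a small lookup that picks a sensible default per task type. The task names and values below are illustrative (our own shorthand, not an API feature):

```python
# Illustrative defaults mirroring the temperature table above
TASK_TEMPERATURE = {
    "extraction": 0.0,   # agents, data extraction, calculations
    "support": 0.4,      # slightly creative but broadly consistent
    "brainstorm": 0.9,   # creative and varied
}

def temperature_for(task: str) -> float:
    # Unknown task types fall back to deterministic output, the safest agent default
    return TASK_TEMPERATURE.get(task, 0.0)
```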
Multi-Turn Conversations
To hold a back-and-forth conversation, append both the assistant's response and the user's follow-up to the messages array:
```python
messages = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is a derivative?"},
]

# First turn
response = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

# Append assistant reply and user follow-up
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

# Second turn - model has full context of the conversation
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(f"Assistant: {response.choices[0].message.content}")
```
The messages array works like a conversation transcript. The model has no memory between API calls: you must send the full transcript every time. This is why the agent loop (Module 5) keeps appending to this array.
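The append-call-append pattern generalizes to any number of turns. A minimal sketch, assuming an OpenAI-style client (run_conversation is a name of our own):

```python
def run_conversation(client, turns, model="gpt-4o",
                     system="You are a helpful assistant."):
    """Feed a list of user messages through the model one turn at a time,
    growing the transcript so each call sees the full history."""
    messages = [{"role": "system", "content": system}]
    for user_text in turns:
        messages.append({"role": "user", "content": user_text})
        response = client.chat.completions.create(model=model, messages=messages)
        reply = response.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})
    return messages
```

For example, run_conversation(client, ["What is a derivative?", "Can you give me a simple example?"]) returns the full transcript, ready to be extended by further calls.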
Anthropic: First Call (Claude Sonnet 4)
Anthropic's Messages API follows similar concepts with different syntax. The biggest difference is that the system prompt is a separate parameter rather than part of the messages array.
Minimal Example

```python
import anthropic

# Client automatically reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # Required (not optional!)
    system="You are a helpful assistant.",  # Separate from messages
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(message.content[0].text)
```
Understanding the Anthropic Response
```python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain AI agents in one sentence."}]
)

# The response text (Anthropic returns a list of content blocks)
print(message.content[0].text)

# Stop reason
print(message.stop_reason)  # "end_turn", "max_tokens", or "tool_use"

# Token usage
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Model used
print(f"Model: {message.model}")
```
Anthropic's stop_reason uses different values from OpenAI's. OpenAI's "stop" corresponds to "end_turn", and "tool_calls" corresponds to "tool_use". Always check each vendor's documentation for the exact values.
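If you target both vendors, it helps to normalize these values into one vocabulary of your own. The mapping below is a sketch; the normalized names ("done", "truncated", "tool_use", "unknown") are our own convention:

```python
# Vendor-specific stop values mapped to one shared vocabulary
STOP_REASON_MAP = {
    # OpenAI finish_reason values
    "stop": "done",
    "length": "truncated",
    "tool_calls": "tool_use",
    # Anthropic stop_reason values
    "end_turn": "done",
    "max_tokens": "truncated",
    "tool_use": "tool_use",
}

def normalize_stop_reason(raw: str) -> str:
    """Collapse OpenAI and Anthropic stop values into one shared set."""
    return STOP_REASON_MAP.get(raw, "unknown")
```

With this in place, downstream agent code can branch on "truncated" or "tool_use" without caring which provider answered.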
Multi-Turn Conversations with Anthropic
```python
messages = [
    {"role": "user", "content": "What is a derivative in calculus?"},
]

# First turn
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
assistant_reply = response.content[0].text

# Append and continue
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
print(response.content[0].text)
```
Structured Output with Anthropic
For agent use cases, you often need the model to return structured data (JSON) rather than free-form text. Here is how to request and parse structured output:
```python
import json

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="""You are a data extraction assistant.
Always respond with valid JSON. No other text.""",
    messages=[{
        "role": "user",
        "content": "Extract the key details: 'Meeting with Sarah on Tuesday at 3pm to discuss Q4 budget. Location: Room 204.'"
    }]
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
# {
#   "event": "Meeting",
#   "attendee": "Sarah",
#   "day": "Tuesday",
#   "time": "3:00 PM",
#   "topic": "Q4 budget",
#   "location": "Room 204"
# }
```
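In practice, json.loads can still fail because models sometimes wrap their JSON in a markdown code fence despite instructions. A small defensive parser (a sketch of our own, not part of the SDK) handles that case:

```python
import json

FENCE = "`" * 3  # the three-backtick markdown code fence

def parse_json_reply(text: str) -> dict:
    """Parse a model reply that should be JSON, tolerating markdown fences."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with its optional language tag)
        cleaned = cleaned.split("\n", 1)[1]
        # Drop the closing fence
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)
```

For anything beyond a tutorial, also catch json.JSONDecodeError and retry the request with the error message included, so the model can correct itself.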
Key API Differences
When building agents, you will often want to support multiple LLM providers (for cost optimization, failover, or capability differences). Here is a detailed comparison of the two APIs:
| Feature | OpenAI | Anthropic |
|---|---|---|
| System prompt | role: "system" inside the messages array | Separate system parameter |
| Response text | response.choices[0].message.content | message.content[0].text |
| Max tokens | Optional (defaults to the model's maximum) | Required: must always be specified |
| Stop reason | "stop", "length", "tool_calls" | "end_turn", "max_tokens", "tool_use" |
| Token usage | usage.prompt_tokens / usage.completion_tokens | usage.input_tokens / usage.output_tokens |
| Tool format | Wrapped in type: "function" | input_schema directly at the top level |
| Streaming | stream=True with SSE events | stream=True or client.messages.stream() |
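The tool-format row is easiest to see side by side. Below, the same hypothetical get_weather tool is declared in each vendor's schema (the tool itself is illustrative; tool use is covered in Module 6):

```python
# One JSON Schema for the tool's parameters, shared by both declarations
weather_params = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

# OpenAI: the tool definition is wrapped in type: "function"
openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": weather_params,
    },
}

# Anthropic: name and input_schema sit directly at the top level
anthropic_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": weather_params,
}
```

The parameter schema is identical; only the envelope around it differs, which is why a thin translation layer between the two formats is straightforward to write.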
Writing a Provider-Agnostic Wrapper
Since the concepts are the same but the syntax differs, a common pattern is a thin wrapper that unifies both APIs:
```python
from openai import OpenAI
import anthropic

class LLMClient:
    """Unified wrapper for OpenAI and Anthropic APIs."""

    def __init__(self, provider: str = "openai"):
        self.provider = provider
        if provider == "openai":
            self.client = OpenAI()
            self.model = "gpt-4o"
        elif provider == "anthropic":
            self.client = anthropic.Anthropic()
            self.model = "claude-sonnet-4-20250514"
        else:
            raise ValueError(f"Unknown provider: {provider}")

    def chat(self, system: str, user_message: str,
             temperature: float = 0.0, max_tokens: int = 1024) -> str:
        """Send a message and return the response text."""
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": user_message},
                ],
            )
            return response.choices[0].message.content
        elif self.provider == "anthropic":
            message = self.client.messages.create(
                model=self.model,
                max_tokens=max_tokens,
                system=system,
                messages=[{"role": "user", "content": user_message}],
            )
            return message.content[0].text

# Usage - switch providers with one parameter
llm = LLMClient(provider="anthropic")
answer = llm.chat(
    system="You are a helpful assistant.",
    user_message="What is an AI agent?"
)
print(answer)
```
Both APIs support streaming, tool use, vision (image input), and multi-turn conversations. The concepts are identical; only the syntax differs, so learning one makes the other easy. Throughout this tutorial series we show both wherever possible so you can use your preferred provider.
Cost Awareness
API calls cost money. Understanding token pricing is essential for building agents that don't bankrupt you:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-4o mini | $0.15 | $0.60 | Simple tasks, classification, cheap testing |
| GPT-4o | $2.50 | $10.00 | Complex reasoning, agent cores |
| Claude Haiku | $0.25 | $1.25 | Fast, simple tasks, high throughput |
| Claude Sonnet 4 | $3.00 | $15.00 | Complex reasoning, agent cores |
An agent loop that runs for 10 steps and sends the full conversation history at every step can burn through thousands of tokens fast. Always monitor usage.total_tokens (or usage.input_tokens + usage.output_tokens) and set budget limits during development. A runaway agent loop can cost tens of dollars in minutes.
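To make budget limits concrete, you can turn the pricing table into a quick estimator. A sketch (the model keys are our own shorthand, and the prices are the per-million-token figures from the table above, which may change):

```python
# USD per million tokens as (input, output), from the pricing table above
PRICING = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "claude-haiku": (0.25, 1.25),
    "claude-sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one call at the listed prices."""
    input_price, output_price = PRICING[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

For example, a 10-step loop totaling 50,000 input and 5,000 output tokens on GPT-4o comes to estimate_cost("gpt-4o", 50_000, 5_000), about $0.175 per run.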