AI Agents Series — Ran Wei

Module 4: Your First API Call

Make your first API calls to OpenAI and Anthropic.

1. OpenAI — First Call (GPT-4o)

Let's start with the most basic API call. The OpenAI Chat Completions API takes a list of messages (each with a role and content) and returns the model's generated response. This is the foundation of every agent you will build.

Minimal Example

from openai import OpenAI

# Client automatically reads OPENAI_API_KEY from environment
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(response.choices[0].message.content)

The messages array is the core concept. It represents a conversation history and uses three possible roles:

| Role | Purpose | When to use |
| --- | --- | --- |
| system | Sets the agent's identity, rules, and behaviour | Once, at the start of the messages array |
| user | Messages from the human user | On every user input (questions, instructions, follow-ups) |
| assistant | Previous model responses (kept for context) | When building multi-turn conversations |
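Put together, a transcript using all three roles looks like this (the conversation content is made up for illustration):

```python
# A messages array using all three roles -- the full transcript
# is sent to the API on every call
messages = [
    {"role": "system", "content": "You are a concise travel assistant."},
    {"role": "user", "content": "Suggest a city for a weekend trip."},
    {"role": "assistant", "content": "How about Lisbon? Compact, walkable, great food."},
    {"role": "user", "content": "What's the weather like there in October?"},
]

# The system message appears exactly once, at the start
assert messages[0]["role"] == "system"
```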

Understanding the Response Object

OpenAI's response contains much more than just the text. Here's how to inspect it:

# Full response inspection
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain AI agents in one sentence."}
    ]
)

# The response text
print(response.choices[0].message.content)

# Why the model stopped generating
print(response.choices[0].finish_reason)  # "stop", "length", or "tool_calls"

# Token usage (important for cost tracking!)
print(f"Prompt tokens:     {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens:      {response.usage.total_tokens}")

# Model used
print(f"Model: {response.model}")
Note

The finish_reason field is critical for agents. A value of "stop" means the model ended naturally. "length" means the token limit was hit (your response was truncated!). "tool_calls" means the model wants to invoke a tool, which is the basis of the agentic behaviour introduced in Module 6.
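In an agent it is worth checking this field explicitly rather than assuming the reply is complete. A minimal sketch — the helper name and the returned labels are our own shorthand, not part of the SDK:

```python
def check_finish_reason(finish_reason: str) -> str:
    """Classify why generation stopped (labels are our own shorthand)."""
    if finish_reason == "stop":
        return "complete"    # model ended naturally
    if finish_reason == "length":
        return "truncated"   # hit the token limit -- the response is cut off
    if finish_reason == "tool_calls":
        return "wants_tool"  # model requested a tool call (Module 6)
    return "unknown"

# Usage with a real response:
# status = check_finish_reason(response.choices[0].finish_reason)
```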

Controlling Output with Temperature and Max Tokens

Temperature controls randomness. For agents you almost always want a low temperature (deterministic, reliable output). Max tokens caps the response length.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a precise data analyst."},
        {"role": "user", "content": "What is 15% of 847?"}
    ],
    temperature=0.0,    # Deterministic output (best for agents)
    max_tokens=100,     # Limit response length
)
print(response.choices[0].message.content)
# Output: "15% of 847 is 127.05."
| Temperature | Behaviour | Best for |
| --- | --- | --- |
| 0.0 | Deterministic: the same input always gives the same output | Agents, data extraction, calculations |
| 0.3–0.5 | Slightly creative but broadly consistent | Customer support, structured writing |
| 0.7–1.0 | Creative and varied | Brainstorming, creative writing, exploration |
| 1.5–2.0 | Highly random, often incoherent | Rarely useful in practice |
Tip

For agent applications, set temperature=0 during development and testing so results are reproducible. You can raise it later for specific creative tasks.

Multi-Turn Conversations

To hold a back-and-forth conversation, append both the assistant's reply and the user's follow-up to the messages array:

messages = [
    {"role": "system", "content": "You are a helpful maths tutor."},
    {"role": "user", "content": "What is a derivative?"},
]

# First turn
response = client.chat.completions.create(model="gpt-4o", messages=messages)
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}")

# Append assistant reply and user follow-up
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

# Second turn - model has full context of the conversation
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(f"Assistant: {response.choices[0].message.content}")
Analogy

The messages array is like a conversation transcript. The model has no memory between API calls; you must send the full transcript every time. This is why the agent loop (Module 5) keeps appending to this array.
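One way to make that transcript bookkeeping explicit is a tiny helper class. This is our own illustrative pattern, not part of either SDK:

```python
class Conversation:
    """Minimal transcript manager (illustrative pattern, not an SDK class)."""

    def __init__(self, system: str):
        self.messages = [{"role": "system", "content": system}]

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation("You are a helpful maths tutor.")
convo.add_user("What is a derivative?")
# ... send convo.messages to the API, then record the reply:
convo.add_assistant("A derivative measures an instantaneous rate of change.")
convo.add_user("Can you give me a simple example?")
# convo.messages is now the full transcript for the next call
```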

2. Anthropic — First Call (Claude Sonnet 4)

Anthropic's Messages API follows the same concepts with different syntax. The biggest difference: the system prompt is a separate parameter, not part of the messages array.

Minimal Example

import anthropic

# Client automatically reads ANTHROPIC_API_KEY from environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,                              # Required (not optional!)
    system="You are a helpful assistant.",         # Separate from messages
    messages=[
        {"role": "user", "content": "What is an AI agent?"}
    ]
)

# Extract the response text
print(message.content[0].text)

Understanding the Anthropic Response

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain AI agents in one sentence."}]
)

# The response text (Anthropic returns a list of content blocks)
print(message.content[0].text)

# Stop reason
print(message.stop_reason)  # "end_turn", "max_tokens", or "tool_use"

# Token usage
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Model used
print(f"Model: {message.model}")
Note

Anthropic's stop_reason uses different values from OpenAI's: OpenAI's "stop" corresponds to "end_turn", and "tool_calls" corresponds to "tool_use". Always check each vendor's documentation for the exact values.
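If you support both providers, a small lookup table can normalize the two vocabularies. The normalized names ("complete", "truncated", "tool_use") are our own choice:

```python
# Map each provider's stop values onto one internal vocabulary
# (the normalized names are our own choice, not from either SDK)
STOP_REASON_MAP = {
    "openai":    {"stop": "complete", "length": "truncated", "tool_calls": "tool_use"},
    "anthropic": {"end_turn": "complete", "max_tokens": "truncated", "tool_use": "tool_use"},
}

def normalize_stop_reason(provider: str, reason: str) -> str:
    """Return a provider-independent stop label."""
    return STOP_REASON_MAP[provider].get(reason, "unknown")
```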

Multi-Turn Conversations with Anthropic

messages = [
    {"role": "user", "content": "What is a derivative in calculus?"},
]

# First turn
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
assistant_reply = response.content[0].text

# Append and continue
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Can you give me a simple example?"})

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful maths tutor.",
    messages=messages
)
print(response.content[0].text)

Structured Output with Anthropic

For agent use cases you often want the model to return structured data (JSON) rather than free-form text. Here's how to request and parse it:

import json

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="""You are a data extraction assistant.
Always respond with valid JSON. No other text.""",
    messages=[{
        "role": "user",
        "content": "Extract the key details: 'Meeting with Sarah on Tuesday at 3pm to discuss Q4 budget. Location: Room 204.'"
    }]
)

# Parse the JSON response
data = json.loads(message.content[0].text)
print(json.dumps(data, indent=2))
# {
#   "event": "Meeting",
#   "attendee": "Sarah",
#   "day": "Tuesday",
#   "time": "3:00 PM",
#   "topic": "Q4 budget",
#   "location": "Room 204"
# }
3. Key API Differences

When building agents, you will often want to support multiple LLM providers (for cost optimisation, failover, or capability differences). Here is a side-by-side comparison of the two APIs:

| Feature | OpenAI | Anthropic |
| --- | --- | --- |
| System prompt | role: "system" inside the messages array | Separate system parameter |
| Response text | response.choices[0].message.content | message.content[0].text |
| Max tokens | Optional (defaults to the model maximum) | Required: must always be specified |
| Stop reason | "stop", "length", "tool_calls" | "end_turn", "max_tokens", "tool_use" |
| Token usage | usage.prompt_tokens / usage.completion_tokens | usage.input_tokens / usage.output_tokens |
| Tool format | Wrapped in type: "function" | input_schema at the top level |
| Streaming | stream=True with SSE events | stream=True or client.messages.stream() |

Writing a Provider-Agnostic Wrapper

Because the concepts are identical but the syntax differs, a common pattern is a thin wrapper that unifies the two APIs:

from openai import OpenAI
import anthropic

class LLMClient:
    """Unified wrapper for OpenAI and Anthropic APIs."""

    def __init__(self, provider: str = "openai"):
        self.provider = provider
        if provider == "openai":
            self.client = OpenAI()
            self.model = "gpt-4o"
        elif provider == "anthropic":
            self.client = anthropic.Anthropic()
            self.model = "claude-sonnet-4-20250514"

    def chat(self, system: str, user_message: str,
             temperature: float = 0.0, max_tokens: int = 1024) -> str:
        """Send a message and return the response text."""
        if self.provider == "openai":
            response = self.client.chat.completions.create(
                model=self.model,
                temperature=temperature,
                max_tokens=max_tokens,
                messages=[
                    {"role": "system", "content": system},
                    {"role": "user", "content": user_message},
                ],
            )
            return response.choices[0].message.content

        elif self.provider == "anthropic":
            message = self.client.messages.create(
                model=self.model,
                max_tokens=max_tokens,
                system=system,
                messages=[{"role": "user", "content": user_message}],
            )
            return message.content[0].text

# Usage - switch providers with one parameter
llm = LLMClient(provider="anthropic")
answer = llm.chat(
    system="You are a helpful assistant.",
    user_message="What is an AI agent?"
)
print(answer)
Tip

Both APIs support streaming, tool use, vision (image input), and multi-turn conversations. The concepts are identical; only the syntax differs. Learn one and the other comes easily. Throughout this series we show both wherever possible so you can pick your preferred provider.

Cost Awareness

API calls cost money. Understanding token pricing is essential to building agents that don't bankrupt you:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best for |
| --- | --- | --- | --- |
| GPT-4o mini | $0.15 | $0.60 | Simple tasks, classification, cheap testing |
| GPT-4o | $2.50 | $10.00 | Complex reasoning, agent cores |
| Claude Haiku | $0.25 | $1.25 | Fast, simple tasks, high throughput |
| Claude Sonnet 4 | $3.00 | $15.00 | Complex reasoning, agent cores |
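With per-token prices like those above, a back-of-the-envelope cost estimate is a one-liner. Treat the price table as a snapshot (pricing changes); the short model keys below are our own shorthand:

```python
# Per-million-token (input, output) prices from the table above -- a snapshot,
# always check current pricing. The short keys are our own shorthand.
PRICES = {
    "gpt-4o-mini":     (0.15, 0.60),
    "gpt-4o":          (2.50, 10.00),
    "claude-haiku":    (0.25, 1.25),
    "claude-sonnet-4": (3.00, 15.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one call at the prices above."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a gpt-4o call with 1,200 prompt tokens and 300 completion tokens
print(f"${estimate_cost('gpt-4o', 1_200, 300):.4f}")  # $0.0060
```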
Pitfall

An agent loop that runs 10 steps, sending the full conversation history at each step, can burn through thousands of tokens fast. Always monitor usage.total_tokens (or usage.input_tokens + usage.output_tokens) and set budget limits during development. A runaway agent loop can cost tens of dollars in minutes.
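A simple development-time guard can turn that budget limit into a hard stop. This is our own illustrative pattern, not an SDK feature:

```python
class TokenBudget:
    """Development-time spending guard (illustrative pattern, not an SDK feature)."""

    def __init__(self, max_total_tokens: int):
        self.max_total = max_total_tokens
        self.used = 0

    def record(self, total_tokens: int) -> None:
        """Call after each API response with its total token count."""
        self.used += total_tokens
        if self.used > self.max_total:
            raise RuntimeError(
                f"Token budget exceeded: {self.used} > {self.max_total}"
            )

budget = TokenBudget(max_total_tokens=50_000)
# After each call: budget.record(response.usage.total_tokens)
budget.record(12_000)  # within budget, no error
```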

Next Module

Module 5 — Building the Agent Loop