Module 5: Building the Agent Loop
Build the core observe-think-act loop.
The Observe → Think → Act Loop
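At the core of every AI agent is a loop. The agent observes the current state (user input, tool results, memory), thinks by sending that context to an LLM for reasoning, and then acts by either calling a tool or returning a final response. The loop repeats until the task is complete.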
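This pattern is known as the ReAct loop (Reasoning + Acting), first described by Yao et al. in a 2022 paper. The same basic pattern underlies modern agent frameworks and tool-use APIs, including LangChain, CrewAI, the OpenAI Agents SDK, and Claude's tool use, whatever their specific APIs look like.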
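Think of the agent loop as a chef preparing a complex dish. The chef (1) observes the current state of the ingredients and the recipe, (2) thinks about what to do next ("the onions are soft, time to add the garlic"), and (3) acts by carrying out that step. After each action, the chef observes the result and decides on the next step. The dish is done when the chef judges that no more steps are needed.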
Here is the flow in pseudocode:
```python
# The universal agent loop pattern (pseudocode: `llm` and `execute_tool` are placeholders)
def agent_loop(user_input):
    messages = [user_input]
    while True:
        # THINK: Send context to LLM
        response = llm.generate(messages)
        # DECIDE: Does the LLM want to use a tool?
        if response.wants_tool_call:
            # ACT: Execute the tool
            tool_result = execute_tool(response.tool_call)
            # OBSERVE: Add result to context for next iteration
            messages.append(response)     # LLM's reasoning
            messages.append(tool_result)  # Tool's output
            continue  # Loop back to THINK
        else:
            # DONE: LLM has a final answer
            return response.text
```
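The key insight is that the LLM itself decides when to stop. The developer does not hard-code "call tool A, then call tool B, then respond"; the LLM chooses its actions dynamically based on the task and the intermediate results. This is what makes agents flexible: the same loop can handle "What's the weather?" (one tool call) or "Research and summarize the top 5 competitors" (many tool calls and reasoning steps).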
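To see the model, not the code, choosing how many steps to take, the pseudocode can be made runnable with a scripted stand-in for the LLM. This is a sketch for illustration only: `ScriptedLLM`, `make_response`, and `run_stub_tool` are hypothetical names, and a real agent would call a provider API instead of replaying a fixed script.

```python
from typing import Optional


class ScriptedLLM:
    """Hypothetical fake LLM that replays a fixed list of responses in order."""

    def __init__(self, script: list):
        self.script = list(script)
        self.calls = 0

    def generate(self, messages: list) -> dict:
        response = self.script[self.calls]
        self.calls += 1
        return response


def make_response(text: str = "", tool_call: Optional[dict] = None) -> dict:
    """A response either requests a tool (tool_call set) or carries final text."""
    return {"text": text, "tool_call": tool_call}


def run_stub_tool(tool_call: dict) -> str:
    # Stub tool: reports which tool ran and with what input
    return f"result of {tool_call['name']}({tool_call['input']})"


def agent_loop(llm, user_input: str, max_steps: int = 10) -> tuple:
    """The universal loop; returns (final_text, number_of_tool_calls)."""
    messages = [user_input]
    tool_calls = 0
    for _ in range(max_steps):
        response = llm.generate(messages)                  # THINK
        if response["tool_call"] is not None:              # DECIDE
            result = run_stub_tool(response["tool_call"])  # ACT
            messages.append(response)                      # OBSERVE
            messages.append(result)
            tool_calls += 1
            continue
        return response["text"], tool_calls                # DONE
    return "Max steps reached.", tool_calls


# Same loop, different number of steps, chosen entirely by the (scripted) model
weather = ScriptedLLM([
    make_response(tool_call={"name": "get_weather", "input": {"city": "Tokyo"}}),
    make_response(text="Tokyo: 22°C, partly cloudy."),
])
research = ScriptedLLM(
    [make_response(tool_call={"name": "search", "input": {"q": f"competitor {i}"}})
     for i in range(5)]
    + [make_response(text="Summary of 5 competitors.")]
)
print(agent_loop(weather, "What's the weather?"))    # ('Tokyo: 22°C, partly cloudy.', 1)
print(agent_loop(research, "Research competitors"))  # ('Summary of 5 competitors.', 5)
```

Nothing in `agent_loop` knows about weather or research; the number of iterations comes from the responses alone, and `max_steps` is the only hard stop.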
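The agent loop is simple in concept, but the details matter enormously. How you handle errors, how you manage the ever-growing message history, and how you prevent infinite loops are what separate a toy demo from a production agent.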
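As one example of managing history, here is a naive trimming sketch. It assumes the alternating layout that the Anthropic loops in this module build (index 0 is the user's request, odd indices are assistant turns, later even indices are user turns carrying tool_result blocks). `trim_history` is a hypothetical helper; dropping middle turns loses information, so summarizing them is a common alternative.

```python
def trim_history(messages: list, keep_last: int = 6) -> list:
    """Keep the first user message plus roughly the last `keep_last` messages.

    Assumes: index 0 = original user request, odd indices = assistant turns,
    even indices after 0 = user turns holding tool_result blocks.
    """
    if len(messages) <= 1 + keep_last:
        return list(messages)
    start = len(messages) - keep_last
    if start % 2 == 0:
        # Never start on a tool_result turn: its matching tool_use would be dropped
        start += 1
    # Result still alternates user/assistant and starts with a user message
    return [messages[0]] + messages[start:]
```

Starting the kept tail on an assistant turn guarantees every surviving tool_result still has its tool_use in the preceding message, which the API requires.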
The Loop at the API Level
OpenAI and Anthropic both signal tool use through fields on the response object. The pattern is the same:
| Step | OpenAI signal | Anthropic signal |
|---|---|---|
| LLM wants a tool | finish_reason == "tool_calls" | stop_reason == "tool_use" |
| LLM is done | finish_reason == "stop" | stop_reason == "end_turn" |
| Tool call details | message.tool_calls (a list; [0] is the first call) | content blocks with type == "tool_use" |
| Sending tool results | one role: "tool" message per call | role: "user" message containing type: "tool_result" blocks |
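The last row of the table is where the two APIs differ most. A minimal sketch of both message shapes as plain dicts, plus a helper that reads the "wants a tool" signal; the function names are hypothetical, while the dict keys follow the formats used in the examples in this module.

```python
def anthropic_tool_result_message(tool_use_id: str, content: str) -> dict:
    """User message carrying one tool_result block (Anthropic Messages API shape)."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
        ],
    }


def openai_tool_result_message(tool_call_id: str, content: str) -> dict:
    """One 'tool' message per tool call (OpenAI Chat Completions shape)."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}


def wants_tool(provider: str, reason: str) -> bool:
    """True when the finish/stop reason means the model is asking for a tool."""
    return (provider, reason) in {("openai", "tool_calls"), ("anthropic", "tool_use")}
```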
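Minimal Agent: Anthropic
Let's build a minimal but complete agent with the Anthropic SDK. This agent can look up the weather. It is a simple example, but it demonstrates the full observe-think-act loop with real API calls.
Step 1: Define the tools
First, define the tools the agent can use. Each tool needs a name, a description, and an input schema. The description is critical: it is what tells the LLM when to use the tool: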
```python
import anthropic

client = anthropic.Anthropic()

# Define the tools available to the agent
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Use this when the user asks about weather, temperature, or conditions in a specific location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'London' or 'Tokyo'"
                }
            },
            "required": ["city"]
        }
    }
]
```
Step 2: Implement the tool
In a real application, this function would call a weather API. For now, we use a stub:
```python
def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "get_weather":
        city = tool_input["city"]
        # In production, call a real weather API here
        return f"Current weather in {city}: 22°C, partly cloudy, humidity 65%"
    else:
        return f"Error: Unknown tool '{tool_name}'"
```
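As the tool set grows, an if/elif chain gets unwieldy, and an exception raised inside a tool would crash the whole loop. Here is a minimal sketch, assuming a hypothetical `TOOLS` registry that maps tool names to plain Python functions. Failures come back as text the model can read. The Anthropic tool_result block accepts an `is_error` flag, which tells the model the call failed.

```python
from typing import Callable, Dict, Tuple


def get_weather(city: str) -> str:
    # Stub, as above; a real version would call a weather API
    return f"Current weather in {city}: 22°C, partly cloudy, humidity 65%"


# Hypothetical registry: tool name -> function taking the tool's inputs as kwargs
TOOLS: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def execute_tool_safely(tool_name: str, tool_input: dict) -> Tuple[str, bool]:
    """Return (content, is_error) instead of letting exceptions kill the loop."""
    func = TOOLS.get(tool_name)
    if func is None:
        return f"Error: Unknown tool '{tool_name}'", True
    try:
        return func(**tool_input), False
    except Exception as exc:  # bad arguments or a failing backend
        return f"Error running {tool_name}: {exc}", True


content, is_error = execute_tool_safely("get_weather", {"town": "Paris"})  # wrong argument name
tool_result_block = {
    "type": "tool_result",
    "tool_use_id": "toolu_example",  # hypothetical id; use the real tool_use block's id
    "content": content,
    "is_error": is_error,
}
```

Returning the error to the model, rather than raising, gives it a chance to retry with corrected arguments on the next iteration.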
Step 3: The agent loop
Now comes the core loop that ties everything together:
```python
def run_agent(user_message: str, max_steps: int = 10) -> str:
    """Run the agent loop until a final answer is produced."""
    print(f"\n{'=' * 50}")
    print(f"User: {user_message}")
    print(f"{'=' * 50}")

    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")

        # THINK: Send context to the LLM
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful assistant with access to tools.",
            tools=tools,
            messages=messages,
        )
        print(f"Stop reason: {response.stop_reason}")

        # CHECK: Does the LLM want to call a tool?
        if response.stop_reason == "tool_use":
            # OBSERVE (part 1): record the assistant turn, including its tool_use blocks
            messages.append({"role": "assistant", "content": response.content})

            # ACT: run every tool_use block; the model may request several at once,
            # and each needs a matching tool_result in the next user message
            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                print(f"Tool call: {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                print(f"Tool result: {result}")
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

            # OBSERVE (part 2): send all tool results back in one user message
            messages.append({"role": "user", "content": tool_results})
            continue  # Back to THINK

        # DONE: join the text blocks (there may be several, or none)
        final_text = "".join(b.text for b in response.content if b.type == "text")
        print(f"\nFinal answer: {final_text}")
        return final_text

    return "Error: Max steps reached without a final answer."


# Run it!
run_agent("What's the weather like in Tokyo?")
```
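The for step in range(max_steps) loop is the backbone of every agent: it puts a hard upper bound on the number of iterations, which protects against infinite loops. Within that bound, the LLM decides when to stop by returning stop_reason == "end_turn" instead of "tool_use".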
Expected output (the model's exact wording will vary from run to run):
```text
==================================================
User: What's the weather like in Tokyo?
==================================================

--- Step 1 ---
Stop reason: tool_use
Tool call: get_weather({'city': 'Tokyo'})
Tool result: Current weather in Tokyo: 22°C, partly cloudy, humidity 65%

--- Step 2 ---
Stop reason: end_turn

Final answer: The current weather in Tokyo is 22°C with partly cloudy skies and 65% humidity.
```
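run_agent treats every stop_reason other than "tool_use" as "done", including "max_tokens", which means the reply was cut off mid-answer. A sketch of a more explicit dispatch, using the documented Anthropic values "tool_use", "end_turn", "stop_sequence", and "max_tokens"; `next_action` and its return labels are hypothetical names chosen for this example.

```python
def next_action(stop_reason: str) -> str:
    """Map an Anthropic stop_reason to the loop's next move (hypothetical labels)."""
    if stop_reason == "tool_use":
        return "run_tools"   # execute tool_use blocks, then call the model again
    if stop_reason in ("end_turn", "stop_sequence"):
        return "finish"      # the text blocks are the final answer
    if stop_reason == "max_tokens":
        return "truncated"   # answer was cut off: raise max_tokens or ask it to continue
    return "unknown"         # e.g. values added in newer API versions: log and stop
```

Returning "unknown" instead of silently finishing makes new or unexpected stop reasons visible in your logs.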
Minimal Agent: OpenAI Version
Here is the same pattern using the OpenAI API, so you can compare the syntax:
```python
import json

from openai import OpenAI

# Separate names so this snippet can share a file with the Anthropic version
openai_client = OpenAI()

# OpenAI tool format wraps each tool in a "function" type
openai_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]


def run_agent_openai(user_message: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message},
    ]
    for step in range(max_steps):
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            tools=openai_tools,
            messages=messages,
        )
        choice = response.choices[0]
        if choice.finish_reason == "tool_calls":
            # OpenAI requires appending the assistant message (with its tool_calls) first
            messages.append(choice.message)
            # Then one "tool" message per call, each with the matching tool_call_id
            for tool_call in choice.message.tool_calls:
                args = json.loads(tool_call.function.arguments)
                # Reuses execute_tool() from the Anthropic example above
                result = execute_tool(tool_call.function.name, args)
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result,
                })
            continue
        return choice.message.content
    return "Max steps reached."
```
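Both APIs describe tool inputs with JSON Schema; the OpenAI format just wraps the definition in a "function" object and calls the schema "parameters" instead of "input_schema". A minimal converter sketch, so one tool list can serve both loops (`to_openai_tool` is a hypothetical helper name):

```python
def to_openai_tool(anthropic_tool: dict) -> dict:
    """Convert an Anthropic-style tool definition into OpenAI's function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": anthropic_tool["name"],
            "description": anthropic_tool.get("description", ""),
            # Same JSON Schema object; only the key name differs
            "parameters": anthropic_tool["input_schema"],
        },
    }


anthropic_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}
openai_weather_tool = to_openai_tool(anthropic_weather_tool)
```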
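Safety: Preventing Runaway Agents
Without proper safeguards, an agent loop can spiral out of control: making dozens of API calls, burning through your budget, and producing meaningless results. Safety mechanisms are not optional; they are a core part of agent design.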
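A runaway agent can drain your entire API budget in minutes. Because every step resends the growing conversation history, a 20-step loop with GPT-4o can cost $5-$20 per run, especially when that history includes large tool outputs. Without limits, a bug that causes an infinite loop can cost hundreds of dollars before you notice.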
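The growth comes from resending history. If step i sends the starting prompt plus everything added so far, total input tokens grow roughly with the square of the step count. Here is a worked example under assumed numbers: a 2,000-token starting prompt, with 1,500 tokens of new assistant output and tool results added per step. Neither figure is from a real run.

```python
def cumulative_input_tokens(steps: int, base_tokens: int, tokens_added_per_step: int) -> int:
    """Total input tokens when every step resends the whole, growing history.

    Step i (0-based) sends base_tokens + i * tokens_added_per_step tokens, so the
    total is steps * base + added * steps * (steps - 1) / 2.
    """
    return sum(base_tokens + i * tokens_added_per_step for i in range(steps))


# Illustrative, assumed numbers only
for steps in (5, 10, 20):
    total = cumulative_input_tokens(steps, 2_000, 1_500)
    print(f"{steps:>2} steps: {total:,} input tokens")
#  5 steps: 25,000 input tokens
# 10 steps: 87,500 input tokens
# 20 steps: 325,000 input tokens
```

Going from 10 to 20 steps multiplies input tokens by about 3.7x, not 2x. Multiply by your model's current per-token input price (from the provider's pricing page) to get dollars. Larger tool outputs raise tokens_added_per_step and push the total up quickly.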
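Essential safeguards
Maximum step limit
Always cap the number of loop iterations. Start with max_steps=10 and adjust for your use case. Many tasks finish within 3-5 steps.
Token budget
Track cumulative token usage across all steps. If the total exceeds a threshold (e.g. 50,000 tokens), terminate the run.
Step logging
Log every step: the tool called, its input, its result, and the token count. This is essential for debugging and cost analysis.
Timeout
Set a time limit for the whole agent run. If a short task runs longer than 60 seconds, something has probably gone wrong.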
Implementing the safeguards in code
```python
import time


def run_safe_agent(user_message: str, max_steps: int = 10,
                   token_budget: int = 50000, timeout: int = 60) -> str:
    """Agent loop with step, token, and wall-clock limits.

    token_budget caps cumulative tokens across all steps; it is separate from
    the per-response max_tokens passed to the API.
    """
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()

    for step in range(max_steps):
        # Safety check: timeout (checked between steps, so one slow API call
        # can still overrun it)
        elapsed = time.time() - start_time
        if elapsed > timeout:
            print(f"TIMEOUT: Agent exceeded {timeout}s limit at step {step + 1}")
            return "Error: Agent timed out."

        # THINK
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful assistant.",
            tools=tools,
            messages=messages,
        )

        # Safety check: token budget (input tokens include the full history each step)
        step_tokens = response.usage.input_tokens + response.usage.output_tokens
        total_tokens += step_tokens
        print(f"Step {step + 1}: {step_tokens} tokens (total: {total_tokens})")
        if total_tokens > token_budget:
            print(f"BUDGET: Exceeded {token_budget} token limit")
            return "Error: Token budget exceeded."

        # Normal loop logic: run every requested tool, then loop back to THINK
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": execute_tool(block.name, block.input),
                }
                for block in response.content
                if block.type == "tool_use"
            ]
            messages.append({"role": "user", "content": tool_results})
            continue

        final = "".join(b.text for b in response.content if b.type == "text")
        print(f"\nCompleted in {step + 1} steps, {total_tokens} tokens, "
              f"{time.time() - start_time:.1f}s")
        return final

    return "Error: Max steps reached."
```
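run_safe_agent enforces the step, token, and timeout limits but not the step-logging item from the list above. A minimal sketch using Python's standard logging module, writing one JSON line per step. `log_step` and the field names are hypothetical choices; you might call it inside the tool_use branch once per tool block, passing step_tokens.

```python
import json
import logging

logger = logging.getLogger("agent")


def log_step(step: int, tool_name: str, tool_input: dict, result: str, tokens: int) -> str:
    """Emit one JSON log line per agent step and return it (handy for tests)."""
    record = json.dumps({
        "step": step,
        "tool": tool_name,
        "input": tool_input,
        "result": result[:200],  # truncate long tool outputs to keep logs readable
        "tokens": tokens,
    }, ensure_ascii=False)
    logger.info(record)
    return record


logging.basicConfig(level=logging.INFO, format="%(message)s")
log_step(1, "get_weather", {"city": "Tokyo"}, "Current weather in Tokyo: 22°C", 512)
```

One JSON object per line keeps the logs greppable and easy to load later for per-run cost analysis.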
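Detecting infinite loops
A subtler failure mode is an agent that keeps calling the same tool with the same input, gets the same result, and never makes progress. You can detect this by tracking recent tool activity; the function below uses repeated identical tool results as the signal: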
```python
def detect_loop(messages: list, lookback: int = 4) -> bool:
    """Heuristic: True if the recent tool results are all identical."""
    recent_results = []
    # Each tool exchange adds two messages (assistant tool_use + user tool_result)
    for msg in messages[-lookback * 2:]:
        if isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    # str() keeps the value hashable even if content is a list of blocks
                    recent_results.append(str(block.get("content", "")))
    # If all recent tool results are identical, we're probably looping
    return len(recent_results) >= 3 and len(set(recent_results)) == 1
```
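Comparing results can misfire: two different searches may both legitimately return "no results", while the same call with a timestamp in its output never repeats exactly. A complementary sketch compares the calls themselves. It assumes you collect a list of `(block.name, block.input)` tuples from each tool_use block as the loop runs. `repeated_call` is a hypothetical helper, and canonical JSON makes key order irrelevant.

```python
import json
from collections import Counter


def repeated_call(tool_calls: list, threshold: int = 3, window: int = 6) -> bool:
    """True if one (tool name, input) pair appears `threshold`+ times in the last `window` calls.

    tool_calls: list of (name, input_dict) tuples taken from tool_use blocks.
    """
    signatures = Counter(
        (name, json.dumps(tool_input, sort_keys=True))
        for name, tool_input in tool_calls[-window:]
    )
    return any(count >= threshold for count in signatures.values())
```

Checking calls rather than results catches the model re-asking the same question even when the tool's answer changes slightly each time.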
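In production agents, you should also implement rate limiting (e.g. no more than 30 API calls per minute), cost alerts (e.g. an email notification when daily spend exceeds $10), and a kill switch (a way to immediately stop all running agents). These operational concerns become critical as you scale from development to production use.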
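Two of these can be sketched in a few lines for a single process: a sliding-window rate limiter for the "30 calls per minute" example, and a kill switch built on `threading.Event`. This is a sketch only. `RateLimiter`, `KILL_SWITCH`, and `check_guards` are hypothetical names, and stopping agents across several processes or machines would need shared state such as a database flag.

```python
import threading
import time
from collections import deque


class RateLimiter:
    """Sliding window: at most max_calls within any window_s seconds."""

    def __init__(self, max_calls: int = 30, window_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self.calls = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


# Kill switch: any thread can call KILL_SWITCH.set(); agent loops check it each step
KILL_SWITCH = threading.Event()


def check_guards(limiter: RateLimiter, kill_switch: threading.Event):
    """Call at the top of each loop iteration; returns an error message or None."""
    if kill_switch.is_set():
        return "Error: Stopped by kill switch."
    if not limiter.allow():
        return "Error: Rate limit reached; retry later."
    return None
```

Inside run_safe_agent, a check such as `error = check_guards(limiter, KILL_SWITCH)` before each API call, returning early when it is not None, would fit alongside the existing timeout check.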