Module 5: Building the Agent Loop
Building the core observe-think-act loop.
The Observe → Think → Act Cycle
At its core, every AI agent is a loop. The agent observes the current state (user input, tool results, memory), thinks by sending that context to an LLM for reasoning, and acts by either calling a tool or returning a final response. This cycle repeats until the task is complete.
This pattern is known as the ReAct loop (Reasoning + Acting), introduced in the 2022 paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. The same pattern underlies every major agent framework — LangChain, CrewAI, OpenAI Agents SDK, and Claude's tool use — regardless of their specific API.
Think of the agent loop like a chef preparing a complex dish. The chef (1) observes the current state of ingredients and the recipe, (2) thinks about what to do next ("the onions are soft enough, time to add garlic"), and (3) acts by performing the next step. After each action, the chef observes the result and decides the next move. The dish is done when the chef decides no more steps are needed.
Here is the flow in pseudocode:
# The universal agent loop pattern
messages = [user_input]

while True:
    # THINK: Send context to LLM
    response = llm.generate(messages)

    # DECIDE: Does the LLM want to use a tool?
    if response.wants_tool_call:
        # ACT: Execute the tool
        tool_result = execute_tool(response.tool_call)

        # OBSERVE: Add result to context for next iteration
        messages.append(response)     # LLM's reasoning
        messages.append(tool_result)  # Tool's output
        continue  # Loop back to THINK
    else:
        # DONE: LLM has a final answer
        return response.text
The key insight is that the LLM itself decides when to stop. It is not the developer hard-coding "call tool A, then tool B, then respond." The LLM dynamically chooses its actions based on the task and intermediate results. This is what makes agents flexible — the same loop can handle "What's the weather?" (one tool call) or "Research and summarise the top 5 competitors" (many tool calls and reasoning steps).
The agent loop is conceptually simple, but the details matter enormously. How you handle errors, how you manage the growing message history, and how you prevent infinite loops are what separate a toy demo from a production agent.
How the Loop Looks at the API Level
Both OpenAI and Anthropic signal tool use through their response objects. The pattern is the same:
| Step | OpenAI Signal | Anthropic Signal |
|---|---|---|
| LLM wants a tool | finish_reason == "tool_calls" | stop_reason == "tool_use" |
| LLM is done | finish_reason == "stop" | stop_reason == "end_turn" |
| Tool call details | message.tool_calls[0] | content block with type == "tool_use" |
| Sending tool result back | role: "tool" message | role: "user" with type: "tool_result" |
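The table can be read as a tiny dispatch: two providers, two signals each, one loop decision. Here is an illustrative sketch — the `loop_decision` helper and its return values are made up for this comparison, not part of either SDK:

```python
# Normalise provider-specific stop signals into a single loop decision.
# The string labels "call_tool" / "finish" are illustrative, not SDK values.
def loop_decision(provider: str, reason: str) -> str:
    """Map a provider's stop signal to what the agent loop should do next."""
    signals = {
        ("openai", "tool_calls"): "call_tool",   # finish_reason
        ("openai", "stop"): "finish",
        ("anthropic", "tool_use"): "call_tool",  # stop_reason
        ("anthropic", "end_turn"): "finish",
    }
    return signals.get((provider, reason), "unknown")
```

A wrapper like this is one way to keep the core loop provider-agnostic: only the dispatch table changes when you swap APIs.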
Minimal Agent — Anthropic
Let us build a minimal but complete agent using the Anthropic SDK. This agent can check the weather — a simple example, but it demonstrates the full observe-think-act loop with real API calls.
Step 1: Define Tools
First, define the tools the agent can use. Each tool needs a name, description, and input schema. The description is critical — it tells the LLM when to use the tool:
import anthropic

client = anthropic.Anthropic()

# Define the tools available to the agent
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Use this when the user asks about weather, temperature, or conditions in a specific location.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'London' or 'Tokyo'"
                }
            },
            "required": ["city"]
        }
    }
]
Step 2: Implement the Tool
In a real application, this function would call a weather API. For now, we use a stub:
def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Execute a tool and return the result as a string."""
    if tool_name == "get_weather":
        city = tool_input["city"]
        # In production, call a real weather API here
        return f"Current weather in {city}: 22°C, partly cloudy, humidity 65%"
    else:
        return f"Error: Unknown tool '{tool_name}'"
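Once an agent has more than a couple of tools, the if/elif chain scales poorly. One common alternative is a registry dict mapping tool names to handlers — sketched below with the same weather stub (the registry structure is an illustrative design choice, not something either SDK requires):

```python
# Illustrative registry-based dispatch: each tool name maps to a handler
# that takes the parsed tool input and returns a string result.
def get_weather(city: str) -> str:
    # Stub matching the example above; call a real weather API in production
    return f"Current weather in {city}: 22°C, partly cloudy, humidity 65%"

TOOL_REGISTRY = {
    "get_weather": lambda tool_input: get_weather(tool_input["city"]),
}

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Look up the tool in the registry and execute it."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        return f"Error: Unknown tool '{tool_name}'"
    return handler(tool_input)
```

Adding a new tool then means adding one registry entry, with no changes to the loop itself.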
Step 3: The Agent Loop
Now the core loop that ties everything together:
def run_agent(user_message: str, max_steps: int = 10) -> str:
    """Run the agent loop until a final answer is produced."""
    print(f"\n{'='*50}")
    print(f"User: {user_message}")
    print(f"{'='*50}")

    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        print(f"\n--- Step {step + 1} ---")

        # THINK: Send context to the LLM
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful assistant with access to tools.",
            tools=tools,
            messages=messages
        )
        print(f"Stop reason: {response.stop_reason}")

        # CHECK: Does the LLM want to call a tool?
        if response.stop_reason == "tool_use":
            # Find the tool_use block in the response
            tool_block = next(
                b for b in response.content if b.type == "tool_use"
            )
            print(f"Tool call: {tool_block.name}({tool_block.input})")

            # ACT: Execute the tool
            result = execute_tool(tool_block.name, tool_block.input)
            print(f"Tool result: {result}")

            # OBSERVE: Add both the LLM response and tool result to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_block.id,
                    "content": result
                }]
            })
            continue  # Back to THINK

        # DONE: LLM produced a final text response
        final_text = next(
            b.text for b in response.content if b.type == "text"
        )
        print(f"\nFinal answer: {final_text}")
        return final_text

    return "Error: Max steps reached without a final answer."

# Run it!
run_agent("What's the weather like in Tokyo?")
The for step in range(max_steps) loop is the heart of every agent. It provides a safety bound that prevents infinite loops. The LLM decides when to stop by returning stop_reason == "end_turn" instead of "tool_use".
Expected Output
==================================================
User: What's the weather like in Tokyo?
==================================================
--- Step 1 ---
Stop reason: tool_use
Tool call: get_weather({'city': 'Tokyo'})
Tool result: Current weather in Tokyo: 22°C, partly cloudy, humidity 65%
--- Step 2 ---
Stop reason: end_turn
Final answer: The current weather in Tokyo is 22°C with partly cloudy skies
and 65% humidity.
Minimal Agent — OpenAI Version
The same pattern with the OpenAI API, so you can compare the syntax differences:
from openai import OpenAI
import json

client = OpenAI()

# OpenAI tool format wraps each tool in a "function" type
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

def run_agent_openai(user_message: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            tools=tools,
            messages=messages
        )
        choice = response.choices[0]

        if choice.finish_reason == "tool_calls":
            tool_call = choice.message.tool_calls[0]
            args = json.loads(tool_call.function.arguments)
            result = execute_tool(tool_call.function.name, args)

            # OpenAI requires appending the assistant message first
            messages.append(choice.message)
            # Then the tool result with matching tool_call_id
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })
            continue

        return choice.message.content

    return "Max steps reached."
Safety: Preventing Runaway Agents
An agent loop that lacks proper safeguards can spiral out of control — making dozens of API calls, consuming your budget, and producing nonsensical results. Safety mechanisms are not optional; they are a core part of agent design.
A runaway agent can consume your entire API budget in minutes. A 20-step loop with GPT-4o, each step sending a growing conversation history, can easily cost $5–$20 per run. Without limits, a bug that causes infinite looping could cost hundreds of dollars before you notice.
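The cost growth is worth making concrete. Because each step resends the entire conversation history, input tokens grow roughly linearly per step, so cumulative input tokens grow quadratically with step count. A back-of-envelope sketch (the 1,000-tokens-per-step figure is an illustrative assumption):

```python
# Rough cost model: if the history grows by ~tokens_per_step each step,
# then step k resends about k * tokens_per_step input tokens, and the
# cumulative total over all steps is the sum of that series.
def cumulative_input_tokens(steps: int, tokens_per_step: int = 1000) -> int:
    """Total input tokens sent across a run of `steps` loop iterations."""
    return sum(k * tokens_per_step for k in range(1, steps + 1))
```

At 20 steps and roughly 1,000 new tokens per step, that is 210,000 cumulative input tokens for a single run — which is how a looping agent reaches dollars rather than cents.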
Essential Safety Measures
Max Step Limit
Always cap the number of loop iterations. Start with max_steps=10 and adjust based on your use case. Most tasks complete in 3–5 steps.
Token Budget
Track cumulative token usage across all steps. Abort the run if total tokens exceed a threshold (e.g., 50,000 tokens).
Step Logging
Log every step: the tool called, the input, the result, and the token count. This is essential for debugging and cost analysis.
Timeout
Set a wall-clock timeout for the entire agent run. If the agent takes longer than 60 seconds, something is probably wrong.
Implementing Safety in Code
import time

def run_safe_agent(user_message: str, max_steps: int = 10,
                   max_tokens: int = 50000, timeout: int = 60) -> str:
    """Agent loop with comprehensive safety measures."""
    messages = [{"role": "user", "content": user_message}]
    total_tokens = 0
    start_time = time.time()

    for step in range(max_steps):
        # Safety check: timeout
        elapsed = time.time() - start_time
        if elapsed > timeout:
            print(f"TIMEOUT: Agent exceeded {timeout}s limit at step {step + 1}")
            return "Error: Agent timed out."

        # THINK
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system="You are a helpful assistant.",
            tools=tools,
            messages=messages
        )

        # Safety check: token budget
        step_tokens = response.usage.input_tokens + response.usage.output_tokens
        total_tokens += step_tokens
        print(f"Step {step + 1}: {step_tokens} tokens (total: {total_tokens})")
        if total_tokens > max_tokens:
            print(f"BUDGET: Exceeded {max_tokens} token limit")
            return "Error: Token budget exceeded."

        # Normal loop logic continues...
        if response.stop_reason == "tool_use":
            tool_block = next(
                b for b in response.content if b.type == "tool_use"
            )
            result = execute_tool(tool_block.name, tool_block.input)
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_block.id,
                    "content": result
                }]
            })
            continue

        final = next(b.text for b in response.content if b.type == "text")
        print(f"\nCompleted in {step + 1} steps, {total_tokens} tokens, "
              f"{time.time() - start_time:.1f}s")
        return final

    return "Error: Max steps reached."
Detecting Infinite Loops
A subtler failure mode is when the agent keeps calling the same tool with the same input, getting the same result, and never making progress. You can detect this by tracking recent tool calls:
def detect_loop(messages: list, lookback: int = 4) -> bool:
    """Check if the agent is stuck calling the same tool repeatedly."""
    recent_tool_calls = []
    for msg in messages[-lookback * 2:]:  # Check last N exchanges
        if isinstance(msg.get("content"), list):
            for block in msg["content"]:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    recent_tool_calls.append(block.get("content", ""))

    # If all recent tool results are identical, we're probably looping
    if len(recent_tool_calls) >= 3 and len(set(recent_tool_calls)) == 1:
        return True
    return False
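Scanning the history after the fact is one approach; another is to catch the repetition as it happens. The sketch below is an illustrative forward-looking guard (the `RepeatGuard` class is invented for this example): the loop would call `record()` just before each tool execution and abort when it returns True.

```python
import json

class RepeatGuard:
    """Abort the loop when the exact same tool call repeats `limit` times in a row."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.last = None   # (tool_name, serialised_input) of the previous call
        self.count = 0     # consecutive repeats of that call

    def record(self, tool_name: str, tool_input: dict) -> bool:
        """Register a tool call; return True if it has now repeated `limit` times."""
        # Serialise with sorted keys so dict ordering doesn't hide repeats
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        if key == self.last:
            self.count += 1
        else:
            self.last, self.count = key, 1
        return self.count >= self.limit
```

Because the guard fires before the redundant call executes, it also saves the wasted tool invocation, not just the wasted LLM turn.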
In production agents, you should also implement rate limiting (e.g., no more than 30 API calls per minute), cost alerts (email notification if daily spend exceeds $10), and kill switches (a way to immediately halt all running agents). These operational concerns become critical as you scale from development to production use.
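Of those three, rate limiting is the easiest to sketch in code. Below is a minimal sliding-window limiter, using the 30-calls-per-minute figure from the text; the class name and structure are illustrative, and a production system would likely use a shared store so the limit applies across processes.

```python
import time
from collections import deque

class RateLimiter:
    """Block callers so at most `max_calls` happen per `window` seconds."""

    def __init__(self, max_calls: int = 30, window: float = 60.0):
        self.max_calls = max_calls
        self.window = window
        self.calls = deque()  # monotonic timestamps of recent calls

    def acquire(self) -> None:
        """Wait (if necessary) until another API call is allowed, then record it."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] > self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires
            time.sleep(max(self.window - (now - self.calls[0]), 0))
        self.calls.append(time.monotonic())
```

In the agent loop, `limiter.acquire()` would go immediately before each `client.messages.create(...)` call, so even a buggy loop cannot exceed the configured call rate.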