AI Agent Series — Ran Wei

Module 1: What is an AI Agent?

Understanding what AI agents are, how they differ from chatbots, and the core components that make them work.

1. Definition

An AI agent is a software system that uses a large language model (LLM) as its reasoning core, combined with tools, memory, and planning capabilities, to autonomously complete multi-step tasks on behalf of a user.

Unlike traditional software that follows rigid, pre-programmed rules, an AI agent can interpret ambiguous instructions, break complex goals into sub-tasks, decide which tools to use at each step, and adapt its strategy based on intermediate results. The LLM provides the "brain" — natural language understanding and generation — while the surrounding infrastructure provides the "body" — the ability to perceive, remember, and act on the world.

A more formal way to think about it: an AI agent is a system that receives a goal, operates within an environment, and takes actions over multiple steps to achieve that goal, using an LLM to decide what to do at each step.

ANALOGY

Think of an AI agent like an executive assistant. You say "book me a flight to Tokyo next week, something under $800." The assistant checks your calendar, searches flight options, compares prices, and books the best one — all without you micromanaging each step. A chatbot, by contrast, is like a dictionary: you ask a question, you get an answer, end of interaction.

NOTE

The term "agent" in AI has a long history, from Marvin Minsky's "Society of Mind" (1986) to the software-agents research of the 1990s. What makes the current wave different is the use of LLMs as a general-purpose reasoning engine, which dramatically broadens the range of tasks an agent can handle without task-specific programming.

Key Properties of an AI Agent

Researchers have identified several properties that distinguish true agents from simpler AI systems, including tool use, planning, memory, and self-correction. Not every agent needs all of these, but understanding them helps you design the right level of complexity for your use case.

NOTE

In practice, the most impactful property is tool use. An LLM with tools can access real-time data, execute code, and take actions in the real world. Everything else (planning, memory, self-correction) enhances the quality of those tool-use decisions.

# A conceptual skeleton of an AI agent in Python
class AIAgent:
    def __init__(self, llm, tools, memory):
        self.llm = llm          # The reasoning engine (GPT-4o, Claude, etc.)
        self.tools = tools      # Available actions (search, calculate, etc.)
        self.memory = memory    # Short-term and long-term storage

    def run(self, goal: str, max_steps: int = 10) -> str:
        """Execute the agent loop until the goal is achieved."""
        self.memory.add("user", goal)

        for step in range(max_steps):
            # THINK: Ask the LLM what to do next
            thought = self.llm.think(self.memory.context())

            if thought.wants_tool:
                # ACT: Execute the requested tool
                result = self.tools.execute(thought.tool_call)
                # OBSERVE: Store the result for the next iteration
                self.memory.add("assistant", thought.reasoning)
                self.memory.add("tool", result)
            else:
                # DONE: Return the final answer
                return thought.response

        return "Max steps reached without a final answer."

NOTE

This is pseudocode to illustrate the architecture. You will build a real, runnable version of this in Module 5 using the actual OpenAI and Anthropic APIs. The key takeaway is that the structure is surprisingly simple — a class with an LLM, tools, memory, and a loop.

What Makes an Agent Different from Traditional Software?

Traditional software is deterministic — given the same input, it always produces the same output. An AI agent is stochastic — it may take different paths to solve the same problem. This has profound implications for testing, debugging, and deployment (covered in Modules 14–15).

| Property | Traditional Software | AI Agent |
| --- | --- | --- |
| Behaviour | Deterministic — follows exact code paths | Stochastic — LLM may reason differently each time |
| Error handling | Try/catch, pre-defined error codes | Self-correction, retry with different strategy |
| Testing | Unit tests with exact assertions | Evaluation suites with fuzzy matching |
| Debugging | Step through code line by line | Inspect conversation logs and reasoning traces |
| Scaling | More servers, load balancing | More agents, orchestration, cost management |
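The testing row is worth making concrete. Because an agent's wording varies from run to run, evaluation asserts on properties of the answer rather than exact strings. A minimal sketch, where `check_answer` and the revenue figure are hypothetical examples rather than any standard evaluation API:

```python
# A minimal sketch of "fuzzy" evaluation for stochastic agent output.
# check_answer and the required facts below are illustrative only.

def check_answer(output: str, required_facts: list[str]) -> bool:
    """Pass if every required fact appears, regardless of exact wording."""
    text = output.lower()
    return all(fact.lower() in text for fact in required_facts)

# Two differently-worded agent runs can both pass the same check:
run_a = "Apple reported revenue of $94.9B last quarter."
run_b = "Last quarter, Apple's revenue came in at $94.9B."
assert check_answer(run_a, ["apple", "$94.9b"])
assert check_answer(run_b, ["apple", "$94.9b"])
```

Production evaluation suites extend this idea with semantic similarity scoring or LLM-based grading, but the principle is the same: assert on what must be true, not on exact output.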

2. Agent vs Chatbot

The simplest way to understand AI agents is to contrast them with chatbots, which most people are already familiar with. A chatbot responds to a single prompt with a single response. An agent reasons across multiple steps, decides which tools to use, maintains context, and takes real-world actions.

When you ask ChatGPT "What's the weather in London?", it either knows the answer from training data (possibly outdated) or tells you it cannot access real-time data. When you ask an agent the same question, it recognises it needs current data, calls a weather API, gets the result, formats it, and returns a live answer. The difference is not intelligence — it is architecture.

Here is a concrete example. Suppose a user says: "Find the three cheapest flights from London to Tokyo next Tuesday, then email me a summary." A chatbot would generate a plausible-sounding but invented list. An agent would: (1) call a flight search API, (2) filter and sort results, (3) compose an email, and (4) send it through an email API — four distinct actions across multiple turns of reasoning.

| Aspect | Chatbot | Agent |
| --- | --- | --- |
| Reasoning | Single turn — one prompt, one response | Multi-step — plans, executes, and iterates |
| Tools | None (text generation only) | Search, APIs, code execution, databases |
| Memory | Session only (lost on reload) | Short-term (conversation) + long-term (persistent) |
| Actions | Text output only | Real-world side effects (send email, write file, etc.) |
| Autonomy | Reactive — waits for each prompt | Goal-directed — pursues objectives independently |
| Error handling | None — user must retry | Self-correcting — retries with different strategy |

TIP

Not every application needs an agent. If your use case is single-turn Q&A (e.g., "summarise this document"), a simple LLM call is cheaper, faster, and more reliable. Agents shine when tasks require multiple steps, external data, or real-world actions.

The Spectrum of Autonomy

In practice, there is a spectrum between a pure chatbot and a fully autonomous agent. Most real-world systems fall somewhere in the middle:

| Level | Description | Example |
| --- | --- | --- |
| Level 0: Chatbot | Single-turn text generation | Basic ChatGPT conversation |
| Level 1: Router | LLM classifies intent, routes to the right handler | Customer support triage |
| Level 2: Tool-user | LLM decides which tool to call in a single step | Weather lookup, calculator |
| Level 3: Agent | Multi-step reasoning with tool use and self-correction | Research assistant, coding agent |
| Level 4: Multi-agent | Multiple agents collaborating on a shared goal | Software development team simulation |

As you progress through this tutorial, you will build systems at each level — starting from simple API calls (Level 0) and working up to multi-agent orchestration (Level 4) by Module 11.
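A Level 1 router can be sketched in a few lines. In the sketch below, `classify_intent` stubs out what would be a single LLM call in practice, and the handler names are hypothetical:

```python
# A minimal sketch of a Level 1 "router": a classifier assigns an
# intent label, then plain code dispatches to a handler. In a real
# system, classify_intent would be one LLM call returning a label;
# keyword matching stands in for it here.

def classify_intent(message: str) -> str:
    if "refund" in message.lower():
        return "billing"
    if "password" in message.lower():
        return "account"
    return "general"

# Hypothetical handlers, one per intent label.
HANDLERS = {
    "billing": lambda m: "Routing to the billing team.",
    "account": lambda m: "Routing to account support.",
    "general": lambda m: "Routing to general support.",
}

def route(message: str) -> str:
    return HANDLERS[classify_intent(message)](message)

print(route("I need a refund for my last order"))  # Routing to the billing team.
```

Note that the LLM only classifies; it never chooses tools or loops. That single-decision structure is what separates Level 1 from Levels 2 and 3.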

A Practical Example: Chatbot vs Agent

To make the distinction concrete, here is what happens when you ask both a chatbot and an agent the same question: "What are the top 3 trending Python repositories on GitHub right now?"

# CHATBOT APPROACH: Single LLM call, no tools
# The model can only use training data (potentially months old)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "What are the top 3 trending Python repos on GitHub?"}]
)
# Result: May list repos that WERE popular during training,
# but cannot know what's trending RIGHT NOW.

# AGENT APPROACH: LLM + tool use
# The agent recognises it needs live data and calls a tool
# Step 1: LLM decides to call search_github_trending(language="python")
# Step 2: Tool returns real-time data from GitHub's API
# Step 3: LLM formats the result into a helpful response
# Result: Accurate, real-time information
3. Core Components

Every AI agent, regardless of framework or implementation, is built from the same five core components. Understanding these is essential before writing any code.

LLM Core

The reasoning engine — GPT-4o, Claude Sonnet 4, Gemini, etc. This is the "brain" that understands instructions, generates plans, and produces responses.

Tools

Functions the agent can invoke: web search, calculators, API calls, code execution, database queries, file I/O. Tools are what give an agent its "hands."

Memory

Short-term (the current conversation context) and long-term (persistent vector stores, databases). Memory gives the agent continuity across interactions.

Planning

Decomposing complex goals into executable sub-tasks and determining the order of operations. This is the "strategic thinking" capability.

Orchestration

The control loop (observe-think-act) that ties everything together and manages the flow between components.

The LLM Core in Depth

The LLM is the most critical component. It acts as the decision-making centre that interprets user intent, generates plans, selects tools, and synthesises results. Modern agents typically use the most capable models available — GPT-4o, Claude Sonnet 4, or Gemini — because reasoning quality directly determines agent reliability.

The LLM receives the full conversation history (including tool results) as context and must decide at each step: should I call a tool, ask the user for clarification, or deliver a final answer? This decision-making is guided by the system prompt (covered in Module 3) and the model's inherent capabilities.

Tools: Extending the Agent's Reach

Without tools, an agent is just a chatbot with extra steps. Tools are what allow an agent to interact with the real world. Each tool is defined by a name, a description (so the LLM knows when to use it), and an input schema (so the LLM knows what parameters to provide).

# Example: defining a tool for the agent to use
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city. Use this when the user asks about weather conditions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "The city name, e.g. 'London'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}
        },
        "required": ["city"]
    }
}

NOTE

Tool descriptions are part of your prompt engineering. A vague description like "gets weather" will lead to the LLM using the tool incorrectly. A clear description like "Get the current weather for a given city. Use this when the user asks about current weather conditions, temperature, or forecasts" dramatically improves tool selection accuracy.
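On the execution side, a schema like `get_weather` above is typically paired with a registry that maps tool names to functions and checks required parameters before calling. A minimal sketch, in which `get_weather` returns a canned result standing in for a real weather API:

```python
# A minimal sketch of tool dispatch: look up the function by the name
# the LLM requested, validate required parameters, then call it.
# get_weather and its canned result are hypothetical stand-ins.

def get_weather(city: str, units: str = "celsius") -> str:
    return f"18 degrees {units} and cloudy in {city}"  # stand-in for an API call

# Registry: tool name -> implementation + required parameters.
TOOLS = {"get_weather": {"func": get_weather, "required": ["city"]}}

def execute_tool(name: str, arguments: dict) -> str:
    spec = TOOLS[name]
    missing = [p for p in spec["required"] if p not in arguments]
    if missing:
        # Returning the error as text lets the LLM see it and retry.
        return f"Error: missing required parameter(s): {missing}"
    return spec["func"](**arguments)

print(execute_tool("get_weather", {"city": "London"}))
```

Returning validation errors as text, rather than raising, is a common design choice: the error goes back into the conversation, and the LLM can correct its own tool call on the next step.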

Memory: Short-Term vs Long-Term

Short-term memory is the conversation history — the messages array you pass to the API on each call. It is limited by the model's context window (e.g., 128K tokens for GPT-4o, 200K for Claude). As the conversation grows, older messages may need to be summarised or dropped to stay within limits.

Long-term memory is external storage — a vector database, a JSON file, or a traditional database — that persists across sessions and allows the agent to "remember" past interactions. For example, an agent might store user preferences ("prefers metric units") in long-term memory so it does not need to ask every time.
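Keeping short-term memory within the context window can be as simple as dropping the oldest messages once a budget is exceeded. The sketch below approximates tokens by character count (a real implementation would use the model's tokenizer) and always preserves the first, system-level message:

```python
# A minimal sketch of short-term memory trimming: drop the oldest
# non-system messages when the history exceeds a budget. Character
# count stands in for token count here; use the model's tokenizer
# in a real implementation.

def trim_history(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """Keep the first (system) message and as many recent messages as fit."""
    system, rest = messages[:1], messages[1:]
    while rest and sum(len(m["content"]) for m in system + rest) > max_chars:
        rest.pop(0)  # drop the oldest non-system message first
    return system + rest
```

More sophisticated strategies summarise the dropped messages instead of discarding them, or move salient facts into long-term storage before trimming.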

ANALOGY

Short-term memory is like the papers currently on your desk — you can see them and reference them, but desk space is limited. Long-term memory is like a filing cabinet — you can store much more, but you need to know what to look for and retrieve it explicitly.

Planning: Decomposing Goals

Planning is the ability to take a high-level goal like "prepare a competitive analysis report" and break it into actionable steps: (1) identify competitors, (2) gather financial data, (3) analyse market positioning, (4) draft the report. Advanced agents use techniques like chain-of-thought reasoning and ReAct (Reasoning + Acting) to plan more effectively.
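One common way to implement explicit planning is to ask the LLM to emit its plan as a JSON array and parse it. In the sketch below, `plan_with_llm` returns a canned reply standing in for a real API call, and the prompt wording is illustrative only:

```python
# A minimal sketch of goal decomposition: ask the model for a JSON
# list of steps, then parse it into a Python list. plan_with_llm is
# a stub; a real version would send PLAN_PROMPT to an LLM API.
import json

PLAN_PROMPT = (
    "Break the following goal into 3-5 ordered steps. "
    "Reply with a JSON array of strings only.\nGoal: {goal}"
)

def plan_with_llm(goal: str) -> list[str]:
    # A canned model reply stands in for the real API call here.
    canned = (
        '["Identify competitors", "Gather financial data", '
        '"Analyse market positioning", "Draft the report"]'
    )
    return json.loads(canned)

steps = plan_with_llm("prepare a competitive analysis report")
print(steps)
```

Requesting "a JSON array of strings only" makes the plan machine-readable, so the orchestration loop can execute steps one at a time rather than parsing free-form prose.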

Orchestration: The Control Loop

Orchestration is the "glue" that connects all the other components. It is the code that manages the agent loop — sending messages to the LLM, parsing responses, executing tool calls, handling errors, and deciding when the task is complete. You can think of it as the main event loop in a GUI application or the game loop in a video game.

In its simplest form, orchestration is a while loop with an if/else branch. In production systems, it may include retry logic, parallel tool execution, human approval gates, and cost tracking. We will build this step by step starting in Module 5.

# The orchestration loop in its simplest form
def agent_loop(user_goal, llm, tools, max_steps=10):
    messages = [{"role": "user", "content": user_goal}]

    for step in range(max_steps):
        response = llm.generate(messages)       # Think

        if response.wants_tool:                  # Needs more info?
            result = tools.run(response.tool)    # Act
            messages.append(response.as_message())
            messages.append(result.as_message()) # Observe
        else:
            return response.text                 # Done!

    return "Reached max steps without completing."

How the Components Work Together

Here is a walkthrough of all five components interacting in a real scenario. The user asks: "What was Apple's revenue last quarter?"

  1. Orchestration receives the user message and starts the loop
  2. LLM Core (Claude Sonnet 4) reasons: "I need financial data. I should use the search_sec_filings tool."
  3. Planning (implicit in the LLM's reasoning): decides to search for Apple's latest 10-Q filing
  4. Tools: the search_sec_filings function calls the SEC EDGAR API and returns the filing data
  5. Memory: the tool result is added to the conversation history
  6. LLM Core reads the filing data, extracts the revenue figure, and formats a response
  7. Orchestration detects that the LLM is done (no more tool calls) and returns the final answer to the user

TIP

When building your first agent, start with just the LLM + one tool + the loop. Add memory and planning sophistication only when you need them. Premature complexity is the #1 reason agent projects fail.

4. The Agent Landscape (2025-2026)

The AI agent ecosystem has exploded since 2024. What was once a niche area of research has become the fastest-growing segment of the AI industry, with every major technology company investing heavily in agent infrastructure. The market is projected to reach $50B by 2030.

Two key protocols have emerged as industry standards for agent communication. MCP (Model Context Protocol), introduced by Anthropic, standardises how agents connect to external tools and data sources — think of it as "USB-C for AI agents." A2A (Agent-to-Agent), introduced by Google, enables agents built by different vendors to communicate and collaborate on tasks. Together, these protocols address the interoperability problem that previously kept agents from different vendors from working together.

ANALOGY

MCP is like a universal adapter for tools. Before MCP, every agent framework had its own way of connecting to tools — like having different charging cables for every phone. MCP standardises the connection so any tool built to the MCP spec works with any agent that supports MCP. A2A does the same thing, but for agent-to-agent communication rather than agent-to-tool communication.

The landscape can be broadly divided into three tiers:

| Tier | What It Includes | Examples |
| --- | --- | --- |
| Foundation Models | The LLMs that power agent reasoning | GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro |
| Agent Frameworks | Libraries and SDKs for building agents | LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK |
| Agent Platforms | End-to-end hosted solutions for deploying agents | OpenAI Assistants API, Amazon Bedrock Agents, Google Vertex AI Agent Builder |

NOTE

The agent ecosystem is evolving extremely fast. Frameworks and best practices that are standard today may be superseded within months. Focus on understanding the core concepts (LLM reasoning, tool use, memory, planning) rather than memorising specific API signatures — the concepts transfer across all frameworks.

Where Agents Are Used Today

Key Industry Trends

Several trends are shaping the agent landscape right now.

What This Tutorial Series Covers

This 15-module series takes you from zero to production-ready AI agents. Here is a roadmap of what you will learn:

Modules 1–5: Foundations

Core concepts, environment setup, prompt engineering, API calls, and the agent loop. By Module 5, you will have a working agent.

Modules 6–9: Capabilities

Tool use, memory management, RAG (retrieval-augmented generation), and MCP (Model Context Protocol) for connecting to external services.

Modules 10–12: Architecture

Agent frameworks (LangChain, CrewAI), multi-agent orchestration, and the A2A (Agent-to-Agent) communication protocol.

Modules 13–15: Production

Safety guardrails, testing and evaluation, deployment best practices, and cost optimisation strategies.

TIP

The best way to learn about agents is to build one. Starting in Module 2, we will set up your development environment and begin writing code. By Module 5, you will have a working agent that can reason, use tools, and maintain conversation context.

Prerequisites for This Series

To get the most from this series, basic programming familiarity is helpful, but no prior experience with LLMs, APIs, or AI is required; we will cover everything from scratch.

Up Next

Module 2 — Environment Setup & API Configuration