AI Agent Series — Ran Wei

Module 1: What is an AI Agent?

Understanding what AI agents are, how they differ from chatbots, and the core components that make them work.

1. Definition

An AI agent is a software system that uses a large language model (LLM) as its reasoning core, combined with tools, memory, and planning capabilities, to autonomously complete multi-step tasks on behalf of a user.

Unlike traditional software that follows rigid, pre-programmed rules, an AI agent can interpret ambiguous instructions, break complex goals into sub-tasks, decide which tools to use at each step, and adapt its strategy based on intermediate results. The LLM provides the "brain" — natural language understanding and generation — while the surrounding infrastructure provides the "body" — the ability to perceive, remember, and act on the world.

A more formal way to think about it: an AI agent is a system that receives a goal, operates within an environment, and takes actions over multiple steps to achieve that goal, using an LLM to decide what to do at each step.

ANALOGY

Think of an AI agent like an executive assistant. You say "book me a flight to Tokyo next week, something under $800." The assistant checks your calendar, searches flight options, compares prices, and books the best one — all without you micromanaging each step. A chatbot, by contrast, is like a dictionary: you ask a question, you get an answer, end of interaction.

NOTE

The term "agent" in AI has a long history, from Marvin Minsky's "Society of Mind" (1986) to the software-agents research of the 1990s. What makes the current wave different is the use of LLMs as a general-purpose reasoning engine, which dramatically broadens the range of tasks an agent can handle without task-specific programming.

Key Properties of an AI Agent

Researchers have identified several properties that distinguish true agents from simpler AI systems, including tool use, planning, memory, and self-correction. Not every agent needs all of these, but understanding them helps you design the right level of complexity for your use case.

NOTE

In practice, the most impactful property is tool use. An LLM with tools can access real-time data, execute code, and take actions in the real world. Everything else (planning, memory, self-correction) enhances the quality of those tool-use decisions.

# A conceptual skeleton of an AI agent in Python
class AIAgent:
    def __init__(self, llm, tools, memory):
        self.llm = llm          # The reasoning engine (GPT-4o, Claude, etc.)
        self.tools = tools      # Available actions (search, calculate, etc.)
        self.memory = memory    # Short-term and long-term storage

    def run(self, goal: str, max_steps: int = 10) -> str:
        """Execute the agent loop until the goal is achieved."""
        self.memory.add("user", goal)

        for step in range(max_steps):
            # THINK: Ask the LLM what to do next
            thought = self.llm.think(self.memory.context())

            if thought.wants_tool:
                # ACT: Execute the requested tool
                result = self.tools.execute(thought.tool_call)
                # OBSERVE: Store the result for the next iteration
                self.memory.add("assistant", thought.reasoning)
                self.memory.add("tool", result)
            else:
                # DONE: Return the final answer
                return thought.response

        return "Max steps reached without a final answer."

NOTE

This is pseudocode to illustrate the architecture. You will build a real, runnable version of this in Module 5 using the actual OpenAI and Anthropic APIs. The key takeaway is that the structure is surprisingly simple — a class with an LLM, tools, memory, and a loop.

What Makes an Agent Different from Traditional Software?

Traditional software is deterministic — given the same input, it always produces the same output. An AI agent is stochastic — it may take different paths to solve the same problem. This has profound implications for testing, debugging, and deployment (covered in Modules 14–15).

| Property | Traditional Software | AI Agent |
| --- | --- | --- |
| Behaviour | Deterministic — follows exact code paths | Stochastic — LLM may reason differently each time |
| Error handling | Try/catch, pre-defined error codes | Self-correction, retry with different strategy |
| Testing | Unit tests with exact assertions | Evaluation suites with fuzzy matching |
| Debugging | Step through code line by line | Inspect conversation logs and reasoning traces |
| Scaling | More servers, load balancing | More agents, orchestration, cost management |
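The testing row is worth making concrete. Because an agent's wording varies from run to run, evaluation asserts on properties of the answer rather than exact strings. A minimal sketch, where `check_answer` and the revenue figure are hypothetical examples rather than any standard evaluation API:

```python
# A minimal sketch of "fuzzy" evaluation for stochastic agent output.
# check_answer and the required facts below are illustrative only.

def check_answer(output: str, required_facts: list[str]) -> bool:
    """Pass if every required fact appears, regardless of exact wording."""
    text = output.lower()
    return all(fact.lower() in text for fact in required_facts)

# Two differently-worded agent runs can both pass the same check:
run_a = "Apple reported revenue of $94.9B last quarter."
run_b = "Last quarter, Apple's revenue came in at $94.9B."
assert check_answer(run_a, ["apple", "$94.9b"])
assert check_answer(run_b, ["apple", "$94.9b"])
```

Production evaluation suites extend this idea with semantic similarity scoring or LLM-based grading, but the principle is the same: assert on what must be true, not on exact output.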

2. Agent vs Chatbot

The simplest way to understand AI agents is to contrast them with chatbots, which most people are already familiar with. A chatbot responds to a single prompt with a single response. An agent reasons across multiple steps, decides which tools to use, maintains context, and takes real-world actions.

When you ask ChatGPT "What's the weather in London?", it either knows the answer from training data (possibly outdated) or tells you it cannot access real-time data. When you ask an agent the same question, it recognises it needs current data, calls a weather API, gets the result, formats it, and returns a live answer. The difference is not intelligence — it is architecture.

Here is a concrete example. Suppose a user says: "Find the three cheapest flights from London to Tokyo next Tuesday, then email me a summary." A chatbot would generate a plausible-sounding but invented list. An agent would: (1) call a flight search API, (2) filter and sort results, (3) compose an email, and (4) send it through an email API — four distinct actions across multiple turns of reasoning.

| Aspect | Chatbot | Agent |
| --- | --- | --- |
| Reasoning | Single turn — one prompt, one response | Multi-step — plans, executes, and iterates |
| Tools | None (text generation only) | Search, APIs, code execution, databases |
| Memory | Session only (lost on reload) | Short-term (conversation) + long-term (persistent) |
| Actions | Text output only | Real-world side effects (send email, write file, etc.) |
| Autonomy | Reactive — waits for each prompt | Goal-directed — pursues objectives independently |
| Error handling | None — user must retry | Self-correcting — retries with different strategy |

TIP

Not every application needs an agent. If your use case is single-turn Q&A (e.g., "summarise this document"), a simple LLM call is cheaper, faster, and more reliable. Agents shine when tasks require multiple steps, external data, or real-world actions.

The Spectrum of Autonomy

In practice, there is a spectrum between a pure chatbot and a fully autonomous agent. Most real-world systems fall somewhere in the middle:

| Level | Description | Example |
| --- | --- | --- |
| Level 0: Chatbot | Single-turn text generation | Basic ChatGPT conversation |
| Level 1: Router | LLM classifies intent, routes to the right handler | Customer support triage |
| Level 2: Tool-user | LLM decides which tool to call in a single step | Weather lookup, calculator |
| Level 3: Agent | Multi-step reasoning with tool use and self-correction | Research assistant, coding agent |
| Level 4: Multi-agent | Multiple agents collaborating on a shared goal | Software development team simulation |

As you progress through this tutorial, you will build systems at each level — starting from simple API calls (Level 0) and working up to multi-agent orchestration (Level 4) by Module 11.
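A Level 1 router can be sketched in a few lines. In the sketch below, `classify_intent` stubs out what would be a single LLM call in practice, and the handler names are hypothetical:

```python
# A minimal sketch of a Level 1 "router": a classifier assigns an
# intent label, then plain code dispatches to a handler. In a real
# system, classify_intent would be one LLM call returning a label;
# keyword matching stands in for it here.

def classify_intent(message: str) -> str:
    if "refund" in message.lower():
        return "billing"
    if "password" in message.lower():
        return "account"
    return "general"

# Hypothetical handlers, one per intent label.
HANDLERS = {
    "billing": lambda m: "Routing to the billing team.",
    "account": lambda m: "Routing to account support.",
    "general": lambda m: "Routing to general support.",
}

def route(message: str) -> str:
    return HANDLERS[classify_intent(message)](message)

print(route("I need a refund for my last order"))  # Routing to the billing team.
```

Note that the LLM only classifies; it never chooses tools or loops. That single-decision structure is what separates Level 1 from Levels 2 and 3.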

A Practical Example: Chatbot vs Agent

To make the distinction concrete, here is what happens when you ask both a chatbot and an agent the same question: "What are the top 3 trending Python repositories on GitHub right now?"

# CHATBOT APPROACH: Single LLM call, no tools
# The model can only use training data (potentially months old)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "What are the top 3 trending Python repos on GitHub?"}]
)
# Result: May list repos that WERE popular during training,
# but cannot know what's trending RIGHT NOW.

# AGENT APPROACH: LLM + tool use
# The agent recognises it needs live data and calls a tool
# Step 1: LLM decides to call search_github_trending(language="python")
# Step 2: Tool returns real-time data from GitHub's API
# Step 3: LLM formats the result into a helpful response
# Result: Accurate, real-time information
3. Core Components

Every AI agent, regardless of framework or implementation, is built from the same five core components. Understanding these is essential before writing any code.

LLM Core

The reasoning engine — GPT-4o, Claude Sonnet 4, Gemini, etc. This is the "brain" that understands instructions, generates plans, and produces responses.

Tools

Functions the agent can invoke: web search, calculators, API calls, code execution, database queries, file I/O. Tools are what give an agent its "hands."

Memory

Short-term (the current conversation context) and long-term (persistent vector stores, databases). Memory gives the agent continuity across interactions.

Planning

Decomposing complex goals into executable sub-tasks and determining the order of operations. This is the "strategic thinking" capability.

Orchestration

The control loop (observe-think-act) that ties everything together and manages the flow between components.

The LLM Core in Depth

The LLM is the most critical component. It acts as the decision-making centre that interprets user intent, generates plans, selects tools, and synthesises results. Modern agents typically use the most capable models available — GPT-4o, Claude Sonnet 4, or Gemini — because reasoning quality directly determines agent reliability.

The LLM receives the full conversation history (including tool results) as context and must decide at each step: should I call a tool, ask the user for clarification, or deliver a final answer? This decision-making is guided by the system prompt (covered in Module 3) and the model's inherent capabilities.

Tools: Extending the Agent's Reach

Without tools, an agent is just a chatbot with extra steps. Tools are what allow an agent to interact with the real world. Each tool is defined by a name, a description (so the LLM knows when to use it), and an input schema (so the LLM knows what parameters to provide).

# Example: defining a tool for the agent to use
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city. Use this when the user asks about weather conditions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "The city name, e.g. 'London'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature unit"}
        },
        "required": ["city"]
    }
}

NOTE

Tool descriptions are part of your prompt engineering. A vague description like "gets weather" will lead to the LLM using the tool incorrectly. A clear description like "Get the current weather for a given city. Use this when the user asks about current weather conditions, temperature, or forecasts" dramatically improves tool selection accuracy.
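On the execution side, a schema like `get_weather` above is typically paired with a registry that maps tool names to functions and checks required parameters before calling. A minimal sketch, in which `get_weather` returns a canned result standing in for a real weather API:

```python
# A minimal sketch of tool dispatch: look up the function by the name
# the LLM requested, validate required parameters, then call it.
# get_weather and its canned result are hypothetical stand-ins.

def get_weather(city: str, units: str = "celsius") -> str:
    return f"18 degrees {units} and cloudy in {city}"  # stand-in for an API call

# Registry: tool name -> implementation + required parameters.
TOOLS = {"get_weather": {"func": get_weather, "required": ["city"]}}

def execute_tool(name: str, arguments: dict) -> str:
    spec = TOOLS[name]
    missing = [p for p in spec["required"] if p not in arguments]
    if missing:
        # Returning the error as text lets the LLM see it and retry.
        return f"Error: missing required parameter(s): {missing}"
    return spec["func"](**arguments)

print(execute_tool("get_weather", {"city": "London"}))
```

Returning validation errors as text, rather than raising, is a common design choice: the error goes back into the conversation, and the LLM can correct its own tool call on the next step.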

Memory: Short-Term vs Long-Term

Short-term memory is the conversation history — the messages array you pass to the API on each call. It is limited by the model's context window (e.g., 128K tokens for GPT-4o, 200K for Claude). As the conversation grows, older messages may need to be summarised or dropped to stay within limits.

Long-term memory is external storage — a vector database, a JSON file, or a traditional database — that persists across sessions and allows the agent to "remember" past interactions. For example, an agent might store user preferences ("prefers metric units") in long-term memory so it does not need to ask every time.
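Keeping short-term memory within the context window can be as simple as dropping the oldest messages once a budget is exceeded. The sketch below approximates tokens by character count (a real implementation would use the model's tokenizer) and always preserves the first, system-level message:

```python
# A minimal sketch of short-term memory trimming: drop the oldest
# non-system messages when the history exceeds a budget. Character
# count stands in for token count here; use the model's tokenizer
# in a real implementation.

def trim_history(messages: list[dict], max_chars: int = 2000) -> list[dict]:
    """Keep the first (system) message and as many recent messages as fit."""
    system, rest = messages[:1], messages[1:]
    while rest and sum(len(m["content"]) for m in system + rest) > max_chars:
        rest.pop(0)  # drop the oldest non-system message first
    return system + rest
```

More sophisticated strategies summarise the dropped messages instead of discarding them, or move salient facts into long-term storage before trimming.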

ANALOGY

Short-term memory is like the papers currently on your desk — you can see them and reference them, but desk space is limited. Long-term memory is like a filing cabinet — you can store much more, but you need to know what to look for and retrieve it explicitly.

Planning: Decomposing Goals

Planning is the ability to take a high-level goal like "prepare a competitive analysis report" and break it into actionable steps: (1) identify competitors, (2) gather financial data, (3) analyse market positioning, (4) draft the report. Advanced agents use techniques like chain-of-thought reasoning and ReAct (Reasoning + Acting) to plan more effectively.
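One common way to implement explicit planning is to ask the LLM to emit its plan as a JSON array and parse it. In the sketch below, `plan_with_llm` returns a canned reply standing in for a real API call, and the prompt wording is illustrative only:

```python
# A minimal sketch of goal decomposition: ask the model for a JSON
# list of steps, then parse it into a Python list. plan_with_llm is
# a stub; a real version would send PLAN_PROMPT to an LLM API.
import json

PLAN_PROMPT = (
    "Break the following goal into 3-5 ordered steps. "
    "Reply with a JSON array of strings only.\nGoal: {goal}"
)

def plan_with_llm(goal: str) -> list[str]:
    # A canned model reply stands in for the real API call here.
    canned = (
        '["Identify competitors", "Gather financial data", '
        '"Analyse market positioning", "Draft the report"]'
    )
    return json.loads(canned)

steps = plan_with_llm("prepare a competitive analysis report")
print(steps)
```

Requesting "a JSON array of strings only" makes the plan machine-readable, so the orchestration loop can execute steps one at a time rather than parsing free-form prose.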

Orchestration: The Control Loop

Orchestration is the "glue" that connects all the other components. It is the code that manages the agent loop — sending messages to the LLM, parsing responses, executing tool calls, handling errors, and deciding when the task is complete. You can think of it as the main event loop in a GUI application or the game loop in a video game.

In its simplest form, orchestration is a while loop with an if/else branch. In production systems, it may include retry logic, parallel tool execution, human approval gates, and cost tracking. We will build this step by step starting in Module 5.

# The orchestration loop in its simplest form
def agent_loop(user_goal, llm, tools, max_steps=10):
    messages = [{"role": "user", "content": user_goal}]

    for step in range(max_steps):
        response = llm.generate(messages)       # Think

        if response.wants_tool:                  # Needs more info?
            result = tools.run(response.tool)    # Act
            messages.append(response.as_message())
            messages.append(result.as_message()) # Observe
        else:
            return response.text                 # Done!

    return "Reached max steps without completing."

How the Components Work Together

Here is a walkthrough of all five components interacting in a real scenario. The user asks: "What was Apple's revenue last quarter?"

  1. Orchestration receives the user message and starts the loop
  2. LLM Core (Claude Sonnet 4) reasons: "I need financial data. I should use the search_sec_filings tool."
  3. Planning (implicit in the LLM's reasoning): decides to search for Apple's latest 10-Q filing
  4. Tools: the search_sec_filings function calls the SEC EDGAR API and returns the filing data
  5. Memory: the tool result is added to the conversation history
  6. LLM Core reads the filing data, extracts the revenue figure, and formats a response
  7. Orchestration detects that the LLM is done (no more tool calls) and returns the final answer to the user

TIP

When building your first agent, start with just the LLM + one tool + the loop. Add memory and planning sophistication only when you need them. Premature complexity is the #1 reason agent projects fail.

4. The Agent Landscape (2025-2026)

The AI agent ecosystem has exploded since 2024. What was once a niche area of research has become the fastest-growing segment of the AI industry, with every major technology company investing heavily in agent infrastructure. The market is projected to reach $50B by 2030.

Two key protocols have emerged as industry standards for agent communication. MCP (Model Context Protocol), introduced by Anthropic, standardises how agents connect to external tools and data sources — think of it as "USB-C for AI agents." A2A (Agent-to-Agent), introduced by Google, enables agents built by different vendors to communicate and collaborate on tasks. Together, these protocols address the interoperability problem that previously kept agents from different vendors from working together.

ANALOGY

MCP is like a universal adapter for tools. Before MCP, every agent framework had its own way of connecting to tools — like having different charging cables for every phone. MCP standardises the connection so any tool built to the MCP spec works with any agent that supports MCP. A2A does the same thing, but for agent-to-agent communication rather than agent-to-tool communication.

The landscape can be broadly divided into three tiers:

| Tier | What It Includes | Examples |
| --- | --- | --- |
| Foundation Models | The LLMs that power agent reasoning | GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro |
| Agent Frameworks | Libraries and SDKs for building agents | LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK |
| Agent Platforms | End-to-end hosted solutions for deploying agents | OpenAI Assistants API, Amazon Bedrock Agents, Google Vertex AI Agent Builder |

NOTE

The agent ecosystem is evolving extremely fast. Frameworks and best practices that are standard today may be superseded within months. Focus on understanding the core concepts (LLM reasoning, tool use, memory, planning) rather than memorising specific API signatures — the concepts transfer across all frameworks.

Where Agents Are Used Today

Key Industry Trends

Several trends are shaping the agent landscape right now.

What This Tutorial Series Covers

This 15-module series takes you from zero to production-ready AI agents. Here is a roadmap of what you will learn:

Modules 1–5: Foundations

Core concepts, environment setup, prompt engineering, API calls, and the agent loop. By Module 5, you will have a working agent.

Modules 6–9: Capabilities

Tool use, memory management, RAG (retrieval-augmented generation), and MCP (Model Context Protocol) for connecting to external services.

Modules 10–12: Architecture

Agent frameworks (LangChain, CrewAI), multi-agent orchestration, and the A2A (Agent-to-Agent) communication protocol.

Modules 13–15: Production

Safety guardrails, testing and evaluation, deployment best practices, and cost optimisation strategies.

TIP

The best way to learn about agents is to build one. Starting in Module 2, we will set up your development environment and begin writing code. By Module 5, you will have a working agent that can reason, use tools, and maintain conversation context.

Prerequisites for This Series

To get the most from this series, basic programming familiarity is helpful, but no prior experience with LLMs, APIs, or AI is required; we will cover everything from scratch.

Up Next

Module 2 — Environment Setup & API Configuration