Module 1: What is an AI Agent?
Understanding what AI agents are, how they differ from chatbots, and the core components that make them work.
Definition
An AI agent is a software system that uses a large language model (LLM) as its reasoning core, combined with tools, memory, and planning capabilities, to autonomously complete multi-step tasks on behalf of a user.
Unlike traditional software that follows rigid, pre-programmed rules, an AI agent can interpret ambiguous instructions, break complex goals into sub-tasks, decide which tools to use at each step, and adapt its strategy based on intermediate results. The LLM provides the "brain" — natural language understanding and generation — while the surrounding infrastructure provides the "body" — the ability to perceive, remember, and act on the world.
A more formal way to think about it: an AI agent is a system that receives a goal, operates within an environment, and takes actions over multiple steps to achieve that goal, using an LLM to decide what to do at each step.
Think of an AI agent like an executive assistant. You say "book me a flight to Tokyo next week, something under $800." The assistant checks your calendar, searches flight options, compares prices, and books the best one — all without you micromanaging each step. A chatbot, by contrast, is like a dictionary: you ask a question, you get an answer, end of interaction.
The term "agent" in AI has a long history, dating back at least to the 1980s and 1990s (e.g., Marvin Minsky's 1986 book "The Society of Mind" and the software-agents research of the 1990s). What makes the current wave different is the use of LLMs as a general-purpose reasoning engine, which dramatically broadens the range of tasks an agent can handle without task-specific programming.
Key Properties of an AI Agent
Researchers have identified several properties that distinguish true agents from simpler AI systems. Not every agent needs all of these, but understanding them helps you design the right level of complexity for your use case:
- Autonomy — operates without step-by-step human guidance; can complete a multi-step task after receiving a single high-level instruction
- Reactivity — perceives its environment and responds to changes; for example, retrying a failed API call or adapting when search results are empty
- Proactivity — takes initiative to achieve goals, not just respond to prompts; may gather additional information before the user asks for it
- Tool use — extends its capabilities by calling external functions and APIs; this is the single most important differentiator from chatbots
- Persistence — maintains state across interactions through memory systems; can "remember" what happened in previous sessions
- Self-correction — when a step fails or produces unexpected results, the agent can recognise the error and try a different approach
In practice, the most impactful property is tool use. An LLM with tools can access real-time data, execute code, and take actions in the real world. Everything else (planning, memory, self-correction) enhances the quality of those tool-use decisions.
```python
# A conceptual skeleton of an AI agent in Python
class AIAgent:
    def __init__(self, llm, tools, memory):
        self.llm = llm        # The reasoning engine (GPT-4o, Claude, etc.)
        self.tools = tools    # Available actions (search, calculate, etc.)
        self.memory = memory  # Short-term and long-term storage

    def run(self, goal: str, max_steps: int = 10) -> str:
        """Execute the agent loop until the goal is achieved."""
        self.memory.add("user", goal)
        for step in range(max_steps):
            # THINK: Ask the LLM what to do next
            thought = self.llm.think(self.memory.context())
            if thought.wants_tool:
                # ACT: Execute the requested tool
                result = self.tools.execute(thought.tool_call)
                # OBSERVE: Store the result for the next iteration
                self.memory.add("assistant", thought.reasoning)
                self.memory.add("tool", result)
            else:
                # DONE: Return the final answer
                return thought.response
        return "Max steps reached without a final answer."
```
This is pseudocode to illustrate the architecture. You will build a real, runnable version of this in Module 5 using the actual OpenAI and Anthropic APIs. The key takeaway is that the structure is surprisingly simple — a class with an LLM, tools, memory, and a loop.
What Makes an Agent Different from Traditional Software?
Traditional software is deterministic — given the same input, it always produces the same output. An AI agent is stochastic — it may take different paths to solve the same problem. This has profound implications for testing, debugging, and deployment (covered in Modules 14–15).
| Property | Traditional Software | AI Agent |
|---|---|---|
| Behaviour | Deterministic — follows exact code paths | Stochastic — LLM may reason differently each time |
| Error handling | Try/catch, pre-defined error codes | Self-correction, retry with different strategy |
| Testing | Unit tests with exact assertions | Evaluation suites with fuzzy matching |
| Debugging | Step through code line by line | Inspect conversation logs and reasoning traces |
| Scaling | More servers, load balancing | More agents, orchestration, cost management |
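To make the "fuzzy matching" row concrete, here is a minimal sketch of how an agent evaluation might assert on behaviour rather than on exact strings. The helper names and the keyword threshold are illustrative, not from any particular testing framework:

```python
import difflib

def fuzzy_match(response: str, expected_keywords: list[str],
                threshold: float = 0.5) -> bool:
    """Pass if the response mentions enough of the expected keywords.

    Unlike an exact assertion, this tolerates different phrasings of a
    correct answer, which matters when the system under test is stochastic.
    """
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) >= threshold

def similarity(a: str, b: str) -> float:
    """Rough string similarity in [0, 1] via difflib's ratio."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two differently worded but equally acceptable agent answers
answer_1 = "The current temperature in London is 14C with light rain."
answer_2 = "It's raining lightly in London right now, around 14 degrees."

assert fuzzy_match(answer_1, ["london", "14", "rain"])
assert fuzzy_match(answer_2, ["london", "14", "rain"])
```

An exact-string assertion would reject one of these two answers; a keyword or similarity check accepts both while still failing answers that miss the substance.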
Agent vs Chatbot
The simplest way to understand AI agents is to contrast them with chatbots, which most people are already familiar with. A chatbot responds to a single prompt with a single response. An agent reasons across multiple steps, decides which tools to use, maintains context, and takes real-world actions.
When you ask ChatGPT "What's the weather in London?", it either knows the answer from training data (possibly outdated) or tells you it cannot access real-time data. When you ask an agent the same question, it recognises it needs current data, calls a weather API, gets the result, formats it, and returns a live answer. The difference is not intelligence — it is architecture.
Here is a concrete example. Suppose a user says: "Find the three cheapest flights from London to Tokyo next Tuesday, then email me a summary." A chatbot would generate a plausible-sounding but invented list. An agent would: (1) call a flight search API, (2) filter and sort results, (3) compose an email, and (4) send it through an email API — four distinct actions across multiple turns of reasoning.
| Aspect | Chatbot | Agent |
|---|---|---|
| Reasoning | Single turn — one prompt, one response | Multi-step — plans, executes, and iterates |
| Tools | None (text generation only) | Search, APIs, code execution, databases |
| Memory | Session only (lost on reload) | Short-term (conversation) + long-term (persistent) |
| Actions | Text output only | Real-world side effects (send email, write file, etc.) |
| Autonomy | Reactive — waits for each prompt | Goal-directed — pursues objectives independently |
| Error handling | None — user must retry | Self-correcting — retries with different strategy |
Not every application needs an agent. If your use case is single-turn Q&A (e.g., "summarise this document"), a simple LLM call is cheaper, faster, and more reliable. Agents shine when tasks require multiple steps, external data, or real-world actions.
The Spectrum of Autonomy
In practice, there is a spectrum between a pure chatbot and a fully autonomous agent. Most real-world systems fall somewhere in the middle:
| Level | Description | Example |
|---|---|---|
| Level 0: Chatbot | Single-turn text generation | Basic ChatGPT conversation |
| Level 1: Router | LLM classifies intent, routes to the right handler | Customer support triage |
| Level 2: Tool-user | LLM decides which tool to call in a single step | Weather lookup, calculator |
| Level 3: Agent | Multi-step reasoning with tool use and self-correction | Research assistant, coding agent |
| Level 4: Multi-agent | Multiple agents collaborating on a shared goal | Software development team simulation |
As you progress through this tutorial, you will build systems at each level — starting from simple API calls (Level 0) and working up to multi-agent orchestration (Level 4) by Module 11.
A Practical Example: Chatbot vs Agent
To make the distinction concrete, here is what happens when you ask both a chatbot and an agent the same question: "What are the top 3 trending Python repositories on GitHub right now?"
```python
# CHATBOT APPROACH: Single LLM call, no tools
# The model can only use training data (potentially months old)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "What are the top 3 trending Python repos on GitHub?"}]
)
# Result: May list repos that WERE popular during training,
# but cannot know what's trending RIGHT NOW.

# AGENT APPROACH: LLM + tool use
# The agent recognises it needs live data and calls a tool
# Step 1: LLM decides to call search_github_trending(language="python")
# Step 2: Tool returns real-time data from GitHub's API
# Step 3: LLM formats the result into a helpful response
# Result: Accurate, real-time information
```
Core Components
Every AI agent, regardless of framework or implementation, is built from the same five core components. Understanding these is essential before writing any code.
LLM Core
The reasoning engine — GPT-4o, Claude Sonnet 4, Gemini, etc. This is the "brain" that understands instructions, generates plans, and produces responses.
Tools
Functions the agent can invoke: web search, calculators, API calls, code execution, database queries, file I/O. Tools are what give an agent its "hands."
Memory
Short-term (the current conversation context) and long-term (persistent vector stores, databases). Memory gives the agent continuity across interactions.
Planning
Decomposing complex goals into executable sub-tasks and determining the order of operations. This is the "strategic thinking" capability.
Orchestration
The control loop (observe-think-act) that ties everything together and manages the flow between components.
The LLM Core in Depth
The LLM is the most critical component. It acts as the decision-making centre that interprets user intent, generates plans, selects tools, and synthesises results. Modern agents typically use the most capable models available — GPT-4o, Claude Sonnet 4, or Gemini — because reasoning quality directly determines agent reliability.
The LLM receives the full conversation history (including tool results) as context and must decide at each step: should I call a tool, ask the user for clarification, or deliver a final answer? This decision-making is guided by the system prompt (covered in Module 3) and the model's inherent capabilities.
Tools: Extending the Agent's Reach
Without tools, an agent is just a chatbot with extra steps. Tools are what allow an agent to interact with the real world. Each tool is defined by a name, a description (so the LLM knows when to use it), and an input schema (so the LLM knows what parameters to provide).
```python
# Example: defining a tool for the agent to use
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city. "
                   "Use this when the user asks about weather conditions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string",
                     "description": "The city name, e.g. 'London'"},
            "units": {"type": "string",
                      "enum": ["celsius", "fahrenheit"],
                      "description": "Temperature unit"}
        },
        "required": ["city"]
    }
}
```
Tool descriptions are part of your prompt engineering. A vague description like "gets weather" will lead to the LLM using the tool incorrectly. A clear description like "Get the current weather for a given city. Use this when the user asks about current weather conditions, temperature, or forecasts" dramatically improves tool selection accuracy.
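The schema tells the LLM how to request a tool; your orchestration code still has to execute that request. A common pattern is a registry mapping tool names to Python functions. This is a minimal sketch with an illustrative stubbed tool, not a specific framework's API:

```python
# A minimal tool registry: the LLM returns a tool name plus arguments,
# and the orchestration code dispatches to the matching Python function.

def get_weather(city: str, units: str = "celsius") -> str:
    # A real agent would call a weather API here; stubbed for illustration.
    return f"Weather in {city}: 14 degrees {units}, light rain"

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_tool(name: str, arguments: dict) -> str:
    """Look up the requested tool and call it with the LLM-provided arguments."""
    if name not in TOOL_REGISTRY:
        # Returned to the LLM as a tool result so it can self-correct
        return f"Error: unknown tool '{name}'"
    try:
        return TOOL_REGISTRY[name](**arguments)
    except TypeError as exc:
        return f"Error: bad arguments for '{name}': {exc}"

result = execute_tool("get_weather", {"city": "London"})
```

Note that errors are returned as strings rather than raised: feeding the error message back into the conversation gives the LLM a chance to retry with a valid tool name or corrected arguments.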
Memory: Short-Term vs Long-Term
Short-term memory is the conversation history — the messages array you pass to the API on each call. It is limited by the model's context window (e.g., 128K tokens for GPT-4o, 200K for Claude). As the conversation grows, older messages may need to be summarised or dropped to stay within limits.
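The "summarise or drop" step can be sketched as a simple trimming function. Production systems count tokens with the provider's tokenizer; this sketch uses a character budget only to stay dependency-free, and the function name is illustrative:

```python
def trim_history(messages: list[dict], max_chars: int = 8000) -> list[dict]:
    """Keep the first message (the goal) plus the most recent messages
    that fit within a rough character budget, dropping the middle.
    """
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    budget = max_chars - len(head["content"])
    kept = []
    for msg in reversed(tail):  # walk newest-first
        if budget - len(msg["content"]) < 0:
            break               # everything older than this is dropped
        budget -= len(msg["content"])
        kept.append(msg)
    return [head] + list(reversed(kept))  # restore chronological order
```

A more sophisticated variant replaces the dropped middle with an LLM-generated summary message, so older context is compressed rather than lost entirely.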
Long-term memory is external storage — a vector database, a JSON file, or a traditional database — that persists across sessions and allows the agent to "remember" past interactions. For example, an agent might store user preferences ("prefers metric units") in long-term memory so it does not need to ask every time.
Short-term memory is like the papers currently on your desk — you can see them and reference them, but desk space is limited. Long-term memory is like a filing cabinet — you can store much more, but you need to know what to look for and retrieve it explicitly.
Planning: Decomposing Goals
Planning is the ability to take a high-level goal like "prepare a competitive analysis report" and break it into actionable steps: (1) identify competitors, (2) gather financial data, (3) analyse market positioning, (4) draft the report. Advanced agents use techniques like chain-of-thought reasoning and ReAct (Reasoning + Acting) to plan more effectively.
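One lightweight way to implement explicit planning is to ask the model for a numbered list of steps and parse it before execution. The prompt template and parser below are a sketch under that assumption; the model reply is hard-coded here, where a real agent would call the LLM:

```python
import re

PLAN_PROMPT = (
    "Break the following goal into a short numbered list of concrete steps.\n"
    "Goal: {goal}\n"
)

def parse_plan(llm_output: str) -> list[str]:
    """Extract '1. ...' / '2) ...' style steps from the model's reply."""
    steps = []
    for line in llm_output.splitlines():
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps

# A plausible model reply for the competitive-analysis example above
reply = """1. Identify the top competitors
2. Gather financial data for each
3. Analyse market positioning
4. Draft the report"""

steps = parse_plan(reply)
```

The agent can then execute steps one at a time, feeding each result back into the context, rather than asking the model to do everything in a single turn.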
Orchestration: The Control Loop
Orchestration is the "glue" that connects all the other components. It is the code that manages the agent loop — sending messages to the LLM, parsing responses, executing tool calls, handling errors, and deciding when the task is complete. You can think of it as the main event loop in a GUI application or the game loop in a video game.
In its simplest form, orchestration is a while loop with an if/else branch. In production systems, it may include retry logic, parallel tool execution, human approval gates, and cost tracking. We will build this step by step starting in Module 5.
```python
# The orchestration loop in its simplest form
def agent_loop(user_goal, llm, tools, max_steps=10):
    messages = [{"role": "user", "content": user_goal}]
    for step in range(max_steps):
        response = llm.generate(messages)          # Think
        if response.wants_tool:                    # Needs more info?
            result = tools.run(response.tool)      # Act
            messages.append(response.as_message())
            messages.append(result.as_message())   # Observe
        else:
            return response.text                   # Done!
    return "Reached max steps without completing."
```
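Of the production additions mentioned above, retry logic is the easiest to sketch. A generic wrapper like the one below (names illustrative) can guard both the LLM call and each tool call, so a transient network error does not kill the whole agent run:

```python
import time

def call_with_retry(fn, max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff.

    Waits base_delay, then 2x, then 4x, ... between attempts, and
    re-raises the last exception once attempts are exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Inside the loop you would then write, for example, `response = call_with_retry(lambda: llm.generate(messages))` instead of calling the LLM directly.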
How the Components Work Together
Here is a walkthrough of all five components interacting in a real scenario. The user asks: "What was Apple's revenue last quarter?"
- Orchestration receives the user message and starts the loop
- LLM Core (Claude Sonnet 4) reasons: "I need financial data. I should use the search_sec_filings tool."
- Planning (implicit in the LLM's reasoning): decides to search for Apple's latest 10-Q filing
- Tools: the search_sec_filings function calls the SEC EDGAR API and returns the filing data
- Memory: the tool result is added to the conversation history
- LLM Core reads the filing data, extracts the revenue figure, and formats a response
- Orchestration detects that the LLM is done (no more tool calls) and returns the final answer to the user
When building your first agent, start with just the LLM + one tool + the loop. Add memory and planning sophistication only when you need them. Premature complexity is the #1 reason agent projects fail.
The Agent Landscape (2025-2026)
The AI agent ecosystem has exploded since 2024. What was once a niche area of research has become the fastest-growing segment of the AI industry, with every major technology company investing heavily in agent infrastructure; some analyst projections put the market at roughly $50B by 2030.
Two key protocols have emerged as industry standards for agent communication. MCP (Model Context Protocol), introduced by Anthropic, standardises how agents connect to external tools and data sources — think of it as "USB-C for AI agents." A2A (Agent-to-Agent), introduced by Google, enables agents built by different vendors to communicate and collaborate on tasks. Together, these protocols solve the interoperability problem that previously made it impossible for agents from different vendors to work together.
MCP is like a universal adapter for tools. Before MCP, every agent framework had its own way of connecting to tools — like having different charging cables for every phone. MCP standardises the connection so any tool built to the MCP spec works with any agent that supports MCP. A2A does the same thing, but for agent-to-agent communication rather than agent-to-tool communication.
The landscape can be broadly divided into three tiers:
| Tier | What It Includes | Examples |
|---|---|---|
| Foundation Models | The LLMs that power agent reasoning | GPT-4o, Claude Sonnet 4, Gemini 2.5 Pro |
| Agent Frameworks | Libraries and SDKs for building agents | LangChain, CrewAI, OpenAI Agents SDK, Claude Agent SDK |
| Agent Platforms | End-to-end hosted solutions for deploying agents | OpenAI Assistants API, Amazon Bedrock Agents, Google Vertex AI Agent Builder |
The agent ecosystem is evolving extremely fast. Frameworks and best practices that are standard today may be superseded within months. Focus on understanding the core concepts (LLM reasoning, tool use, memory, planning) rather than memorising specific API signatures — the concepts transfer across all frameworks.
Where Agents Are Used Today
- Customer support — agents that resolve tickets by searching knowledge bases, looking up orders, and taking actions like issuing refunds
- Software engineering — coding agents (Copilot, Cursor, Claude Code) that write, test, and debug code autonomously
- Data analysis — agents that query databases, run statistical analyses, and generate visualisations from natural language requests
- Research — agents that search the web, read papers, synthesise findings, and produce structured reports
- Workflow automation — agents that handle multi-step business processes like onboarding, procurement, or compliance checking
Key Industry Trends
Several trends are shaping the agent landscape right now:
- Standardised protocols — MCP and A2A are becoming the "HTTP of agents," enabling interoperability between different vendors and frameworks
- Specialised vs general agents — the industry is moving from "do everything" agents toward specialised agents that excel at one domain (coding, data analysis, customer support)
- Human-in-the-loop — most production agents include approval steps where a human reviews high-impact actions before execution
- Cost optimisation — techniques like model routing (use a cheap model for simple tasks, expensive model for complex ones) are becoming standard
- Evaluation frameworks — the industry is developing better ways to measure agent reliability, accuracy, and safety (covered in Module 14)
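The model-routing trend above can be sketched in a few lines. The heuristic, model names, and keyword list here are all illustrative; real routers often use a small classifier model or the provider's own routing features instead:

```python
# Model routing sketch: cheap model for simple requests, expensive
# model for complex ones. Thresholds and names are hypothetical.

CHEAP_MODEL = "small-fast-model"
EXPENSIVE_MODEL = "large-capable-model"

COMPLEX_HINTS = ("analyse", "compare", "plan", "multi-step", "write code")

def route_model(user_message: str) -> str:
    """Pick a model tier based on a crude complexity heuristic."""
    text = user_message.lower()
    if len(text) > 500 or any(hint in text for hint in COMPLEX_HINTS):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL
```

Even a crude router like this can cut costs substantially when most traffic is simple, because only the minority of complex requests pay for the expensive model.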
What This Tutorial Series Covers
This 15-module series takes you from zero to production-ready AI agents. Here is a roadmap of what you will learn:
Modules 1–5: Foundations
Core concepts, environment setup, prompt engineering, API calls, and the agent loop. By Module 5, you will have a working agent.
Modules 6–9: Capabilities
Tool use, memory management, RAG (retrieval-augmented generation), and MCP (Model Context Protocol) for connecting to external services.
Modules 10–12: Architecture
Agent frameworks (LangChain, CrewAI), multi-agent orchestration, and the A2A (Agent-to-Agent) communication protocol.
Modules 13–15: Production
Safety guardrails, testing and evaluation, deployment best practices, and cost optimisation strategies.
The best way to learn about agents is to build one. Starting in Module 2, we will set up your development environment and begin writing code. By Module 5, you will have a working agent that can reason, use tools, and maintain conversation context.
Prerequisites for This Series
To get the most from this tutorial, you should have:
- Basic Python knowledge — functions, classes, dictionaries, list comprehensions, and working with packages
- Comfort with the command line — navigating directories, running scripts, and installing packages with pip
- A willingness to experiment — agent development is iterative; expect to test, fail, and refine your approach
No prior experience with LLMs, APIs, or AI is required. We will cover everything from scratch.