Volume 3 — AI Agents & MCP

A working reference for the transition from Senior iOS Developer to AI Engineer


How to use this volume

Same format as Volumes 1 and 2: real explanations, diagrams, working code, two interview Q&As, and an exercise per chapter. This volume covers what happens once an LLM stops just answering and starts acting — and the protocol (MCP) that's become the standard way to wire those actions up to real systems.

Contents

1. What Is an Agent? Chatbot vs. Agent
2. Tool Calling Mechanics
3. The ReAct Pattern: Reason, Act, Observe
4. Agent Memory and State
5. Planning and Task Decomposition
6. Orchestration: State Machines and Graph-Based Agents
7. Multi-Agent Systems
8. What Is MCP and Why It Matters
9. MCP Architecture: Hosts, Clients, Servers
10. Building a Minimal MCP Server
11. Agent Safety: Guardrails, Confirmation, and Idempotency
12. Enterprise Agent Integrations
13. Hands-On: A Minimal Tool-Calling Agent for SmartStore AI

Appendix A — Glossary
Appendix B — Chapter Summary Table


Chapter 1 — What Is an Agent? Chatbot vs. Agent

A chatbot, even a RAG-powered one (Volume 2), follows a fixed shape: input comes in, the model generates a text response, done. It never decides to do anything other than respond — it can't check inventory, send an email, or look something up that wasn't already retrieved for it ahead of time.

An agent breaks that fixed shape open. Instead of "always respond with text," the model is given a set of tools it can choose to invoke, and it decides — based on the actual request — whether to respond directly, call a tool and use the result, or call several tools in sequence before finally responding.

Chatbot:                              Agent:
User message                          User message
     │                                     │
     ▼                                     ▼
   LLM ──→ Text response               LLM decides: respond directly,
   (always exactly one shape)               OR call a tool
                                             │
                                             ▼
                                       Tool executes (real code,
                                       real API call, real side effect)
                                             │
                                             ▼
                                       Result fed back to LLM
                                             │
                                             ▼
                                       LLM decides again: respond,
                                       or call another tool
                                       (loop continues until done)

The defining trait of an agent isn't "it's smarter" — it's that the control flow is no longer fixed by your code; the model itself decides which step happens next, within boundaries you define (which tools exist, what they're allowed to do). That shift from "you wrote the logic" to "the model chooses among the logic you exposed" is what every other chapter in this volume builds on.
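The difference in control flow can be sketched without any real model. In the toy example below, `fake_model`, `lookup_aisle`, `chatbot`, and `agent` are all hypothetical stand-ins invented for illustration (no LLM is called): the chatbot always takes the same single path, while the agent lets the "model" choose the next step on every iteration.

```python
from typing import Callable, Optional

def fake_model(message: str, tool_result: Optional[str] = None) -> dict:
    """Hypothetical stand-in for an LLM call (not a real API)."""
    if tool_result is not None:
        return {"type": "text", "text": f"Here is what I found: {tool_result}"}
    if "aisle" in message.lower():
        return {"type": "tool_use", "name": "lookup_aisle", "input": {"product": "olive oil"}}
    return {"type": "text", "text": "Happy to help with that."}

def lookup_aisle(product: str) -> str:
    # Hypothetical tool; a real one would query a database or API
    return f"{product} is in aisle 7"

TOOLS: dict[str, Callable[..., str]] = {"lookup_aisle": lookup_aisle}

def chatbot(message: str) -> str:
    # Fixed shape: the code always takes the same single path to a text reply
    return fake_model(message, tool_result="(context retrieved ahead of time)")["text"]

def agent(message: str, max_steps: int = 3) -> str:
    tool_result = None
    for _ in range(max_steps):
        decision = fake_model(message, tool_result)
        if decision["type"] == "text":
            return decision["text"]  # the model chose to respond
        # The model chose to act: run the requested tool and loop again
        tool_result = TOOLS[decision["name"]](**decision["input"])
    return "Step limit reached."
```

Note that the only thing the application controls in `agent` is the set of entries in `TOOLS` and the step limit; which branch runs is decided by the model's output.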

Interview Q&A

Q: SmartStore AI's "where's the olive oil" feature, as built in Volume 2, retrieves data and answers in one fixed flow. Is that an agent? A: No — it's a RAG-powered chatbot. Retrieval always happens, in a fixed order, regardless of the question; there's no point where the model is choosing among multiple possible actions. It becomes agentic the moment the model itself starts deciding, per-question, whether to look up a product, check store hours, or do something else entirely — i.e., when the next step is the model's choice, not your code's fixed sequence.

Q: Why would you choose to build a chatbot instead of an agent for a given feature, even though agents are more "advanced"? A: Agents add real complexity — non-deterministic control flow, more failure modes, and (Chapter 11) higher-stakes mistakes when the model is the one deciding to act. A fixed pipeline is more predictable, easier to test, easier to debug, and is the right choice whenever the task genuinely doesn't need the model making a decision about what to do next, only about what to say.

Exercise: List two SmartStore AI features that are well-served as a plain chatbot/RAG flow, and two that would genuinely benefit from agentic decision-making (the model choosing among multiple possible actions per request).


Chapter 2 — Tool Calling Mechanics

"Tool calling" (also called function calling) is the actual mechanism that makes Chapter 1's loop possible. You describe each available tool to the model — its name, what it does, and what arguments it takes — and the model can respond by asking to invoke one, instead of (or before) responding in plain text.

Your code defines:                    Model's response can be:
{
  "name": "get_product_location",     { "type": "tool_use",
  "description": "Find which aisle      "name": "get_product_location",
   a product is in, for a given          "input": {
   store",                                 "product_name": "olive oil",
  "input_schema": {                        "store_id": "store_123"
    "type": "object",                    }
    "properties": {                    }
      "product_name": {"type": "string"},
      "store_id": {"type": "string"}
    },
    "required": ["product_name", "store_id"]
  }
}

The model never actually executes anything — it only ever requests a tool call with specific arguments. Your application code is responsible for actually running the function, then sending the result back to the model as a tool_result, so the model can incorporate it into its next response.

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_product_location",
        "description": "Find which aisle a product is in, for a given store.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string"},
                "store_id": {"type": "string"},
            },
            "required": ["product_name", "store_id"],
        },
    }
]

def get_product_location(product_name: str, store_id: str) -> str:
    # This calls the retrieve() function built in Volume 2, Chapter 13
    from query import retrieve
    results = retrieve(product_name, store_id)
    return results[0] if results else "Not found"

messages = [{"role": "user", "content": "Where's the olive oil at store_123?"}]

response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=500, tools=tools, messages=messages
)

# Collect every tool call the model requested in this turn
tool_calls = [block for block in response.content if block.type == "tool_use"]

if tool_calls:
    # Append the assistant turn once, then one tool_result per requested call
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": call.id,
             "content": get_product_location(**call.input)}
            for call in tool_calls
        ],
    })
    # Send the follow-up so the model can produce a final answer using the tool result
    final = client.messages.create(model="claude-sonnet-4-6", max_tokens=500, tools=tools, messages=messages)
else:
    # No tool was requested, so the first response is already the final answer
    final = response

The description field matters more than it looks — it's the only information the model has about when and how to use a given tool. A vague description leads to the model either never using a useful tool or misusing it; this is prompt engineering applied to tool definitions, not just to system prompts.
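As an illustration of that point, here are two definitions of the same tool that differ only in their description. The wording of both descriptions is an assumption made up for this sketch, not a measured best practice; the idea is simply that the second one tells the model when to use the tool, what it returns, and when not to use it.

```python
# Shared schema: identical for both variants
LOCATION_SCHEMA = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "store_id": {"type": "string"},
    },
    "required": ["product_name", "store_id"],
}

vague_tool = {
    "name": "get_product_location",
    "description": "Gets product info.",  # gives the model almost nothing to go on
    "input_schema": LOCATION_SCHEMA,
}

precise_tool = {
    "name": "get_product_location",
    "description": (
        "Find which aisle a product is in, for a given store. "
        "Use this whenever the user asks where to find an item. "
        "Returns the product name and aisle, or 'Not found'. "
        "Does not return price or stock levels."
    ),
    "input_schema": LOCATION_SCHEMA,
}
```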

Interview Q&A

Q: The model requests a tool call with product_name: "olve oil" (a typo) for a product that's actually stored as "olive oil." What's the right place to handle this — the model, or your code? A: Your retrieval code — this is exactly why Volume 2's retrieval uses semantic vector search rather than exact string matching; a typo or paraphrase shouldn't break the lookup. Tool calling doesn't replace the need for robust underlying logic; it just routes the model's intent to it.

Q: Why does the model only ever "request" a tool call rather than execute it directly? A: Security and control. If the model could directly execute arbitrary code or API calls, you'd have no opportunity to validate arguments, enforce permissions, log the action, or stop something destructive before it happens. Keeping execution in your application code is what makes Chapter 11's guardrails possible at all.
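Because execution stays in your code, you get a natural place to check arguments before anything runs. The sketch below is a deliberately small, hand-rolled check against the `input_schema` format used in this chapter; `validate_tool_input` is a hypothetical helper, it only understands `required` and string-typed properties, and a real project might prefer a full JSON Schema validator such as the `jsonschema` package.

```python
def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks acceptable."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in tool_input:
            errors.append(f"missing required argument: {name}")
    for name, value in tool_input.items():
        if name not in properties:
            errors.append(f"unexpected argument: {name}")
        elif properties[name].get("type") == "string" and not isinstance(value, str):
            errors.append(f"argument {name} must be a string")
    return errors

schema = {
    "type": "object",
    "properties": {"product_name": {"type": "string"}, "store_id": {"type": "string"}},
    "required": ["product_name", "store_id"],
}

ok = validate_tool_input(schema, {"product_name": "olive oil", "store_id": "store_123"})
missing = validate_tool_input(schema, {"product_name": "olive oil"})
wrong_type = validate_tool_input(schema, {"product_name": "olive oil", "store_id": 123})
```

If the list is non-empty, one reasonable design is to send the errors back to the model as the tool result rather than executing the function, so it can correct the call on its next turn.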

Exercise: Write the input_schema for a second tool, check_store_hours(store_id), that the model could call alongside get_product_location.


Chapter 3 — The ReAct Pattern: Reason, Act, Observe

ReAct (Reason + Act) is the conceptual loop underneath most tool-calling agents, even when the framework hides it from you. At each step, the model produces a thought (its reasoning about what to do next), takes an action (a tool call), receives an observation (the tool's result), and repeats — until it decides it has enough information to give a final answer.

Thought: I need to know which aisle has olive oil at store_123.
Action: get_product_location(product_name="olive oil", store_id="store_123")
Observation: "Product: Extra Virgin Olive Oil 500ml. Aisle: 7."
Thought: I have the answer now.
Final Answer: The olive oil is in aisle 7.

For a multi-step task, the loop runs more than once:

Thought: User wants to know if olive oil is in stock AND what aisle it's in.
Action: get_product_location(...)
Observation: "...Aisle: 7..."
Thought: I have the aisle. Now I need stock status — that's a different tool.
Action: check_stock(...)
Observation: "In stock: 12 units"
Thought: I now have both pieces of information.
Final Answer: Olive oil is in stock (12 units) in aisle 7.

This is the same sequence the code in Chapter 2 runs a single time: check the response for a tool-use request, execute it, feed the result back, and let the model respond. Supporting multi-step tasks means repeating that sequence until the model stops requesting tools and produces a final text answer. In code, this means wrapping the "call model → check for tool_use → execute → append result → call model again" sequence in an actual loop rather than handling it once.

def run_agent(user_message: str, tools: list, tool_functions: dict, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=500, tools=tools, messages=messages
        )
        tool_calls = [b for b in response.content if b.type == "tool_use"]

        if not tool_calls:
            return "".join(b.text for b in response.content if b.type == "text")

        messages.append({"role": "assistant", "content": response.content})
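        results = []
        for call in tool_calls:
            try:
                output = tool_functions[call.name](**call.input)
                result = {"type": "tool_result", "tool_use_id": call.id, "content": str(output)}
            except Exception as exc:
                # Unknown tool or bad arguments: report the failure to the model instead of crashing
                result = {"type": "tool_result", "tool_use_id": call.id,
                          "content": f"Tool error: {exc}", "is_error": True}
            results.append(result)
        messages.append({"role": "user", "content": results})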

    return "I wasn't able to complete this within the allowed steps."

The max_steps cap isn't a minor detail — without it, a model stuck in a reasoning loop (or facing a task it can't actually complete with the tools it has) will keep calling tools indefinitely, burning cost and latency with no end condition.
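The effect of the cap is easy to demonstrate with a scripted model that never stops asking for tools. Everything here is a hypothetical stand-in (`run_loop`, `stuck_model`, and `check_stock` are invented for the sketch, and the message format is simplified compared with the real API); the point is only that the loop's termination comes from `max_steps`, not from the model.

```python
from typing import Callable

def run_loop(call_model: Callable[[list], dict], tools: dict,
             user_message: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if reply["type"] == "text":
            return reply["text"]
        # Execute the requested tool and feed the observation back
        output = tools[reply["name"]](**reply["input"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": output})
    return "I wasn't able to complete this within the allowed steps."

def stuck_model(messages: list) -> dict:
    # Scripted worst case: always requests the same tool, never answers
    return {"type": "tool_use", "name": "check_stock", "input": {"product_name": "olive oil"}}

calls = []

def check_stock(product_name: str) -> str:
    calls.append(product_name)
    return "unknown"

answer = run_loop(stuck_model, {"check_stock": check_stock}, "Is olive oil in stock?", max_steps=5)
```

After this runs, `calls` holds exactly five entries: the tool was invoked once per allowed step and then the loop gave up with the fallback message.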

Interview Q&A

Q: Why is an explicit step limit (max_steps) a necessary part of any real ReAct loop implementation, not just a nice-to-have? A: Without it, there's no guaranteed termination condition besides the model voluntarily deciding to stop. If a tool returns an unexpected/ambiguous result, or the task genuinely can't be completed with available tools, the model can keep requesting more tool calls indefinitely — a runaway loop that costs money and time with no bound until something external stops it.

Q: How is ReAct different from simply giving the model several tools and one chance to call one of them? A: ReAct is specifically about iterating — the model observes the result of one action before deciding the next one, which allows genuinely multi-step tasks (gather A, then use A to decide whether B is needed) rather than a single decision made with incomplete information. A single tool-call-then-respond pattern can't adapt its later steps based on what an earlier tool actually returned.

Exercise: Trace through the ReAct loop, on paper, for the question "Is olive oil cheaper at store_123 or store_456?" — what thoughts, actions, and observations would a correctly-functioning agent need, assuming a get_price(product_name, store_id) tool exists?


Chapter 4 — Agent Memory and State

"Memory" for an agent splits into two genuinely different things, often conflated:

  • Short-term / working memory — the current conversation and the results of tool calls made during this session. This is just the messages list from Chapters 2-3, held in memory for the duration of one interaction.
  • Long-term memory — information that needs to persist across sessions: a user's past preferences, previously completed tasks, facts learned in an earlier conversation that should still apply today.

Short-term (per session, in-process):       Long-term (persisted, cross-session):
messages = [...]                            PostgreSQL / Redis
  - lives only as long as this                - "this user prefers store_123"
    request/conversation                      - "last completed task: ..."
  - lost when the process/session ends        - survives across sessions, deploys,
                                                 even server restarts

For SmartStore AI's existing stack, this maps directly onto components you already have. Short-term state is the message list for one chat session: it lives in process memory during a single request, and in Redis (fast key-value storage with expiry, good for session state and caching) when it has to survive between requests. Long-term memory belongs in PostgreSQL (durable and queryable, good for genuinely persistent user data).

import redis
import json

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

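def save_session_state(session_id: str, messages: list, ttl_seconds: int = 3600):
    # json.dumps needs plain dicts and strings: convert any SDK content-block
    # objects (e.g. response.content) to dicts before saving
    r.set(f"session:{session_id}", json.dumps(messages), ex=ttl_seconds)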

def load_session_state(session_id: str) -> list:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else []

# Long-term memory: a simple preference, persisted in Postgres (conceptual).
# `db` is assumed to be an open DB-API cursor (e.g. from psycopg); it is not defined here.
def save_user_preference(user_id: str, key: str, value: str):
    db.execute(
        "INSERT INTO user_preferences (user_id, key, value) VALUES (%s, %s, %s) "
        "ON CONFLICT (user_id, key) DO UPDATE SET value = EXCLUDED.value",
        (user_id, key, value),
    )

A subtlety worth internalizing: an agent's "memory" of tool results within a single ReAct loop (Chapter 3) is not the same as long-term memory — it's just the growing messages list, which gets discarded at the end of the session unless you explicitly persist something from it.
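A minimal sketch of that explicit persistence step, and of loading it back at the start of a later session, is shown below. To keep it runnable without a database server it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; the table layout, the `save_pref`/`load_prefs`/`build_system_prompt` helpers, and the prompt wording are all assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL in this sketch
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_preferences ("
    "user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
)

def save_pref(user_id: str, key: str, value: str) -> None:
    # Upsert: insert the preference, or overwrite it if it already exists
    conn.execute(
        "INSERT INTO user_preferences (user_id, key, value) VALUES (?, ?, ?) "
        "ON CONFLICT (user_id, key) DO UPDATE SET value = excluded.value",
        (user_id, key, value),
    )
    conn.commit()

def load_prefs(user_id: str) -> dict:
    rows = conn.execute(
        "SELECT key, value FROM user_preferences WHERE user_id = ? ORDER BY key",
        (user_id,),
    ).fetchall()
    return dict(rows)

def build_system_prompt(user_id: str) -> str:
    # Called at the start of a new session to bring long-term memory into context
    base = "You are SmartStore AI's shopping assistant."
    prefs = load_prefs(user_id)
    if not prefs:
        return base
    lines = "\n".join(f"- {key}: {value}" for key, value in prefs.items())
    return f"{base}\nKnown user preferences:\n{lines}"

save_pref("user_1", "preferred_store", "store_123")
save_pref("user_1", "preferred_store", "store_456")  # a later session updates it
prompt = build_system_prompt("user_1")
```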

Interview Q&A

Q: A user tells SmartStore AI's assistant "I always shop at store_123" in one conversation, and expects it to remember that next week. What needs to change in the architecture to support this? A: This requires long-term memory — explicitly extracting and persisting that preference (e.g., into PostgreSQL keyed by user ID) at the point it's stated, and explicitly loading it back into context (e.g., into the system prompt) at the start of future sessions. Without that explicit persistence step, it only exists in the short-term messages list and disappears once the session ends.

Q: Why use Redis for session-level state rather than just keeping it in a Python variable in your FastAPI process? A: A variable in process memory is lost on restart/deploy and doesn't work if you're running multiple backend instances behind a load balancer — a user's next request might hit a different instance with no idea about their session. Redis gives you a shared, fast store any instance can read/write, which is necessary the moment your backend isn't a single guaranteed process.

Exercise: Design (in plain text) what fields you'd store for a SmartStore AI long-term user memory: what's worth persisting (e.g., preferred store, dietary restrictions for product suggestions) and what's better left as short-term, per-session-only context.


Chapter 5 — Planning and Task Decomposition

Some requests are answerable in one tool call. Others genuinely require breaking a goal into ordered subtasks before any action happens — this is planning, and it's a distinct step from the moment-to-moment ReAct loop (Chapter 3).

Goal: "Tell me if I can get everything for a pasta dinner at store_123
       tonight, and what aisles they're all in."

Naive ReAct without planning:           With explicit planning first:
Model jumps straight into calling       Model first decomposes:
tools one at a time, possibly             1. Identify needed ingredients
missing ingredients or re-querying        2. Look up each ingredient's
inefficiently                                location + stock, one by one
                                           3. Combine into a single answer
                                         ...then executes that plan via
                                         the ReAct loop, one subtask at a time

The practical version of this in code is often nothing fancier than an explicit planning prompt before the action loop begins:

import json

def make_plan(goal: str) -> list[str]:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        system=(
            "Break the user's goal into an ordered list of concrete subtasks. "
            "Respond with ONLY a JSON array of strings, no other text."
        ),
        messages=[{"role": "user", "content": goal}],
    )
    text = "".join(b.text for b in response.content if b.type == "text")
    # json.loads raises ValueError if the model adds any text around the array
    return json.loads(text)

# Example output for the pasta-dinner goal:
# ["Identify ingredients for a basic pasta dinner", "Look up location and
#  stock for each ingredient at store_123", "Summarize findings into one answer"]

Each subtask can then be handed, one at a time, to the ReAct/tool-calling loop from Chapter 3. The key benefit isn't that the model becomes smarter — it's that decomposition makes the process inspectable and debuggable: if something goes wrong, you can see exactly which subtask failed, rather than untangling one long, opaque reasoning chain.
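One way to get that inspectability is to validate the plan before running anything and to record an outcome per subtask. The sketch below assumes the plan arrives as raw model text; `parse_plan` and `execute_plan` are hypothetical helpers, and `run_subtask` stands in for whatever runs one subtask (for example the Chapter 3 loop).

```python
import json
from typing import Callable

def parse_plan(raw: str) -> list[str]:
    """Extract and validate a JSON array of strings from model output."""
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    plan = json.loads(raw[start:end + 1])
    if not isinstance(plan, list) or not all(isinstance(s, str) and s.strip() for s in plan):
        raise ValueError("plan must be a JSON array of non-empty strings")
    return plan

def execute_plan(plan: list[str], run_subtask: Callable[[str], str]) -> list[dict]:
    """Run subtasks in order, logging each result and stopping at the first failure."""
    log = []
    for step, subtask in enumerate(plan, start=1):
        try:
            log.append({"step": step, "subtask": subtask,
                        "status": "ok", "output": run_subtask(subtask)})
        except Exception as exc:
            log.append({"step": step, "subtask": subtask,
                        "status": "failed", "output": str(exc)})
            break
    return log

def flaky_runner(subtask: str) -> str:
    # Scripted runner for the demo: the stock lookup fails
    if "stock" in subtask:
        raise RuntimeError("inventory service unavailable")
    return "done"

plan = parse_plan('Here is the plan: ["Identify ingredients", "Look up stock", "Summarize"]')
log = execute_plan(plan, flaky_runner)
```

Here `log` ends after the second entry, which is marked `failed` with the error text, so the failing subtask is identifiable without reading the whole trace.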

Interview Q&A

Q: Why not just let the model figure out the steps implicitly inside one long ReAct loop, instead of generating an explicit plan first? A: Implicit step-by-step reasoning inside a single loop works for simple cases, but for genuinely multi-part goals, an explicit plan gives you a checkpoint to validate, log, and debug against — you can see exactly what the model intends to do before any tool executes, which matters a lot once tools have real side effects (Chapter 11), and it makes failures attributable to a specific subtask rather than an opaque chain of tool calls.

Q: When would explicit planning be unnecessary overhead? A: For genuinely single-step requests (Chapter 1's "well-served as a plain chatbot/RAG flow" cases, or simple one-tool lookups) — adding a planning step before a single tool call just adds latency and cost for no benefit. Planning earns its place once a goal demonstrably requires multiple, ordered, interdependent actions.

Exercise: Write out, in plain English, the plan steps a SmartStore AI agent should generate for the goal "help me find a substitute for an ingredient that's out of stock."


Chapter 6 — Orchestration: State Machines and Graph-Based Agents

A single ReAct loop works well for "keep calling tools until done." It starts to strain once your workflow has genuine branching logic — different paths depending on intermediate results, steps that should only run under certain conditions, or stages that need to hand off to a different specialized prompt/model entirely.

Graph-based orchestration (popularized by frameworks like LangGraph) models the workflow explicitly as a graph: nodes are steps (each one a function, a tool call, or even its own LLM call with a specific role), edges define what happens next, and a shared state object flows through the whole graph, accumulating results as it goes.

        ┌─────────────┐
        │ Classify     │  (which kind of request is this?)
        │ intent       │
        └──────┬──────┘
               │
       ┌───────┴────────┐
       ▼                ▼
┌─────────────┐  ┌──────────────┐
│ Product       │  │ Store hours   │
│ lookup node   │  │ lookup node   │
└──────┬───────┘  └──────┬───────┘
       │                  │
       └────────┬─────────┘
                 ▼
          ┌─────────────┐
          │ Compose final│
          │ answer        │
          └─────────────┘

# Conceptual sketch — exact syntax varies by framework version, check current docs
# call_model_to_classify, call_model_to_compose, and check_store_hours are
# placeholders you would implement yourself.
from langgraph.graph import StateGraph, END

def classify_intent(state):
    state["intent"] = call_model_to_classify(state["question"])
    return state

def product_lookup(state):
    state["result"] = get_product_location(**state["question_args"])
    return state

def store_hours_lookup(state):
    state["result"] = check_store_hours(**state["question_args"])
    return state

def compose_answer(state):
    state["answer"] = call_model_to_compose(state["result"])
    return state

# A plain dict keeps the sketch short; real projects usually pass a TypedDict state schema
graph = StateGraph(dict)
graph.add_node("classify", classify_intent)
graph.add_node("product_lookup", product_lookup)
graph.add_node("store_hours", store_hours_lookup)
graph.add_node("compose", compose_answer)
graph.set_entry_point("classify")  # the graph needs a defined starting node

graph.add_conditional_edges(
    "classify",
    lambda state: state["intent"],
    {"product": "product_lookup", "hours": "store_hours"},
)
graph.add_edge("product_lookup", "compose")
graph.add_edge("store_hours", "compose")
graph.add_edge("compose", END)  # finish once the answer is composed

app = graph.compile()  # compile before invoking the graph

The shift from Chapter 3's loop to this: in a plain ReAct loop, the model implicitly decides the entire path through tool calls. In a graph, you define the possible paths explicitly, and the model's decisions are scoped to specific points (like the conditional edge after classify) rather than controlling the whole flow unconstrained. This trade-off — less flexibility, more predictability and debuggability — is exactly why production systems often move from "just let the model loop with tools" to explicit orchestration as workflows mature.
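The underlying idea does not depend on any framework. The following is a minimal, framework-free sketch of the same pattern: nodes are plain functions, edges are functions that pick the next node from the state, and a shared dict flows through. The keyword-based `classify` and the hard-coded lookup results are stand-ins for model and tool calls, and none of this reflects how LangGraph is implemented internally.

```python
def classify(state: dict) -> dict:
    # Stand-in for an LLM classification call
    question = state["question"].lower()
    state["intent"] = "hours" if ("open" in question or "hours" in question) else "product"
    return state

def product_lookup(state: dict) -> dict:
    state["result"] = "Olive oil: aisle 7"  # stand-in for a real tool call
    return state

def store_hours(state: dict) -> dict:
    state["result"] = "Open 8am to 10pm"  # stand-in for a real tool call
    return state

def compose(state: dict) -> dict:
    state["answer"] = f"Answer: {state['result']}"
    return state

NODES = {"classify": classify, "product_lookup": product_lookup,
         "store_hours": store_hours, "compose": compose}

# Each edge maps the current state to the next node name, or None to stop
EDGES = {
    "classify": lambda s: {"product": "product_lookup", "hours": "store_hours"}[s["intent"]],
    "product_lookup": lambda s: "compose",
    "store_hours": lambda s: "compose",
    "compose": lambda s: None,
}

def run_graph(state: dict, start: str = "classify") -> dict:
    node, path = start, []
    while node is not None:
        path.append(node)          # record the path for debugging
        state = NODES[node](state)
        node = EDGES[node](state)
    state["path"] = path
    return state

result = run_graph({"question": "When do you open on Sunday?"})
```

Because the path is recorded in the state, a failing request can be traced to a specific node, and each node function can be unit-tested on its own.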

Interview Q&A


Q: Why would a team migrate from a simple ReAct tool-calling loop to an explicit graph-based orchestration framework as their agent matures? A: As workflows grow more complex (conditional branches, multiple specialized steps, error-handling paths), an unconstrained loop becomes harder to debug, test, and reason about — failures are buried inside the model's implicit decision-making. An explicit graph makes the possible paths visible in code, testable per-node, and easier to add monitoring/logging to at each defined step, at the cost of needing to anticipate those paths upfront rather than letting the model improvise entirely.

Q: Is a graph-based orchestrator still "agentic," given that you're defining the structure explicitly? A: Yes, as long as the model still makes meaningful decisions within that structure — e.g., which branch to take, what arguments to pass to a node, whether to retry. The model's autonomy is scoped rather than total, but scoped autonomy is exactly the pattern most production agentic systems converge on, since fully unconstrained autonomy is harder to make reliable.

Exercise: Sketch a graph (using the ASCII style above) for a SmartStore AI workflow that handles three intents: product lookup, store hours, and "neither — fall back to a general help message."


Chapter 7 — Multi-Agent Systems

Sometimes a single agent juggling many tools and a long system prompt becomes unwieldy — its instructions get long and self-contradictory, and the tasks involved are different enough in nature (researching vs. writing vs. reviewing) that one model "persona" handling all of them performs worse than several focused ones would.

A multi-agent system splits the work across multiple agents, each with a narrower role, description, and tool set, coordinated by either a central supervisor or a peer-to-peer handoff pattern.

Supervisor pattern:                    Peer handoff pattern:
        ┌────────────┐                 Agent A ──hands off──▶ Agent B
        │ Supervisor  │                   (when A determines B's
        │ Agent       │                    specialty is needed)
        └──────┬──────┘
       ┌───────┼────────┐
       ▼        ▼        ▼
   Research   Coding   Review
    Agent      Agent    Agent

Each sub-agent typically only sees the part of the conversation/context relevant to its job — the Research Agent doesn't need the Coding Agent's tool definitions cluttering its context, and vice versa. The supervisor's job is purely routing and combining results, not doing the underlying work itself.
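A toy version of the supervisor pattern makes the division of labor concrete. In this sketch `SubAgent`, `route`, and `supervisor` are hypothetical names, the keyword router stands in for what would normally be a model call, and `handle` stands in for a sub-agent's own tool-calling loop. Each sub-agent carries only its own prompt and tool names.

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    name: str
    system_prompt: str
    tool_names: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        # Stand-in for a model call scoped to this agent's prompt and tools
        return f"[{self.name}] {task}"

AGENTS = {
    "research": SubAgent("research", "Find and summarize sources.", ["web_search"]),
    "coding": SubAgent("coding", "Write and fix code.", ["run_tests"]),
    "review": SubAgent("review", "Critique the work of other agents."),
}

def route(task: str) -> str:
    # Hypothetical keyword router; a real supervisor would ask a model to choose
    lowered = task.lower()
    if "review" in lowered:
        return "review"
    if "implement" in lowered or "fix" in lowered:
        return "coding"
    return "research"

def supervisor(tasks: list) -> list:
    # The supervisor only routes and collects; the sub-agents do the work
    return [AGENTS[route(task)].handle(task) for task in tasks]

outputs = supervisor(["Find prior art on caching", "Implement the cache", "Review the cache PR"])
```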

This is also where two complementary protocols show up in current practice: MCP (this volume's main subject) standardizes how a single agent connects to tools and data; a separate, newer protocol called A2A (Agent-to-Agent), developed by Google, standardizes how multiple agents communicate and delegate tasks to each other. They solve different layers of the same broader problem — MCP for "agent to tool," A2A for "agent to agent" — and are often used together rather than as alternatives.

Interview Q&A

Q: When is a multi-agent design actually worth the added complexity, versus just giving one agent more tools? A: When the roles genuinely require different instructions, tone, or tool sets that would otherwise conflict or bloat a single system prompt — e.g., a "strict data-only lookup" persona and a "creative marketing copy" persona are awkward to combine reliably in one agent. If the tasks are closely related and share context naturally, a single agent with more tools is usually simpler and just as effective.

Q: What's the practical risk of a supervisor/sub-agent architecture that a single-agent design doesn't have? A: Coordination failures — the supervisor mis-routing a task to the wrong sub-agent, sub-agents producing inconsistent or contradictory results that need reconciling, and compounded latency/cost since a single user request may now trigger several separate model calls across multiple agents instead of one.

Exercise: If SmartStore AI later added a "meal planning" feature (suggest a recipe, then find all ingredients across the store), would you build that as one agent with more tools, or split it into a "recipe suggestion" agent and a "product lookup" agent? Justify your choice in 2-3 sentences.


Chapter 8 — What Is MCP and Why It Matters

Before MCP, connecting an AI application to N different tools meant writing custom integration code for every single app-to-tool pairing — an "M apps × N tools" problem where every new tool meant rewriting similar integration logic for every AI app that wanted to use it, with each one inventing its own auth and data-handling conventions.

The Model Context Protocol (MCP) is an open standard, originally introduced by Anthropic in November 2024, that replaces that fragmented mess with a single protocol: any MCP-compatible AI application (a "host") can connect to any MCP-compatible server and immediately discover what it can do, without bespoke integration code for that specific pairing. It's commonly described as "USB-C for AI" — one standard connector instead of a different cable for every device.

Before MCP:                              After MCP:
App A ──custom code──▶ Tool 1            App A ─┐
App A ──custom code──▶ Tool 2                    ├──▶ MCP (standard protocol) ──▶ Tool 1
App B ──custom code──▶ Tool 1            App B ─┤                              ──▶ Tool 2
App B ──custom code──▶ Tool 2            App C ─┘                              ──▶ Tool 3
App C ──custom code──▶ Tool 1
... (M × N custom integrations)          ... (build each side once, connect to any of the others)
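The arithmetic behind the diagram is simple enough to write down. Assuming every app needs every tool, bespoke integrations grow with the product of the two counts, while a shared protocol needs roughly one implementation per app and one per tool. The function names are made up for this sketch and the counts are illustrative, not measurements.

```python
def custom_integrations(apps: int, tools: int) -> int:
    # One bespoke integration per app-to-tool pairing
    return apps * tools

def protocol_implementations(apps: int, tools: int) -> int:
    # Each app implements the protocol once, and so does each tool
    return apps + tools

small = (custom_integrations(3, 3), protocol_implementations(3, 3))
large = (custom_integrations(10, 20), protocol_implementations(10, 20))
```

For the three apps and three tools in the diagram that is 9 integrations versus 6 implementations; at 10 apps and 20 tools the gap widens to 200 versus 30.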

As of 2026, MCP has moved well past "Anthropic's protocol" specifically — governance passed to the Linux Foundation in late 2025, and it's now supported by every major AI vendor (Anthropic, OpenAI, Google, Microsoft) as the de facto standard for connecting AI agents to tools and data. For your purposes, the important takeaway isn't the adoption statistics (those will be stale by the time you reread this) — it's the architectural shift: tool integration became a protocol-level concern, not an app-level one.

Interview Q&A

Q: How is MCP different from function/tool calling (Chapter 2) — doesn't MCP also let a model call tools? A: Function calling is the underlying capability — an LLM's ability to request invocation of a named function with arguments. MCP doesn't replace that; it standardizes the plumbing around it — how tools are discovered, described, authenticated against, and invoked consistently across different AI applications and tool providers, so the same server works with any compliant host instead of needing custom integration per host.

Q: A team says "we don't need MCP, we'll just write our own tool integrations directly." When is that actually a reasonable call? A: When there's only one AI application calling one stable, internal API, with no plan to reuse that integration elsewhere or expose it to other tools/hosts — in that narrow case, MCP is more machinery than the problem requires. MCP earns its value once multiple AI apps need the same capability, or you want a standard security/discovery model across many tools, which is exactly the enterprise scenario most production systems eventually grow into.

Exercise: Name one existing piece of SmartStore AI's stack (think: the product lookup function from Volume 2) that could be exposed as an MCP server, and one consumer (besides your own SwiftUI app) that could then reuse it without custom integration code.


Chapter 9 — MCP Architecture: Hosts, Clients, Servers

MCP defines three roles, and getting the terminology right matters because it comes up constantly once you're building or integrating with MCP servers.

┌───────────────────────────────────────────────────────┐
│  HOST  (the AI application the end user interacts with)│
│  e.g. Claude Desktop, Claude Code, a custom agent app   │
│                                                          │
│   ┌────────────┐      ┌────────────┐      ┌──────────┐ │
│   │ MCP Client  │      │ MCP Client  │      │ MCP      │ │
│   │ (for Server  │      │ (for Server  │      │ Client   │ │
│   │  A)          │      │  B)          │      │ (for C)  │ │
│   └─────┬───────┘      └─────┬───────┘      └────┬─────┘ │
└─────────┼────────────────────┼───────────────────┼───────┘
          │                    │                    │
          ▼                    ▼                    ▼
   ┌────────────┐       ┌────────────┐       ┌────────────┐
   │ MCP Server  │       │ MCP Server  │       │ MCP Server  │
   │ (e.g. your   │       │ (e.g. GitHub │       │ (e.g. your   │
   │  product DB) │       │  integration)│       │  calendar)   │
   └────────────┘       └────────────┘       └────────────┘

  • Host — the AI application the person actually uses. It manages the model's context, decides when to invoke tools, and runs one MCP client per connected server.
  • Client — a connection instance, one per server, instantiated by the host.
  • Server — exposes capabilities to any connected client, organized into three primitives:
      • Tools — callable functions the model can invoke (this is the same concept as Chapter 2's tool calling, now exposed through a standard protocol).
      • Resources — data the model can read (files, database records, documents) without it necessarily being a function call.
      • Prompts — reusable prompt templates for common workflows the server author wants to standardize.

Communication happens over JSON-RPC 2.0, using one of two transports: stdio (the host runs the server as a local subprocess; fast and simple, but local-only) or Streamable HTTP (the server runs as a remote, scalable web service that can optionally stream responses over Server-Sent Events, supporting multiple concurrent clients and enterprise authentication). For HTTP-based servers, the current spec bases authorization on OAuth 2.1.
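To make the wire format less abstract, the sketch below builds the kind of JSON-RPC 2.0 messages a client sends to list and call tools, plus an example of the response shape for a tool call. The `jsonrpc_request` helper and the concrete ids and text are illustrative; a real session also begins with an `initialize` handshake, and in practice the SDK produces these messages for you.

```python
import json
from typing import Optional

def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message

# Ask the server which tools it exposes
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one tool by name with arguments
call_tool = jsonrpc_request(2, "tools/call", {
    "name": "get_product_location",
    "arguments": {"product_name": "olive oil", "store_id": "store_123"},
})

# Illustrative shape of a successful response to the call above
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [{"type": "text", "text": "Product: Extra Virgin Olive Oil 500ml. Aisle: 7."}],
        "isError": False,
    },
}

def is_response_to(request: dict, response: dict) -> bool:
    # JSON-RPC pairs a response with its request through the shared id
    return response.get("id") == request["id"]

wire_text = json.dumps(call_tool)  # what actually travels over stdio or HTTP
```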

For SmartStore AI's actual architecture: if you exposed your product-lookup capability (Volume 2) as an MCP server, it would declare a get_product_location tool (same shape as Chapter 2's tool definition), and any MCP-compatible host — your own future internal tooling, a third-party assistant, a teammate's debugging client — could connect and use it without you writing custom integration code for each one.

Interview Q&A

Q: Why does a host run one separate client per server, rather than one client managing all server connections? A: Isolating each connection means a problem with one server (a crash, a slow response, a misbehaving tool) doesn't directly corrupt the state of another unrelated connection, and it keeps each server's capabilities (tools/resources/prompts) cleanly scoped per-connection rather than merged into one ambiguous pool the host has to disentangle.

Q: When would you choose the stdio transport over Streamable HTTP for an MCP server you're building? A: Stdio when the server only ever needs to run locally alongside the host on the same machine (e.g., a local development tool, a filesystem server) — it's lower latency and simpler to set up with no network/auth layer needed. Streamable HTTP when the server needs to be reachable remotely, support multiple concurrent clients/hosts, or integrate with enterprise authentication — i.e., any server meant to be shared infrastructure rather than a single local process.

Exercise: For a hypothetical SmartStore AI MCP server exposing product lookup, would you build it with stdio or Streamable HTTP transport, given that it needs to serve your SwiftUI app's backend in production? Justify it in one sentence.


Chapter 10 — Building a Minimal MCP Server

This wires Chapter 9's architecture into actual code, using the official Python MCP SDK's general shape (exact syntax evolves with SDK versions — check the current SDK docs/examples when you build this for real).

# ── mcp_server.py ─────────────────────────────────────────────────────
# Exposes SmartStore AI's product lookup (from Volume 2) as a standard
# MCP tool, usable by any MCP-compatible host, not just your own app.

from mcp.server.fastmcp import FastMCP
from query import retrieve  # the function built in Volume 2, Chapter 13

mcp = FastMCP("smartstore-product-lookup")

@mcp.tool()
def get_product_location(product_name: str, store_id: str) -> str:
    """Find which aisle a product is located in, for a given store."""
    results = retrieve(product_name, store_id)
    return results[0] if results else "Product not found in this store."

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport for local use

That's the entire server: one decorated function, with its docstring doing double duty as the tool's description (exactly the field from Chapter 2 that the model relies on to decide when to use it). Any MCP-compatible host can now connect to this process, automatically discover the get_product_location tool via the protocol's standard discovery mechanism, and call it — without you writing any host-specific integration code.
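
To see why a decorated function is enough, it can help to sketch how a tool definition might be derived from a function's signature and docstring. This is an illustration of the idea only, under the assumption of simple scalar parameters; describe_tool and JSON_TYPES are hypothetical names, and the SDK's real implementation is more thorough than this.

```python
import inspect
from typing import get_type_hints

# Illustrative mapping only; a real SDK handles far more types than this.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(fn) -> dict:
    """Derive a tool definition from a function's name, docstring, and type hints."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "inputSchema": {"type": "object", "properties": properties, "required": required},
    }


def get_product_location(product_name: str, store_id: str) -> str:
    """Find which aisle a product is located in, for a given store."""
    return "Aisle 7"  # stub body; the real version calls retrieve()


definition = describe_tool(get_product_location)
```

The practical consequence is that a vague docstring or a missing type hint produces a vague tool definition, so both deserve the same care as a hand-written schema.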

The connecting host (your own agent code, or a third-party MCP client) handles the other side — establishing the client connection, listing available tools, and routing model tool-use requests to this server, then feeding results back, exactly like Chapter 2's loop, just over the standardized protocol instead of an in-process function call.
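
One small piece of that host-side work can be sketched without any SDK: translating discovered tools into the Chapter 2 definition shape and remembering which server owns each one. The field name inputSchema is what MCP tool listings use and input_schema is what the Messages API expects; the functions mcp_tool_to_anthropic and build_routes and the sample listing are hypothetical, written for illustration.

```python
def mcp_tool_to_anthropic(tool: dict) -> dict:
    """Map one MCP tool listing entry to the Chapter 2 tool-definition shape."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        # MCP names this field inputSchema; the Messages API expects input_schema.
        "input_schema": tool.get("inputSchema", {"type": "object", "properties": {}}),
    }


def build_routes(tools_by_server: dict[str, list[dict]]) -> tuple[list[dict], dict[str, str]]:
    """Return (tool definitions for the model, tool name -> owning server name)."""
    definitions: list[dict] = []
    routes: dict[str, str] = {}
    for server_name, tools in tools_by_server.items():
        for tool in tools:
            if tool["name"] in routes:
                # Two servers exposing the same tool name would make routing ambiguous.
                raise ValueError(f"duplicate tool name: {tool['name']}")
            routes[tool["name"]] = server_name
            definitions.append(mcp_tool_to_anthropic(tool))
    return definitions, routes


# Hypothetical discovery result, shaped like an MCP tool listing.
listed = {
    "smartstore-product-lookup": [
        {
            "name": "get_product_location",
            "description": "Find which aisle a product is located in, for a given store.",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "product_name": {"type": "string"},
                    "store_id": {"type": "string"},
                },
                "required": ["product_name", "store_id"],
            },
        }
    ],
}
definitions, routes = build_routes(listed)
```

When the model later requests get_product_location, the host would look up routes["get_product_location"] to decide which client connection should carry the call.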

Interview Q&A

Q: What's actually different between this MCP server and just calling retrieve() directly inside your FastAPI backend, the way Volume 2 originally did it? A: Functionally, very little for your own app specifically — but exposing it as an MCP server makes the capability reusable by other hosts without custom integration (Chapter 8's core value proposition), and it forces a clean, explicit interface boundary (the tool's schema and description) rather than an arbitrary internal function signature that might change without notice.

Q: A tool's docstring/description comes from a server you didn't write yourself. Why does the MCP security guidance treat tool descriptions as potentially untrusted? A: Because a malicious or compromised server could write a tool description that looks innocuous but is actually crafted to manipulate the connecting model's behavior — this is the same prompt-injection family of risk from Volume 1, Chapter 13 and Volume 2, Chapter 10, just surfacing through a new channel (tool metadata) rather than retrieved document text. Treat third-party tool descriptions with the same "untrusted data, not instructions" caution as any other external content.

Exercise: Add a second tool to mcp_server.py, check_store_hours(store_id: str) -> str, following the same decorator pattern.


Chapter 11 — Agent Safety: Guardrails, Confirmation, and Idempotency

A chatbot's worst mistake is saying something wrong. An agent's worst mistake is doing something wrong — sending an email twice, double-charging a customer, deleting the wrong record. Every chapter in this volume up to now has been about making agents more capable; this one is about the constraints that keep that capability from causing real damage.

Risk increases with:                  Mitigated by:
- Irreversible actions                - Human-in-the-loop confirmation
  (delete, send, charge)                before irreversible/high-stakes
- Ambiguous instructions                actions execute
- Retries/duplicate calls             - Idempotency keys, so retrying a
- Untrusted tool descriptions           call doesn't duplicate its effect
  (Chapter 10)                        - Explicit allow-lists of permitted
                                        actions per agent/context
                                      - Logging every tool call for audit

Human-in-the-loop confirmation: for any action with real-world consequences (sending a notification, modifying inventory records, anything resembling a financial transaction), insert an explicit confirmation step rather than letting the agent execute autonomously the moment it decides to.

HIGH_RISK_TOOLS = {"send_notification", "update_inventory", "process_refund"}

def execute_tool_call(call, require_confirmation_fn):
    if call.name in HIGH_RISK_TOOLS:
        if not require_confirmation_fn(call.name, call.input):
            return "Action cancelled — not confirmed."
    return tool_functions[call.name](**call.input)
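
The same gate can be extended with the per-context allow-list from the table above. The following is a self-contained sketch under stated assumptions: ALLOWED_TOOLS, the agent context names, the ToolCall stand-in, and the stub tools are all hypothetical, and in a real app the confirmation callback would prompt a person rather than return a fixed answer.

```python
from dataclasses import dataclass

HIGH_RISK_TOOLS = {"send_notification", "update_inventory", "process_refund"}

# Hypothetical per-context allow-lists: which tools each agent may call at all.
ALLOWED_TOOLS = {
    "customer_assistant": {"get_product_location", "check_store_hours", "send_notification"},
    "analytics_agent": {"get_product_location"},
}


@dataclass
class ToolCall:
    """Minimal stand-in for the tool-use block the model returns."""
    name: str
    input: dict


def gate_tool_call(call, agent_context, tool_functions, confirm_fn):
    # 1. Allow-list: refuse anything this agent context is not permitted to use.
    if call.name not in ALLOWED_TOOLS.get(agent_context, set()):
        return f"Blocked: {call.name} is not permitted for {agent_context}."
    # 2. Confirmation: high-risk tools run only after an explicit yes.
    if call.name in HIGH_RISK_TOOLS and not confirm_fn(call.name, call.input):
        return "Action cancelled: not confirmed."
    return tool_functions[call.name](**call.input)


# Stub tools so the sketch runs on its own.
stub_tools = {
    "check_store_hours": lambda store_id: "8:00 AM - 10:00 PM",
    "send_notification": lambda message: f"sent: {message}",
}

hours = gate_tool_call(
    ToolCall("check_store_hours", {"store_id": "store_123"}),
    "customer_assistant", stub_tools, confirm_fn=lambda name, args: False,
)
declined = gate_tool_call(
    ToolCall("send_notification", {"message": "Back in stock"}),
    "customer_assistant", stub_tools, confirm_fn=lambda name, args: False,
)
blocked = gate_tool_call(
    ToolCall("send_notification", {"message": "Back in stock"}),
    "analytics_agent", stub_tools, confirm_fn=lambda name, args: True,
)
```

Checking the allow-list before asking for confirmation means a person is never prompted to approve an action the agent was not entitled to request in the first place.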

Idempotency: tool calls can get retried — by your own error-handling code, by a flaky network, by the model itself re-requesting something it's unsure completed. An idempotent action produces the same end result whether it runs once or five times; a non-idempotent one (like "send an email" with no deduplication check) causes real duplicated side effects on retry.

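import redis  # assumes the redis-py client and a reachable Redis instance

r = redis.Redis()

def send_notification_once(notification_id: str, message: str) -> str:
    # Skip the send if this notification ID was already recorded.
    if r.exists(f"sent:{notification_id}"):
        return "Already sent — skipped duplicate."
    actually_send(message)  # placeholder for your real delivery call
    # Remember the ID for 24 hours so retries in that window are no-ops.
    r.set(f"sent:{notification_id}", "1", ex=86400)
    return "Sent."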
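
The function above takes a notification ID as given; one plausible way to obtain such a key is to derive it from the call itself. The sketch below assumes that hashing the tool name, arguments, and a caller-chosen scope is acceptable for your use case. The names idempotency_key and InMemoryOnce are hypothetical, and the dict-backed store exists only so the example runs without Redis.

```python
import hashlib
import json


def idempotency_key(tool_name: str, arguments: dict, scope: str) -> str:
    """Derive a stable key: same tool + same arguments + same scope -> same key."""
    canonical = json.dumps(
        {"tool": tool_name, "args": arguments, "scope": scope},
        sort_keys=True,            # argument order must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryOnce:
    """Dict-backed stand-in for the Redis check, for local experiments only."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def run_once(self, key: str, action) -> str:
        if key in self._results:
            return self._results[key]  # replay the earlier result, no new side effect
        result = action()
        self._results[key] = result
        return result


sent: list[str] = []


def deliver() -> str:
    sent.append("Your item is back in stock")
    return "Sent."


store = InMemoryOnce()
key_a = idempotency_key("send_notification", {"user": "u1", "sku": "oil-1"}, scope="order-42")
key_b = idempotency_key("send_notification", {"sku": "oil-1", "user": "u1"}, scope="order-42")
first = store.run_once(key_a, deliver)
retry = store.run_once(key_b, deliver)  # same logical call, arguments reordered
```

The scope argument is the design choice worth thinking about: it should be narrow enough that a genuine repeat (a second order, a later restock) gets a new key, but stable across retries of the same attempt.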

This connects directly back to Volume 1's safety principle (treat untrusted input as data, never instructions) and Volume 2's permission scoping (Chapter 12) — agent safety is the same family of concern, now extended to actions instead of just answers.

Interview Q&A

Q: Why is idempotency specifically an agent concern, more so than in typical chatbot/RAG systems? A: A RAG chatbot's "side effect" is just generating text — retrying a failed request and getting the same answer twice is harmless. An agent's tool calls can have real side effects (sending, charging, modifying records); retrying a failed or uncertain tool call without idempotency protection can directly duplicate those real-world effects, which is a fundamentally higher-stakes failure mode than a duplicated text response.

Q: A SmartStore AI agent is given a tool to "flag a product as out of stock for restocking." Should this require human confirmation, and why? A: It depends on reversibility and blast radius — flagging for restocking is likely low-risk and reversible (easy to un-flag), so autonomous execution is probably fine. Compare this to a hypothetical "automatically reorder $5,000 of inventory" tool, which is financially consequential and harder to reverse — that one clearly warrants human confirmation. The general principle: gate confirmation requirements on consequence and reversibility, not on "is this an agent action" as a blanket rule.

Exercise: Classify these three hypothetical SmartStore AI agent actions as low-risk/autonomous-OK, or high-risk/needs-confirmation, and justify each: (a) logging a search query for analytics; (b) sending a "your item is back in stock" push notification to a customer; (c) updating a product's displayed price.


Chapter 12 — Enterprise Agent Integrations

Once an agent moves beyond your own internal tools into connecting with real third-party systems — email, calendars, ticketing systems, CRMs — a few enterprise-specific concerns become unavoidable, on top of everything covered so far.

  • Authentication and scoped tokens — an agent acting on a user's behalf (reading their calendar, sending email as them) needs credentials scoped to exactly what it's permitted to do, ideally per-user rather than one shared service credential for every user the agent serves. This is the same principle as Volume 2, Chapter 12's access-control filtering, now applied to external API credentials instead of vector search filters.
  • Audit logging — every tool call an agent makes, especially against external systems, should be logged with who triggered it, what arguments were used, and what the result was. This is what makes "why did the agent do that" answerable after the fact, which matters enormously the first time something goes wrong in production.
  • Rate limiting and cost controls — an agent looping through tool calls (Chapter 3) against a paid third-party API can rack up cost or hit provider rate limits fast if something goes wrong; enterprise deployments need explicit caps, not just trust that the agent will behave.
  • MCP as the integration layer: in current practice (Chapters 8-10), enterprise tool integrations such as Jira, Slack, GitHub, and internal databases are increasingly exposed as MCP servers, specifically because this gives a consistent place to enforce auth, logging, and access control once, rather than re-implementing it inside every custom integration.

Agent request: "Email the customer about their order"
        │
        ▼
  Scoped token check: does this agent/user have permission
  to send email on this customer's behalf?
        │
        ▼
  Audit log entry written: who/what/when/arguments
        │
        ▼
  Rate limit check: has this agent sent too many emails
  in the last hour?
        │
        ▼
  Action executes (or is blocked, with a logged reason)
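
The pipeline in the diagram can be sketched in a few dozen lines. This is a minimal in-memory version under stated assumptions: guarded_execute, within_rate_limit, the scope string "email:send", and the limits are all hypothetical, and a production version would persist the audit log and rate-limit counters somewhere durable. Unlike the diagram, the sketch writes one audit entry after the outcome is known so that blocked and failed attempts are recorded too; writing an intent entry first is an equally reasonable choice.

```python
import time
from collections import defaultdict, deque

audit_log: list[dict] = []
_recent_calls: dict[str, deque] = defaultdict(deque)


def within_rate_limit(agent_id: str, limit: int, window_s: float, now: float) -> bool:
    """Sliding-window check: allow at most `limit` calls per `window_s` seconds."""
    calls = _recent_calls[agent_id]
    while calls and now - calls[0] >= window_s:
        calls.popleft()  # forget calls that have aged out of the window
    if len(calls) >= limit:
        return False
    calls.append(now)
    return True


def guarded_execute(agent_id, scopes, required_scope, tool_name, arguments, action,
                    limit=3, window_s=3600.0, now=None):
    now = time.time() if now is None else now
    entry = {"who": agent_id, "tool": tool_name, "arguments": arguments, "when": now}
    if required_scope not in scopes:
        entry["outcome"] = "blocked: missing scope"
    elif not within_rate_limit(agent_id, limit, window_s, now):
        entry["outcome"] = "blocked: rate limit"
    else:
        try:
            entry["result"] = action(**arguments)
            entry["outcome"] = "executed"
        except Exception as exc:  # record failures instead of losing them
            entry["outcome"] = f"failed: {exc}"
    audit_log.append(entry)  # every attempt is logged, whatever happened
    return entry["outcome"]


def send_email(to: str, body: str) -> str:
    return "queued"  # stub; a real tool would call the mail provider


email_args = {"to": "customer@example.com", "body": "Your order has shipped."}
outcomes = [
    guarded_execute("agent-1", {"email:send"}, "email:send", "send_email",
                    email_args, send_email, limit=2, now=float(t))
    for t in (0, 1, 2)
]
denied = guarded_execute("agent-2", set(), "email:send", "send_email",
                         email_args, send_email, now=3.0)
```

Because every branch ends in the same audit_log.append call, the silent-failure scenario in the second interview question below would leave a "failed: ..." entry behind rather than nothing at all.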

Interview Q&A

Q: Why is centralizing auth/logging/rate-limiting at the integration (MCP server) layer better than handling it inside each agent's own code? A: Implementing these consistently inside every individual agent invites drift and gaps — one agent remembers to log, another doesn't; one enforces scoped tokens correctly, another uses an overly broad shared credential. Centralizing it at the server/integration layer means every consumer of that integration gets the same guarantees automatically, rather than relying on every agent author getting it right independently.

Q: An agent's email-sending tool call fails silently and the customer never gets notified. What enterprise practice from this chapter would have caught this faster? A: Audit logging — a logged entry for every tool call attempt (including failures and the reason) lets you trace exactly what happened and when, rather than only discovering the gap when a customer complains. Logging "intent and outcome" for every action is what turns an opaque agent failure into a debuggable one.

Exercise: For a future SmartStore AI feature where the agent sends a "your order is ready for pickup" notification, list the four enterprise safeguards from this chapter you'd want in place before letting that run autonomously in production.


Chapter 13 — Hands-On: A Minimal Tool-Calling Agent for SmartStore AI

This pulls together Chapters 1-3 (and Volume 2's pipeline) into one real, runnable agent — the actual shape your FastAPI backend's "ask the assistant" endpoint would take.

# ── agent.py ───────────────────────────────────────────────────────────
import anthropic
from query import retrieve

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_product_location",
        "description": "Find which aisle a product is in at a specific store.",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string"},
                "store_id": {"type": "string"},
            },
            "required": ["product_name", "store_id"],
        },
    },
    {
        "name": "check_store_hours",
        "description": "Check today's opening hours for a specific store.",
        "input_schema": {
            "type": "object",
            "properties": {"store_id": {"type": "string"}},
            "required": ["store_id"],
        },
    },
]

def get_product_location(product_name: str, store_id: str) -> str:
    results = retrieve(product_name, store_id)
    return results[0] if results else "Product not found in this store."

def check_store_hours(store_id: str) -> str:
    # Placeholder — would query your store-hours table in Postgres
    return "Store hours: 8:00 AM - 10:00 PM"

tool_functions = {
    "get_product_location": get_product_location,
    "check_store_hours": check_store_hours,
}

def run_agent(user_message: str, store_id: str, max_steps: int = 5) -> str:
    system = (
        "You are SmartStore AI's retail assistant. Use the available tools "
        "to answer questions about product locations and store hours. "
        f"The current store is {store_id}. Only answer using tool results "
        "— if you don't have the information, say so."
    )
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6", max_tokens=500,
            system=system, tools=tools, messages=messages,
        )
        tool_calls = [b for b in response.content if b.type == "tool_use"]

        if not tool_calls:
            return "".join(b.text for b in response.content if b.type == "text")

        messages.append({"role": "assistant", "content": response.content})
        results = []
        for call in tool_calls:
            output = tool_functions[call.name](**call.input)
            results.append({"type": "tool_result", "tool_use_id": call.id, "content": output})
        messages.append({"role": "user", "content": results})

    return "I wasn't able to complete this within the allowed steps."

if __name__ == "__main__":
    print(run_agent("Where's the olive oil, and is the store open right now?", store_id="store_123"))

This single request now requires the model to decide two tool calls are needed, execute both, and combine the results — the exact ReAct loop from Chapter 3, using the real RAG retrieval from Volume 2, with real tool definitions from Chapter 2. This is genuinely most of what SmartStore AI's core assistant needs architecturally; the remaining work is wrapping it in FastAPI, adding the memory layer from Chapter 4, and the safety guardrails from Chapter 11 as you add tools with real side effects.
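
One hardening step worth considering before adding more tools: agent.py looks up tool_functions[call.name] directly, so a tool name the model invents, or arguments that don't match the function, would raise and end the request. A possible alternative is to report the failure back to the model as a tool result. The helper below is a sketch, not part of the original listing; run_tool_safely is a hypothetical name, the "toolu_..." ids are placeholders, and the is_error field is the Messages API's flag for failed tool results as I understand it.

```python
def run_tool_safely(name: str, arguments: dict, tool_use_id: str, tool_functions: dict) -> dict:
    """Execute one requested tool and always return a tool_result block."""
    try:
        if name not in tool_functions:
            raise ValueError(f"unknown tool: {name}")
        output, is_error = str(tool_functions[name](**arguments)), False
    except Exception as exc:
        # Hand the failure back to the model so it can recover or apologize.
        output, is_error = f"Tool call failed: {exc}", True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": output,
        "is_error": is_error,
    }


# Stub registry so the sketch runs without the retrieval pipeline.
stub_functions = {"check_store_hours": lambda store_id: "Store hours: 8:00 AM - 10:00 PM"}

ok = run_tool_safely("check_store_hours", {"store_id": "store_123"}, "toolu_1", stub_functions)
unknown = run_tool_safely("get_weather", {"city": "Oslo"}, "toolu_2", stub_functions)
bad_args = run_tool_safely("check_store_hours", {"shop": "store_123"}, "toolu_3", stub_functions)
```

Inside run_agent, the inner loop would then append run_tool_safely(call.name, call.input, call.id, tool_functions) to results instead of building the dict inline.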

Exercise (the real one): Add a third tool, get_product_price(product_name, store_id), wire it into tool_functions, and test a question that requires all three tools in one request (location, hours, and price). Watch how many steps the loop actually takes — that's your first real look at how an agent's reasoning unfolds in practice, not just in a diagram.


Appendix A — Glossary

Term                    Meaning
Agent                   A system where the model decides which actions to take, not just what to say
Tool calling            The model requesting invocation of a named function with arguments; your code executes it
ReAct                   The Reason → Act → Observe loop underlying most tool-calling agents
Planning                Decomposing a complex goal into ordered subtasks before acting
Orchestration           Explicitly defining an agent workflow's control flow (often as a graph) rather than an unconstrained loop
Multi-agent system      Splitting work across several narrowly-scoped agents, coordinated by a supervisor or peer handoff
MCP                     Model Context Protocol — the open standard for connecting AI hosts to tool/data servers
Host / Client / Server  MCP's three roles: the AI app, its per-server connection, and the tool/data provider
Idempotency             A property where repeating an action produces the same end result as doing it once
A2A                     Agent-to-Agent protocol (Google) — standardizes communication between multiple agents, complementary to MCP

Appendix B — Chapter Summary Table

#   Chapter                  Core takeaway
1   Agent vs. chatbot        The model, not your code, decides the next step
2   Tool calling             Models request actions; your code executes and reports results back
3   ReAct                    Reason → Act → Observe, looped, with an explicit step limit
4   Memory                   Short-term = session messages; long-term = explicitly persisted (Redis/Postgres)
5   Planning                 Decompose multi-part goals before executing, for inspectability
6   Orchestration            Graphs trade some flexibility for predictability as workflows grow
7   Multi-agent              Split roles when one persona/tool-set becomes unwieldy; MCP + A2A are complementary layers
8   What MCP solves          Turns tool integration from an app-level into a protocol-level concern
9   MCP architecture         Host runs clients; servers expose tools/resources/prompts over JSON-RPC
10  Building an MCP server   A few lines exposing an existing function as a reusable, discoverable tool
11  Agent safety             Confirmation for irreversible actions; idempotency against retries
12  Enterprise integrations  Scoped auth, audit logging, rate limits — increasingly centralized via MCP
13  Hands-on agent           The real shape of SmartStore AI's tool-calling assistant, end to end

Next: Volume 4 — Production Enterprise AI (architecture, RBAC, deployment, monitoring, and governance — how everything in Volumes 1-3 gets hardened into something you'd actually run in production).