Overview

This phase assumes you are comfortable with basic LangChain agents and LangGraph workflows. You now focus on advanced agentic patterns: richer reasoning, hierarchical planning, multimodal and code-execution tools, long-term memory, safety & governance, and scalable multi-agent systems.

Module 10 – Advanced Agent Reasoning Patterns

This module moves beyond simple “ask a model, maybe call a tool” agents into richer reasoning patterns such as ReAct, reflexion, and multi-branch reasoning.

10.1 ReAct-style Agents (Reason + Act)

ReAct (Reason + Act) is a pattern where the model explicitly emits:

  • a Thought – natural language reasoning, and
  • an Action – which tool to call (with arguments).

A simple ReAct loop can be implemented in LangChain with a prompt that encourages this structure:

# src/phase4/react_agent.py
from typing import List, Tuple
import re
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool


@tool
def search_notes(query: str) -> str:
    """Search the course notes for information related to the query (placeholder)."""
    # In a real app, call your RAG pipeline instead.
    return f"[Search results for: {query}]"


TOOLS = {
    "search_notes": search_notes,
}


def parse_react_output(text: str) -> Tuple[str, str, str]:
    """
    Parse a simple ReAct style output of the form:
      Thought: ...
      Action: search_notes["query"]
    """
    # Non-greedy match so the Thought does not swallow the Action line.
    thought_match = re.search(r"Thought:(.*?)(?:\nAction:|\Z)", text, re.DOTALL)
    action_match = re.search(r"Action:\s*(\w+)\[(.*)\]", text)

    thought = thought_match.group(1).strip() if thought_match else ""
    if not action_match:
        return thought, "", ""
    tool_name = action_match.group(1).strip()
    arg = action_match.group(2).strip().strip('"').strip("'")
    return thought, tool_name, arg


def run_react_loop(question: str, max_steps: int = 3) -> str:
    load_dotenv()
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)

    history: List[str] = []
    observations: List[str] = []

    for step in range(max_steps):
        prompt = (
            "You are a ReAct agent. Use Thought/Action/Observation steps. "
            "You have a tool: search_notes[query].\n\n"
            f"Question: {question}\n\n"
        )
        if observations:
            prompt += "Previous observations:\n" + "\n".join(observations) + "\n\n"

        prompt += (
            "Respond strictly in this format:\n"
            "Thought: <your reasoning>\n"
            "Action: tool_name[\"argument\"] OR Action: finish[\"final answer\"]\n"
        )

        response = llm.invoke(prompt)
        text = response.content
        history.append(text)

        thought, tool_name, arg = parse_react_output(text)
        if tool_name == "finish":
            return arg  # final answer

        tool = TOOLS.get(tool_name)
        if not tool:
            observations.append(f"Observation: Unknown tool '{tool_name}'.")
            continue

        tool_result = tool.invoke({"query": arg})
        observations.append(f"Observation: {tool_result}")

    # If loop ended without finish, summarize best effort
    return "I could not fully answer within the step limit. Partial reasoning:\n" + "\n".join(history)

This example shows the core idea: the model emits a structured “Thought” and “Action” block; your code parses it, calls tools, and feeds observations back in the next step.

10.2 Reflexion / Self-Critique Loop

Reflexion patterns add a meta-step where the agent critiques its own answers and revises them. You can implement this with a follow-up prompt that asks the model to evaluate and refine its own output.

# src/phase4/reflexion.py
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate


def answer_and_reflect(question: str) -> str:
    load_dotenv()
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.5)

    # Step 1: initial answer
    base_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a helpful assistant."),
            ("human", "{question}"),
        ]
    )
    base_chain = base_prompt | llm | StrOutputParser()
    draft = base_chain.invoke({"question": question})

    # Step 2: critique and revise
    reflexion_prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a strict reviewer. Check the answer for correctness, missing details, "
                "and hallucinations. Then rewrite a final, improved answer.",
            ),
            (
                "human",
                "Question:\n{question}\n\nDraft answer:\n{draft}\n\n"
                "First, list potential issues. Then provide a corrected final answer.",
            ),
        ]
    )
    reflexion_chain = reflexion_prompt | llm | StrOutputParser()
    final_answer = reflexion_chain.invoke({"question": question, "draft": draft})
    return final_answer

10.3 Multi-Branch / Tree-of-Thought-style Reasoning

In more complex tasks, you may want the model to generate multiple candidate solutions (“branches”), score them, and pick the best. A lightweight approach:

  1. Ask the model to generate N reasoning paths.
  2. Ask it (or another model) to score each path.
  3. Pick the highest-scoring final answer.

# src/phase4/multi_branch.py
from typing import List
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


def generate_branches(question: str, n: int = 3) -> List[str]:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "Generate multiple different reasoning paths to solve the problem.",
            ),
            (
                "human",
                "Question: {question}\n\nGenerate {n} distinct candidate answers with detailed reasoning, "
                "labeled as 'Candidate 1', 'Candidate 2', etc.",
            ),
        ]
    )
    chain = prompt | llm | StrOutputParser()
    text = chain.invoke({"question": question, "n": n})
    # For teaching purposes, we return the raw response; in production, parse each candidate explicitly.
    return [text]


def score_and_select(question: str, candidates: List[str]) -> str:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    scoring_prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a grader. Evaluate candidate answers and pick the best one.",
            ),
            (
                "human",
                "Question:\n{question}\n\nCandidates:\n{candidates}\n\n"
                "Choose the best candidate and explain briefly why.",
            ),
        ]
    )
    chain = scoring_prompt | llm | StrOutputParser()
    return chain.invoke({"question": question, "candidates": "\n\n".join(candidates)})
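
The comment in generate_branches notes that production code should parse each candidate explicitly. A minimal parser for the "Candidate N" labels the prompt requests might look like this (a sketch; adjust the pattern to your actual output format):

```python
import re
from typing import List


def split_candidates(text: str) -> List[str]:
    """Split a response labeled 'Candidate 1', 'Candidate 2', ... into parts."""
    # Split on lines that start with 'Candidate <number>' (case-insensitive),
    # dropping the labels and any empty fragments.
    parts = re.split(r"(?mi)^candidate\s+\d+\s*[:.]?", text)
    return [p.strip() for p in parts if p.strip()]
```

With candidates split out, score_and_select can grade a real list instead of one undifferentiated blob.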

Module 11 – Planning & Hierarchical Agents

Planning agents separate planning (deciding what to do) from execution (doing it). This structure makes complex tasks more reliable and inspectable.

11.1 Plan-and-Execute Pattern

A Plan-and-Execute agent typically involves:

  • A planner that converts a high-level goal into a list of steps.
  • An executor that carries out each step using tools, RAG, or other agents.

# src/phase4/plan_and_execute.py
from typing import List
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


def build_llm():
    return ChatOpenAI(model="gpt-4o-mini", temperature=0.2)


def plan_steps(goal: str) -> List[str]:
    llm = build_llm()
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a planner. Break the goal into 3-7 concrete steps."),
            ("human", "{goal}"),
        ]
    )
    chain = prompt | llm | StrOutputParser()
    raw_plan = chain.invoke({"goal": goal})
    steps: List[str] = []
    for line in raw_plan.splitlines():
        line = line.strip()
        if not line:
            continue
        # Strip a leading "1."-style number, without splitting on periods
        # that appear later in the step text.
        head, sep, rest = line.partition(".")
        if sep and head.isdigit():
            line = rest.strip()
        steps.append(line)
    return steps


def execute_step(step: str) -> str:
    llm = build_llm()
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are an executor. Perform the step and report results clearly."),
            ("human", "Step: {step}"),
        ]
    )
    chain = prompt | llm | StrOutputParser()
    return chain.invoke({"step": step})


def plan_and_execute(goal: str) -> str:
    load_dotenv()
    steps = plan_steps(goal)
    reports = []
    for i, step in enumerate(steps, start=1):
        result = execute_step(step)
        reports.append(f"Step {i}: {step}\nResult:\n{result}")
    return "\n\n".join(reports)

11.2 Wrapping Planner & Executor in LangGraph

You can wrap this pattern in a LangGraph to track state and allow dynamic replanning:

# src/phase4/plan_graph.py
from typing import TypedDict, List
from langgraph.graph import StateGraph
from plan_and_execute import plan_steps, execute_step


class PlanState(TypedDict, total=False):
    goal: str
    steps: List[str]
    current_index: int
    reports: List[str]


def node_plan(state: PlanState) -> PlanState:
    steps = plan_steps(state["goal"])
    return {"steps": steps, "current_index": 0, "reports": []}


def node_execute(state: PlanState) -> PlanState:
    idx = state.get("current_index", 0)
    steps = state["steps"]
    if idx >= len(steps):
        return {}
    step = steps[idx]
    result = execute_step(step)
    reports = state.get("reports", [])
    reports.append(f"{step}\nResult:\n{result}")
    return {"reports": reports, "current_index": idx + 1}


def node_decide_next(state: PlanState) -> str:
    if state.get("current_index", 0) < len(state.get("steps", [])):
        return "execute"
    return "finish"


def build_plan_graph():
    workflow = StateGraph(PlanState)
    workflow.add_node("plan", node_plan)
    workflow.add_node("execute", node_execute)

    workflow.set_entry_point("plan")
    workflow.add_edge("plan", "execute")

    # node_decide_next is a routing function attached to the edge, not a node:
    # LangGraph nodes must return state updates, while routing functions return
    # the name of the next branch.
    workflow.add_conditional_edges(
        "execute",
        node_decide_next,
        {
            "execute": "execute",
            "finish": "__end__",  # special END label; actual API may differ
        },
    )

    return workflow.compile()

Check the latest LangGraph docs for the exact constant representing the end node (often END from langgraph.graph). The pattern above is what matters: a planner node, an executor node, and a routing function that loops until the plan is exhausted.

Module 12 – Advanced Tools, Multimodal & Code Agents

In this module you extend agents with richer tools: multiple tool pipelines, multimodal tools, and code/DB execution tools.

12.1 Dynamic Toolsets & Tool Pipelines

As your application grows, you may have many tools. You can dynamically choose which tools to expose based on user, environment, or state.

# src/phase4/dynamic_tools.py
from typing import List
from dotenv import load_dotenv

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate


@tool
def web_search(query: str) -> str:
    """Search the web for the given query (placeholder)."""
    return f"[Web search results for: {query}]"


@tool
def summarize_text(text: str) -> str:
    """Summarize a long piece of text."""
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    return llm.invoke(f"Summarize this:\n\n{text}").content


def build_agent_for_mode(mode: str) -> AgentExecutor:
    load_dotenv()
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    base_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a versatile assistant that can use tools."),
            ("human", "{input}"),
        ]
    )

    if mode == "research":
        tools = [web_search, summarize_text]
    else:
        tools = [summarize_text]

    agent = create_tool_calling_agent(llm, tools, base_prompt)
    return AgentExecutor(agent=agent, tools=tools, verbose=True)

12.2 Multimodal Tools (Images, Documents)

Many modern models can see images and PDFs. In LangChain, you typically:

  • Implement a tool that accepts an image path or bytes.
  • Use a multimodal-capable model under the hood.

A sketch of an “analyze image” tool (pseudo-code, since exact API depends on the provider):

# src/phase4/image_tool.py
from langchain_core.tools import tool


@tool
def analyze_image(path: str) -> str:
    """
    Analyze an image at the given path and describe it.
    In practice, use a multimodal model (e.g. OpenAI Vision) here.
    """
    # Sketch (verify against your provider's multimodal API; this follows the
    # OpenAI-style content-block format):
    # import base64
    # from langchain_openai import ChatOpenAI
    # from langchain_core.messages import HumanMessage
    # llm = ChatOpenAI(model="gpt-4o-mini")  # vision-capable model
    # with open(path, "rb") as f:
    #     b64 = base64.b64encode(f.read()).decode()
    # message = HumanMessage(content=[
    #     {"type": "text", "text": "Describe this image."},
    #     {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
    # ])
    # return llm.invoke([message]).content
    return f"(Stub) Would analyze image at {path} using a vision model."

12.3 Code-Interpreter / Python REPL Tools

Code-execution tools are powerful but must be sandboxed. The following is a minimal, constrained evaluator, suitable for demos rather than true isolation:

# src/phase4/code_interpreter.py
from typing import Any, Dict
from langchain_core.tools import tool


SAFE_GLOBALS: Dict[str, Any] = {
    "__builtins__": {
        "abs": abs,
        "min": min,
        "max": max,
        "sum": sum,
        "len": len,
        "range": range,
    }
}


@tool
def python_eval(code: str) -> str:
    """
    Evaluate a small Python expression safely (no imports, no IO).
    Intended for numeric or simple list/dict operations.
    """
    try:
        result = eval(code, SAFE_GLOBALS, {})
        return repr(result)
    except Exception as e:
        return f"Error: {e}"

You can expose this tool to an agent with a clear system prompt that restricts what kind of code it is allowed to generate (no file/network access, no imports).
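
A stricter variant screens the expression with Python's ast module before evaluating, rejecting any syntax or name outside a small allowlist. This is still not a full sandbox, but it blocks attribute access, dunder tricks, and unknown names up front (a sketch under those assumptions):

```python
import ast

# Allowlisted AST node types and callable names; everything else is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Name, ast.Load, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.operator, ast.unaryop,
    ast.Compare, ast.cmpop, ast.List, ast.Tuple, ast.Dict, ast.keyword,
)
ALLOWED_NAMES = {"abs", "min", "max", "sum", "len", "range"}


def check_expression(code: str) -> bool:
    """Return True only if the expression uses allowlisted syntax and names."""
    try:
        tree = ast.parse(code, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False
    return True
```

Combining an AST check like this with the restricted eval above gives defense in depth, but untrusted code should still run in a separate process or container.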

Module 13 – Memory, Safety, Governance & Evaluation

This module strengthens your agents with long-term memory, explicit safety rules, and evaluation practices.

13.1 Long-Term Memory with a Vector Store

Instead of only relying on conversation buffers, you can store key events into a vector store and let the agent retrieve them later as “memory”.

# src/phase4/vector_memory.py
from typing import List
from dataclasses import dataclass

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


@dataclass
class MemoryStore:
    vectorstore: FAISS

    @classmethod
    def create(cls):
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
        # FAISS.from_texts cannot build an empty index (it needs at least one
        # embedding to size the index), so seed it with a placeholder document.
        return cls(vectorstore=FAISS.from_texts(["(memory initialized)"], embeddings))

    def add_event(self, text: str, metadata: dict | None = None):
        self.vectorstore.add_texts([text], metadatas=[metadata or {}])

    def recall(self, query: str, k: int = 5) -> List[str]:
        docs = self.vectorstore.similarity_search(query, k=k)
        return [d.page_content for d in docs]


def chat_with_memory(question: str, memory: MemoryStore) -> str:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are an assistant with long-term memory. Use the provided memory snippets if helpful.",
            ),
            (
                "human",
                "Memory snippets:\n{memories}\n\nUser question:\n{question}",
            ),
        ]
    )
    chain = prompt | llm | StrOutputParser()

    memories = memory.recall(question)
    memories_text = "\n\n".join(memories) if memories else "(no relevant memories)"
    answer = chain.invoke({"memories": memories_text, "question": question})

    # After answering, store this exchange as a new memory
    memory.add_event(f"Q: {question}\nA: {answer}", metadata={"type": "qa"})
    return answer

13.2 Safety Policies Around Tool Use

You can implement a simple policy engine that decides whether a tool call is allowed. For example, forbid certain dangerous patterns or domains.

# src/phase4/tool_policy.py
from typing import Dict, Any


def is_tool_call_allowed(tool_name: str, args: Dict[str, Any]) -> bool:
    # Example rules:
    if tool_name == "python_eval":
        code = args.get("code", "")
        if "import" in code or "__" in code:
            return False
    if tool_name == "web_search":
        query = args.get("query", "")
        if "password" in query.lower():
            return False
    return True


def guard_tool_call(tool_name: str, args: Dict[str, Any], call_fn):
    if not is_tool_call_allowed(tool_name, args):
        return f"Policy blocked tool '{tool_name}' with args {args}"
    return call_fn(**args)

13.3 Logging & Audit Trails

For governance and debugging, log every tool call and decision:

import logging
import uuid

logger = logging.getLogger("agent_audit")
logging.basicConfig(level=logging.INFO)


def log_tool_call(tool_name: str, args: dict, result: str, allowed: bool, run_id: str | None = None):
    if run_id is None:
        run_id = str(uuid.uuid4())
    logger.info(
        "run_id=%s tool=%s allowed=%s args=%s result_snippet=%s",
        run_id,
        tool_name,
        allowed,
        args,
        result[:120].replace("\n", " "),
    )

13.4 Evaluating Agent Behavior

You can build simple evaluation harnesses around your agents:

  • Define a dataset of tasks with expected behaviors.
  • Run your agents on each task; log outputs and metrics (success/failure, latency, cost).
  • Use automatic checks plus manual review.

For deeper evaluation, you can use tools like LangSmith to capture traces and run evaluators on top.
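
A harness along these lines can be very small. In this sketch, agent_fn stands in for any callable agent, and the automatic check is a plain substring match; real evaluators are usually richer:

```python
import time
from typing import Callable, Dict, List


def evaluate_agent(
    agent_fn: Callable[[str], str],
    dataset: List[Dict[str, str]],
) -> Dict[str, float]:
    """Run agent_fn on every task; report success rate and mean latency."""
    successes = 0
    latencies: List[float] = []
    for case in dataset:
        start = time.perf_counter()
        output = agent_fn(case["input"])
        latencies.append(time.perf_counter() - start)
        # Automatic check: the expected substring must appear in the output.
        if case["expected_substring"].lower() in output.lower():
            successes += 1
    return {
        "success_rate": successes / len(dataset),
        "mean_latency_s": sum(latencies) / len(latencies),
    }
```

Feed it a list of tasks with input and expected_substring fields, then compare metrics across prompt or model changes.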

Module 14 – Multi-Agent Coordination & Scalable Deployment

This final module looks at multi-agent collaboration patterns and what it takes to run many agent sessions in production.

14.1 Debate & Critique Agents

A simple debate pattern:

  1. Two agents independently produce answers.
  2. A third “judge” agent compares and chooses.

# src/phase4/debate.py
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


def build_debater(name: str) -> ChatOpenAI:
    # The name is injected via the prompt below; both debaters share a model.
    return ChatOpenAI(model="gpt-4o-mini", temperature=0.7)


def debate(question: str) -> str:
    load_dotenv()
    llm1 = build_debater("Agent A")
    llm2 = build_debater("Agent B")

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are {name}. Provide a detailed answer."),
            ("human", "{question}"),
        ]
    )

    a_answer = (prompt | llm1 | StrOutputParser()).invoke({"name": "Agent A", "question": question})
    b_answer = (prompt | llm2 | StrOutputParser()).invoke({"name": "Agent B", "question": question})

    judge_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.1)
    judge_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You are a judge. Compare two answers and pick the better one."),
            (
                "human",
                "Question:\n{question}\n\nAnswer A:\n{a}\n\nAnswer B:\n{b}\n\n"
                "Explain which answer is better and why. Then restate the final answer.",
            ),
        ]
    )
    judge_chain = judge_prompt | judge_llm | StrOutputParser()
    return judge_chain.invoke({"question": question, "a": a_answer, "b": b_answer})

14.2 Multi-Agent LangGraph Patterns

In LangGraph, you can represent each agent as a node or subgraph. A debate graph might:

  • Have nodes for agent_a, agent_b, and judge.
  • Store their answers in shared state.
  • End with a final_answer field chosen by the judge.
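
Before committing to LangGraph wiring, the state flow can be sketched in plain Python. The agent_a, agent_b, and judge callables below are stand-ins for LLM-backed nodes; each step returns a partial update merged into shared state, mirroring how LangGraph nodes behave:

```python
from typing import Callable, Dict

# Shared state keys: question, a_answer, b_answer, final_answer
DebateState = Dict[str, str]


def run_debate_flow(
    question: str,
    agent_a: Callable[[str], str],
    agent_b: Callable[[str], str],
    judge: Callable[[str, str, str], str],
) -> DebateState:
    """Simulate a debate graph: each 'node' merges a partial state update."""
    state: DebateState = {"question": question}
    state.update({"a_answer": agent_a(state["question"])})  # node: agent_a
    state.update({"b_answer": agent_b(state["question"])})  # node: agent_b
    state.update(
        {"final_answer": judge(state["question"], state["a_answer"], state["b_answer"])}
    )  # node: judge
    return state
```

Translating this to LangGraph means turning each step into a node over a shared TypedDict and letting the graph handle the sequencing.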

14.3 Scaling Agent Sessions

For production workloads:

  • Run agents in containers (e.g. FastAPI + LangGraph inside Docker).
  • Use a queue or event system to manage long-running tasks.
  • Implement backpressure and rate limiting against LLM APIs.
  • Isolate user data by using per-user state keys and separate memory stores.
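
As a concrete instance of the rate-limiting bullet, concurrent LLM calls can be capped with an asyncio.Semaphore. In this sketch, the call factories stand in for whatever coroutines hit your LLM API:

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_with_limit(
    call_factories: List[Callable[[], Awaitable[str]]],
    max_concurrent: int = 5,
) -> List[str]:
    """Run the calls with at most max_concurrent in flight at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(factory: Callable[[], Awaitable[str]]) -> str:
        async with sem:  # blocks while max_concurrent calls are active
            return await factory()

    # Results come back in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(f) for f in call_factories))
```

For cross-process backpressure, the same idea is usually implemented with a task queue and a bounded worker pool instead of an in-process semaphore.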

Many of these patterns are architecture-specific, but the mental model stays the same: LangChain handles models and tools, LangGraph orchestrates multi-step and multi-agent flows, and your infrastructure provides scaling, queues, and storage.