Why Single-Agent Systems Hit a Ceiling

A well-built single agent (one reasoning engine, one set of tools, one loop) can handle a surprising breadth of tasks. But single-agent systems fail predictably when tasks require diverse expertise, parallel execution, or independent verification of results. A research task that spans legal analysis, financial modelling, and technical writing asks one agent to be an expert in three domains at once. A code review workflow that needs a security scanner, a style checker, and a business logic reviewer cannot run those checks one after another and still deliver results in seconds.

Multi-agent systems address these limitations by decomposing work across specialised agents that collaborate toward a shared objective. The architectural insight is simple: one agent, one responsibility. This principle reduces errors, simplifies debugging, enables parallel execution, and allows each agent to be optimised for its specific role — including using a cheaper, faster model for simple subtasks and a more capable model for complex reasoning.

The Orchestrator-Worker Pattern

The most common multi-agent architecture is the orchestrator-worker pattern. An orchestrator (also called a coordinator or manager agent) receives the overall task, decomposes it into subtasks, dispatches those subtasks to specialised worker agents, collects results, and produces a final output.

A critical engineering rule: the orchestrator should be code, not an LLM. Using an LLM as the orchestrator introduces non-determinism into the coordination layer — the exact sequencing, error handling, and termination logic should be controlled by software you write, not by a language model's emergent reasoning. The LLM participates in deciding what to do; code controls how the workflow executes.

class Orchestrator:
    def __init__(self, agents: dict):
        self.agents = agents  # {"planner": PlannerAgent, ...}

    def run(self, task: str) -> dict:
        # Step 1: Planning
        plan = self.agents["planner"].run(task)

        # Step 2: Dispatch subtasks
        results = {}
        for subtask in plan.subtasks:
            agent_name = self.route(subtask)
            results[subtask.id] = self.agents[agent_name].run(
                subtask.description
            )

        # Step 3: Synthesis
        return self.agents["synthesiser"].run(task, results)

    def route(self, subtask) -> str:
        # Deterministic routing based on subtask type
        routing_map = {
            "research": "research_agent",
            "code":     "coding_agent",
            "review":   "reviewer_agent",
        }
        return routing_map.get(subtask.type, "general_agent")
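
One way to keep the LLM's role limited to deciding what to do is to validate the planner's output in code before anything is dispatched. The sketch below assumes the planner returns a JSON list of subtasks. The allowed types, the `MAX_SUBTASKS` cap, and the `parse_plan` helper are hypothetical names chosen for illustration, not part of any framework.

```python
import json
from dataclasses import dataclass

ALLOWED_TYPES = {"research", "code", "review"}  # assumed subtask types
MAX_SUBTASKS = 5  # assumed hard cap, enforced by code rather than by the model


@dataclass(frozen=True)
class Subtask:
    id: str
    type: str
    description: str


def parse_plan(raw_plan: str) -> list[Subtask]:
    """Validate a planner's JSON output before any worker is dispatched."""
    items = json.loads(raw_plan)
    if not isinstance(items, list):
        raise ValueError("plan must be a JSON list")
    if len(items) > MAX_SUBTASKS:
        raise ValueError(f"plan has {len(items)} subtasks, limit is {MAX_SUBTASKS}")
    subtasks = []
    for item in items:
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown subtask type: {item['type']!r}")
        subtasks.append(Subtask(item["id"], item["type"], item["description"]))
    return subtasks


# Stand-in for text returned by a planner LLM.
raw = '[{"id": "s1", "type": "research", "description": "Find prior art"}]'
plan = parse_plan(raw)
```

A plan that names an unknown subtask type or exceeds the cap is rejected before any worker runs, so termination and routing stay under the control of ordinary code.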

Task Decomposition Strategies

The quality of a multi-agent system depends heavily on how the task is decomposed. Poor decomposition creates agents that block each other, produce mismatched outputs, or duplicate work. Three decomposition strategies cover most production use cases:

  • Functional decomposition — divide by specialisation (research agent, writing agent, review agent). Each agent has a clear domain. This is the most common pattern and maps naturally to role-based agent design in frameworks like CrewAI.
  • Data decomposition — divide by input slice (agent 1 analyses documents 1–100, agent 2 analyses 101–200, etc.). Enables parallel processing of large datasets. Results are merged by the orchestrator. Effective for batch processing pipelines.
  • Pipeline decomposition — sequential stages where each stage's output becomes the next stage's input (ingest → analyse → summarise → format). The clearest mental model, but limits parallelism unless stages within a pipeline can run concurrently.

Effective decomposition produces subtasks that are: independent enough to run in parallel where possible; specific enough that one agent can complete them without needing to call another agent mid-execution; and well-defined enough that the orchestrator can validate completion.
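
As a minimal sketch of data decomposition, the helper below splits a list of inputs into contiguous, near-equal slices, one per agent. The function name `split_into_slices` and the use of integers as document IDs are assumptions made for the example.

```python
def split_into_slices(items: list, n_agents: int) -> list[list]:
    """Split items into at most n_agents contiguous, near-equal slices."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    base, extra = divmod(len(items), n_agents)
    slices, start = [], 0
    for i in range(n_agents):
        # The first `extra` slices take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than agents: stop instead of emitting empty slices
        slices.append(items[start:start + size])
        start += size
    return slices


documents = list(range(1, 201))  # stand-in for document IDs 1..200
slices = split_into_slices(documents, 2)
# slices[0] covers documents 1-100, slices[1] covers 101-200
```

Each slice can then be handed to its own agent, and the orchestrator merges the per-slice results in slice order.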

Sequential vs Parallel Execution

Sequential execution runs agents in order, each feeding the next. Use it when outputs are genuinely dependent — when agent B needs agent A's result to proceed. It is simpler to debug and produces deterministic ordering, but it is as slow as the sum of all agent completion times.

Parallel execution dispatches independent subtasks simultaneously. A research pipeline that needs legal analysis, financial analysis, and technical analysis can run all three at once and merge results at the end. Using Python's asyncio (for agents that expose async calls) or concurrent.futures.ThreadPoolExecutor (for blocking calls), parallel execution can reduce wall-clock time by up to a factor equal to the number of independent subtasks, because total time approaches that of the slowest subtask rather than the sum of all of them:

import asyncio

async def run_parallel_agents(
    agents: list, subtasks: list
) -> list:
    coroutines = [
        agent.run_async(subtask)
        for agent, subtask in zip(agents, subtasks)
    ]
    return await asyncio.gather(*coroutines)

# Wall-clock time ≈ max(individual_times) not sum
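
For agents that only expose blocking calls, a thread pool gives a similar effect. This is a sketch under assumptions: `fake_agent` is a hypothetical stand-in that sleeps to simulate a network-bound model call, and `run_parallel_blocking` is an illustrative helper name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_agent(subtask: str) -> str:
    """Hypothetical stand-in for a blocking agent call (e.g. an HTTP request)."""
    time.sleep(0.2)
    return f"done: {subtask}"


def run_parallel_blocking(run_fn, subtasks: list[str]) -> list[str]:
    """Run one blocking call per subtask on a thread pool."""
    # max_workers must be positive, so fall back to 1 for an empty list.
    with ThreadPoolExecutor(max_workers=len(subtasks) or 1) as pool:
        # pool.map yields results in input order, regardless of finish order.
        return list(pool.map(run_fn, subtasks))


results = run_parallel_blocking(fake_agent, ["legal", "financial", "technical"])
```

Threads suit this case because the work is I/O-bound: each thread spends most of its time waiting on a remote call rather than executing Python code.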

Hierarchical collaboration adds a third option: a manager agent coordinates worker agents, delegating dynamically based on intermediate results. This is the most flexible pattern and is the model behind CrewAI's hierarchical process and LangGraph's supervisor workflows.
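
Whether a subtask runs sequentially or in parallel can be derived from declared dependencies instead of being hand-coded. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group subtasks into waves. The dependency map and the `execution_waves` helper are hypothetical examples.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: subtask -> set of subtasks it depends on.
dependencies = {
    "ingest": set(),
    "legal": {"ingest"},
    "financial": {"ingest"},
    "summarise": {"legal", "financial"},
}


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; subtasks within a wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock its dependents
    return waves


waves = execution_waves(dependencies)
# [['ingest'], ['financial', 'legal'], ['summarise']]
```

Waves run one after another, while the subtasks inside a wave are independent and can be dispatched together.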

Inter-Agent Communication: Message Passing and Shared State

Agents must communicate explicitly, and there are two ways to do it: message passing and shared state. Message passing is strongly preferred in production systems because every exchange is a discrete record that can be logged, audited, and versioned.

A structured message schema prevents free-form agent "chatter" that is hard to debug:

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    msg_type: str  # "task", "result", "error", "escalate"
    payload: dict
    # Timezone-aware UTC timestamp in ISO 8601 format
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    correlation_id: str = ""  # links request to response
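
To illustrate why message passing is auditable, here is a minimal in-memory bus that logs every message and links replies to requests through a shared correlation ID. `Envelope` and `InMemoryBus` are hypothetical names for this sketch. A production system would more likely use a queue or broker.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Envelope:
    sender: str
    recipient: str
    msg_type: str
    payload: dict
    # A fresh ID per request; replies reuse the request's ID.
    correlation_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class InMemoryBus:
    """Hypothetical bus: every message is appended to an audit log."""

    def __init__(self) -> None:
        self.log: list[Envelope] = []

    def send(self, msg: Envelope) -> Envelope:
        self.log.append(msg)
        return msg

    def reply(self, request: Envelope, msg_type: str, payload: dict) -> Envelope:
        # Swap sender and recipient, and carry the correlation ID over.
        return self.send(Envelope(request.recipient, request.sender,
                                  msg_type, payload, request.correlation_id))

    def thread(self, correlation_id: str) -> list[Envelope]:
        """Return every logged message belonging to one request/response exchange."""
        return [m for m in self.log if m.correlation_id == correlation_id]


bus = InMemoryBus()
request = bus.send(Envelope("orchestrator", "research_agent", "task",
                            {"query": "prior art"}))
response = bus.reply(request, "result", {"findings": 3})
exchange = bus.thread(request.correlation_id)
```

Filtering the log by correlation ID reconstructs a full exchange, which is what makes debugging a multi-agent run tractable.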

Shared state (a dictionary or database row that all agents can read and write) is simpler to implement but creates race conditions when agents run in parallel and introduces hidden coupling between agents. If you use shared state, enforce read/write permissions per agent and use database transactions or locks to prevent concurrent writes.
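
If shared state is unavoidable, the two safeguards above can be sketched in a few lines: a per-agent write allowlist and a lock around every read-modify-write. `GuardedState` is a hypothetical in-process example, and the agent and key names are invented for the demo. A database-backed system would rely on transactions instead.

```python
import threading
from typing import Any, Callable


class GuardedState:
    """Hypothetical shared store with per-agent write permissions and a lock."""

    def __init__(self, write_permissions: dict[str, set[str]]):
        self._data: dict[str, Any] = {}
        self._lock = threading.Lock()
        self._write_permissions = write_permissions

    def read(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._data.get(key, default)

    def update(self, agent: str, key: str,
               fn: Callable[[Any], Any], default: Any = None) -> None:
        """Apply fn to the current value atomically (read-modify-write under the lock)."""
        if key not in self._write_permissions.get(agent, set()):
            raise PermissionError(f"{agent} may not write {key!r}")
        with self._lock:
            self._data[key] = fn(self._data.get(key, default))


state = GuardedState({"research_agent": {"findings_count"}})


def worker() -> None:
    for _ in range(1000):
        state.update("research_agent", "findings_count", lambda n: n + 1, default=0)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = state.read("findings_count")  # 8000: no increments are lost
```

Without the lock, concurrent increments can overwrite each other. Without the allowlist, any agent could silently change a key that another agent depends on.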

Error Handling and Retry Logic in Agent Networks

Multi-agent systems multiply failure modes. In a single-agent system, a failure stays contained within that one agent. In a multi-agent system, a failure in one agent can block downstream agents or corrupt shared state. Fault tolerance must be designed explicitly.

The key principle is failure isolation: a failure in one agent should not cascade to others. Each agent's execution is wrapped in error handling, and failures are reported to the orchestrator as structured error messages rather than propagated as exceptions:

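import time

class SafeAgent:
    def __init__(self, agent, max_retries: int = 3):
        self.agent = agent
        self.max_retries = max(1, max_retries)  # always try at least once

    # The default recipient name is an assumption; pass your orchestrator's ID
    def run(self, task: str, recipient: str = "orchestrator") -> AgentMessage:
        last_error = ""
        for attempt in range(self.max_retries):
            try:
                result = self.agent.run(task)
                return AgentMessage(
                    sender=self.agent.name,
                    recipient=recipient,
                    msg_type="result",
                    payload={"result": result}
                )
            except Exception as e:  # isolate any failure from the caller
                last_error = str(e)
                if attempt < self.max_retries - 1:
                    time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, ...
        # All attempts failed: report a structured error instead of raising
        return AgentMessage(
            sender=self.agent.name,
            recipient=recipient,
            msg_type="error",
            payload={"error": last_error,
                     "attempts": self.max_retries}
        )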

The orchestrator then handles error messages: it can reassign the subtask to a different agent, fall back to a simpler approach, or escalate to a human reviewer.
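
A minimal sketch of that handling, assuming agents return plain dictionaries with a `msg_type` key: the orchestrator tries candidate agents in order and escalates only when all of them report errors. `resolve_subtask`, the stub agents, and the message shape are hypothetical.

```python
from typing import Callable

Message = dict  # assumed shape: {"sender": str, "msg_type": str, "payload": dict}
AgentFn = Callable[[str], Message]


def resolve_subtask(subtask: str, candidates: list[AgentFn],
                    escalate: Callable[[str, list[Message]], Message]) -> Message:
    """Try each candidate agent in order; escalate if every one reports an error."""
    failures: list[Message] = []
    for run in candidates:
        message = run(subtask)
        if message["msg_type"] == "result":
            return message
        failures.append(message)  # keep the error for the audit trail
    return escalate(subtask, failures)


# Stub agents standing in for real workers wrapped in error handling.
def coding_agent(subtask: str) -> Message:
    return {"sender": "coding_agent", "msg_type": "error",
            "payload": {"error": "timeout", "attempts": 3}}


def general_agent(subtask: str) -> Message:
    return {"sender": "general_agent", "msg_type": "result",
            "payload": {"result": f"handled: {subtask}"}}


def escalate_to_human(subtask: str, failures: list[Message]) -> Message:
    return {"sender": "orchestrator", "msg_type": "escalate",
            "payload": {"subtask": subtask, "failures": failures}}


outcome = resolve_subtask("fix flaky test", [coding_agent, general_agent],
                          escalate_to_human)
```

Here the first agent fails, the fallback succeeds, and the human reviewer is never involved. Passing only `coding_agent` would produce an `escalate` message that carries the recorded failure.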

Human-in-the-Loop Checkpoints

Production multi-agent systems should not be fully autonomous. Human oversight checkpoints at critical junctures prevent compounding errors and meet compliance requirements. Typical checkpoint triggers: high-stakes actions (sending external communications, making financial transactions, deleting data), low-confidence results (agent's self-assessed confidence below a threshold), conflicting agent outputs, and any irreversible action.
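
These triggers can be encoded as a plain predicate so that the decision to pause is made by code and is easy to test. The sketch below is illustrative only: the `Step` fields, the action names, and the 0.7 threshold are assumptions, not values from any framework.

```python
from dataclasses import dataclass

HIGH_STAKES_ACTIONS = {"send_email", "transfer_funds", "delete_data"}  # assumed names
CONFIDENCE_THRESHOLD = 0.7  # assumed cut-off for self-assessed confidence


@dataclass
class Step:
    id: str
    action: str
    confidence: float
    irreversible: bool = False
    conflicting_outputs: bool = False


def approval_reasons(step: Step) -> list[str]:
    """Return every reason a step needs human approval (empty list means none)."""
    reasons = []
    if step.action in HIGH_STAKES_ACTIONS:
        reasons.append("high-stakes action")
    if step.confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low confidence")
    if step.conflicting_outputs:
        reasons.append("conflicting agent outputs")
    if step.irreversible:
        reasons.append("irreversible action")
    return reasons


routine = Step(id="s1", action="summarise", confidence=0.92)
risky = Step(id="s2", action="delete_data", confidence=0.55, irreversible=True)

routine_reasons = approval_reasons(routine)
risky_reasons = approval_reasons(risky)
```

Returning the list of reasons, instead of a bare boolean, gives the human reviewer context for why the workflow paused.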

The checkpoint pattern interrupts the agent loop and waits for human approval before proceeding:

def run_with_checkpoint(orchestrator, task: str,
                         approval_fn) -> dict:
    plan = orchestrator.plan(task)

    for step in plan.steps:
        if step.requires_approval:
            decision = approval_fn(step)
            if decision != "approved":
                return {"status": "halted",
                        "reason": f"Step {step.id} rejected"}
        result = orchestrator.execute_step(step)
        plan.record_result(step.id, result)

    return orchestrator.synthesise(plan)

Framework Comparison: CrewAI vs AutoGen vs LangGraph

Three frameworks dominate multi-agent development in 2026, each with a distinct mental model:

  • CrewAI models agents as team members with roles, goals, and backstories. You define a crew of agents and a list of tasks, and CrewAI manages sequential or parallel execution. The mental model (agents as human team members with responsibilities) makes it the most accessible framework for developers new to multi-agent systems. Best for: structured workflows, content generation pipelines, research tasks. Limitation: less suited to highly dynamic or branching workflows where the task structure is unknown upfront.
  • AutoGen (Microsoft) supports conversational multi-agent systems where agents exchange messages in a configurable conversation flow. Strong for collaborative reasoning tasks (code review, debate, adversarial verification) where the back-and-forth between agents produces better results than a single agent. Best for: research, conversational collaboration, multi-turn problem solving. Limitation: message overhead and coordination complexity in large agent networks.
  • LangGraph represents agent workflows as directed graphs with nodes (agents or functions) and edges (transitions). Supports cycles, branching, and conditional routing — any topology you can express as a graph. This makes it the most flexible framework for complex, stateful workflows. LangGraph explicitly separates state management from agent logic, making it the most controllable option. Best for: production systems that need precise workflow control, complex branching, and explicit state management. Limitation: steeper learning curve, more boilerplate than CrewAI.

Real-World Use Cases for Multi-Agent Systems

Multi-agent architectures deliver the most value in tasks that naturally decompose into parallel specialised work:

  • Research pipelines — a planner agent decomposes a research question; parallel research agents query different sources (web, internal documents, databases); a reviewer agent validates and deduplicates findings; a writer agent produces the final report.
  • Code review — a security agent scans for vulnerabilities; a style agent checks coding standards; a logic agent reviews business logic correctness; a summary agent aggregates findings for the developer.
  • Report generation — a data agent pulls metrics from multiple systems; an analysis agent identifies trends; a visualisation agent generates chart descriptions; a writing agent produces the executive summary. A human checkpoint reviews before delivery.
  • Document intake and decision support — an extraction agent parses incoming documents; a classification agent categorises them; a risk-flagging agent identifies compliance issues; a routing agent sends them to the correct reviewer with a summary.

The architectural rule that makes all of these work: one agent, one responsibility. Each agent is small, focused, and testable in isolation. The orchestrator composes them into a capable system.