Engineering · 9 min read

Multi-Agent Systems: Architecture Patterns That Work

After building multi-agent systems across logistics, finance, and operations, we've identified the architecture patterns that consistently work in production — and the ones that don't.

Why Single Agents Hit a Wall

A single agent handling a complex end-to-end workflow sounds elegant. In practice, it creates systems that are brittle, expensive, and hard to debug. Three things kill single-agent designs at scale: context limits, task specialization, and reliability arithmetic.

Context limits are the most obvious. Even with Claude's 200K context window, a long-running workflow accumulates conversation history, tool outputs, and intermediate results fast. An agent that starts sharp becomes progressively less reliable as its context fills. You end up with a system that works perfectly for the first 20 minutes of a workflow and degrades steadily after that.

Task specialization matters because prompts that try to do everything do nothing particularly well. An agent tasked with "analyze a logistics shipment, assess delay risk, draft customer communication, and update the TMS" is trying to hold four distinct mental models simultaneously. Split those into four specialized agents and each one can be prompted, tested, and optimized independently.
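The third factor, reliability arithmetic, is just compounding: if a monolithic agent must get every step right in sequence, end-to-end reliability is the product of the per-step success rates, and it decays fast. A quick sketch with illustrative numbers (not measurements):

```python
# Compound reliability: a chain is only as reliable as the product
# of its steps. Per-step success rates here are illustrative.
def end_to_end_reliability(per_step_success: float, steps: int) -> float:
    return per_step_success ** steps

# A 97%-reliable step looks great in isolation...
print(round(end_to_end_reliability(0.97, 1), 3))   # 0.97
# ...but a 20-step monolithic workflow built from such steps does not.
print(round(end_to_end_reliability(0.97, 20), 3))  # 0.544
```

Decomposition doesn't change the arithmetic by itself, but it lets you retry, supervise, and improve each step independently — which is what moves the per-step number.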

The Orchestrator-Worker Pattern

The most reliable multi-agent architecture we've deployed is the orchestrator-worker pattern. One orchestrator agent is responsible for task decomposition, sequencing, and result assembly. Worker agents each handle a specific, well-scoped task. The orchestrator never executes domain work — it only coordinates. Workers never coordinate — they only execute.

This separation of concerns is not just a design preference — it has concrete engineering benefits. The orchestrator prompt can be small and focused entirely on planning logic. Each worker prompt can be deeply specialized with domain-specific instructions, examples, and constraints. You can swap out a worker without touching the orchestrator. You can test workers in isolation without spinning up the full pipeline.

The orchestrator communicates with workers through a structured handoff protocol. We use a simple JSON envelope: task type, input data, required output schema, constraints, and a correlation ID for tracing. The worker returns the result in a matching envelope. This formalism pays dividends in debugging — when something goes wrong, you can replay any individual handoff in isolation.
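A minimal version of that envelope might look like this (field names are illustrative, not the exact production schema):

```python
import json
import uuid

def make_task_envelope(task_type, input_data, output_schema,
                       constraints, correlation_id=None):
    """Build the handoff envelope the orchestrator sends to a worker.
    Field names are illustrative, not a production schema."""
    return {
        "task_type": task_type,
        "input": input_data,
        "output_schema": output_schema,   # shape the worker must return
        "constraints": constraints,       # e.g. tone, length, deadlines
        "correlation_id": correlation_id or str(uuid.uuid4()),
    }

envelope = make_task_envelope(
    task_type="extract_shipment_data",
    input_data={"notification_text": "Carrier reports 6h delay on PO-1182"},
    output_schema={"shipment_id": "string", "delay_hours": "number"},
    constraints={"max_fields": 20},
)
print(json.dumps(envelope, indent=2))
```

Because every envelope carries a correlation ID, any individual handoff can be logged and later replayed in isolation.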

Parallel vs. Sequential Execution

Not all tasks in a workflow depend on each other. Identifying and exploiting parallelism is one of the highest-leverage optimizations in multi-agent systems. In a document processing workflow, extraction and classification are often independent and can run concurrently. In a logistics assessment, route optimization and compliance checking don't need each other's outputs and can run in parallel.

The practical rule: map your workflow as a directed acyclic graph (DAG). Nodes with no upstream dependencies can start immediately. Nodes with multiple upstream dependencies wait for the slowest upstream node. The orchestrator manages this graph — it knows which tasks are ready to execute and dispatches them when their dependencies are met.
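A toy version of that readiness check, assuming each task declares its upstream dependencies:

```python
# Toy DAG scheduler: dispatch every task whose upstream dependencies
# are complete. Task names are illustrative.
workflow = {
    "extract": [],                    # no upstream deps: starts immediately
    "route":   ["extract"],
    "comply":  ["extract"],           # independent of "route": runs in parallel
    "notify":  ["route", "comply"],   # waits for the slowest upstream node
}

def ready_tasks(workflow, completed):
    """Tasks whose dependencies are all satisfied and that haven't run yet."""
    return [t for t, deps in workflow.items()
            if t not in completed and all(d in completed for d in deps)]

done = set()
while len(done) < len(workflow):
    batch = ready_tasks(workflow, done)   # dispatch this batch concurrently
    if not batch:
        raise RuntimeError("cycle detected: no task is ready")
    print("dispatching:", sorted(batch))
    done.update(batch)
```

The empty-batch guard is why verifying the graph is acyclic up front matters: a cycle would otherwise stall this loop forever.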

Sequential when you need it

Some tasks genuinely depend on previous outputs — risk assessment needs the extracted data, customer communication needs the risk assessment. Don't force parallelism where dependencies are real. The goal is eliminating unnecessary sequential execution, not eliminating all of it.

State Management: Shared Memory vs. Message Passing

How agents share state is one of the most consequential architectural decisions in a multi-agent system. Two approaches: shared memory (all agents read and write to a common store) and message passing (agents communicate by sending structured messages through the orchestrator).

We use message passing for almost everything. Shared mutable state in a distributed system creates race conditions, debugging nightmares, and coupling between agents that is very hard to unwind later. With message passing, each agent receives exactly the context it needs for its task, nothing more. The full workflow state lives in the orchestrator, which is the only component that needs the full picture.
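In code, "exactly the context it needs" simply means the orchestrator slices workflow state per agent instead of forwarding everything. A sketch (state keys and agent names are illustrative):

```python
# Full workflow state lives only in the orchestrator.
workflow_state = {
    "raw_notification": "Carrier reports 6h delay on PO-1182",
    "extracted": {"shipment_id": "PO-1182", "delay_hours": 6},
    "risk": {"level": "high", "affected_shipments": 3},
    "customer": {"name": "Acme", "contact": "ops@acme.example"},
}

# Each worker declares the slice of state it is allowed to see.
CONTEXT_SLICES = {
    "communication_drafter": ["extracted", "risk", "customer"],
    "tms_updater":           ["extracted"],
}

def context_for(agent: str) -> dict:
    """Pass each agent exactly what its task needs, nothing more."""
    return {k: workflow_state[k] for k in CONTEXT_SLICES[agent]}

print(context_for("tms_updater"))  # only the extracted fields
```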

The exception is read-only shared state: configuration, lookup tables, reference data. This is fine to share because it's immutable. What you want to avoid is agents that write to shared state that other agents subsequently read — that's where the subtle bugs live.

Error Propagation and the Supervisor Pattern

When a worker agent fails, that failure should not automatically propagate to the entire workflow. The orchestrator needs a clear error handling policy for each task: is this failure fatal to the workflow, or can the workflow continue with a degraded result? Can the task be retried with a different prompt? Is there a fallback path?
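One way to make that policy explicit is a per-task declaration the orchestrator consults on failure. The policy names and table below are illustrative:

```python
from enum import Enum

class OnFailure(Enum):
    FATAL = "fatal"        # abort the whole workflow
    DEGRADE = "degrade"    # continue with a degraded/partial result
    RETRY = "retry"        # re-run, possibly with an adjusted prompt
    FALLBACK = "fallback"  # take an alternative path (e.g. human review)

# Illustrative policy table consulted when a worker fails.
ERROR_POLICY = {
    "extract": {"on_failure": OnFailure.RETRY, "max_retries": 2},
    "route":   {"on_failure": OnFailure.FATAL},
    "notify":  {"on_failure": OnFailure.FALLBACK, "fallback": "human_review"},
}

def handle_failure(task: str, attempt: int) -> str:
    policy = ERROR_POLICY[task]
    if policy["on_failure"] is OnFailure.RETRY and attempt < policy["max_retries"]:
        return "retry"
    if policy["on_failure"] is OnFailure.FALLBACK:
        return policy["fallback"]
    if policy["on_failure"] is OnFailure.DEGRADE:
        return "continue_degraded"
    return "abort"
```

Making the policy data rather than code means adding a new task forces an explicit decision about its failure mode.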

For quality control, we add a supervisor layer to workflows where output quality matters. The supervisor is a separate agent that reviews worker outputs against defined quality criteria before they are passed to the next stage. It can approve, request a revision, or escalate to human review. The supervisor is not in the critical path for most requests — it samples and spot-checks, only blocking when it detects a specific quality issue.

A note on structuring agent prompts for clear handoffs, with Claude specifically: start each worker system prompt with an explicit statement of the agent's role and scope. Tell it what it is responsible for and what it is not responsible for. This reduces the common failure mode of a worker trying to do more than its task, which produces inconsistent outputs and makes the orchestrator's job harder.
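A worker system prompt following that structure might open like this (wording is illustrative, not a production prompt):

```python
# Illustrative worker system prompt: explicit role and scope up front,
# including what the agent is NOT responsible for.
EXTRACTOR_SYSTEM_PROMPT = """\
You are a data extraction agent in a logistics workflow.

You are responsible for: extracting structured fields (shipment ID,
carrier, delay duration, affected locations) from a delay notification.

You are NOT responsible for: assessing risk, drafting customer
communication, or updating any downstream system. Return only the
extracted fields in the requested JSON schema.
"""
```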

Real Example: Logistics Dispatch System

One of our production deployments is a logistics operations system that handles inbound delay notifications and automates the response workflow. The architecture has four agents: a data extractor that pulls structured information from the notification, a routing analysis agent that assesses impact across connected shipments, a communication drafting agent that writes customer-facing updates, and a TMS update agent that makes the appropriate system changes.

The orchestrator receives the raw notification and coordinates a workflow where extraction runs first, then routing analysis and TMS staging run in parallel (they both depend on extraction but not on each other), and communication drafting runs last (it depends on the routing analysis to know what to tell the customer). Total workflow time: under 90 seconds for a task that previously took a dispatcher 15–20 minutes.
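That workflow maps directly onto a dependency declaration. As a sketch (agent identifiers are paraphrased from the description, not the production names):

```python
# Dependency graph for the dispatch workflow described above.
DISPATCH_WORKFLOW = {
    "extract_notification": [],
    "routing_analysis":     ["extract_notification"],
    "tms_staging":          ["extract_notification"],  # parallel with routing_analysis
    "draft_communication":  ["routing_analysis"],      # needs the impact assessment
}
```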

Each agent is independently testable, independently deployable, and independently monitorable. When we needed to improve the customer communication tone, we updated and tested the drafting agent in isolation without touching anything else. That kind of modularity is only possible because the architecture keeps concerns cleanly separated.

Anti-Patterns to Avoid

  • Over-decomposition: creating ten agents for a workflow that two would handle better. Each agent boundary adds latency, adds failure points, and adds debugging complexity. Decompose where there are real specialization or context benefits, not for its own sake.
  • Circular dependencies: Agent A needs Agent B's output before it can complete, and Agent B needs Agent A's output. This creates deadlocks. Map your dependency graph before building and verify it's a DAG.
  • Agents that know too much: passing the full workflow context to every agent when each only needs a subset. This burns tokens, increases costs, and can cause agents to use context they weren't supposed to see.
  • No correlation IDs: in a distributed agent system, you need a way to trace a single workflow execution across all agents and all log lines. Implement correlation IDs from day one — retrofitting them is painful.

Monitoring Multi-Agent Systems

Single-agent observability is straightforward. Multi-agent observability requires distributed tracing. You need to be able to reconstruct the full execution of any workflow: which agents ran, in what order, what each received and returned, how long each took, and where failures occurred.
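Under the hood that reconstruction is just correlation IDs plus per-invocation records. A dependency-free sketch of the idea (a stand-in for real tracing infrastructure, not what production uses):

```python
import time
import uuid

def traced_invocation(spans, workflow_id, agent, fn, payload):
    """Record a span (who ran, what they received and returned,
    how long it took, whether it failed) per agent invocation."""
    start = time.monotonic()
    try:
        result, error = fn(payload), None
    except Exception as exc:
        result, error = None, repr(exc)
    spans.append({
        "workflow_id": workflow_id,  # root correlation ID for the run
        "agent": agent,
        "input": payload,
        "output": result,
        "error": error,
        "duration_s": time.monotonic() - start,
    })
    return result

spans = []
wid = str(uuid.uuid4())
traced_invocation(spans, wid, "extractor",
                  lambda p: {"shipment_id": "PO-1182"}, {"text": "..."})
traced_invocation(spans, wid, "router",
                  lambda p: {"risk": "high"}, {"shipment_id": "PO-1182"})
# Every span shares the workflow_id, so the full run can be reconstructed.
```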

We use OpenTelemetry with a span per agent invocation, all linked under a root span for the workflow. This gives us waterfall views in any compatible backend (we use Honeycomb) that make it immediately obvious where latency is accumulating or where failures are occurring. The investment in tracing infrastructure pays back within the first major incident.

Want to talk through your project?

We're always happy to discuss real problems. No sales pitch.

Book a Discovery Call