Multi-Agent Systems

A multi-agent system is more than “multiple LLM calls.” It’s multiple autonomous agents working toward a shared goal, each with their own role, tools, and decision-making. The agents must coordinate — who does what, when, with what constraints.

Real examples:

  • Code review: A reviewer agent examines changes, an author agent addresses feedback. They take turns, share context, and agree on when it’s done.
  • Game AI: Six players in a social deduction game. Each has private information, makes independent decisions, and communicates through shared channels.
  • Dev team: A frontend agent and backend agent work on different parts of a feature, coordinating through shared state and events. An infrastructure agent deploys when both are ready.

The fundamental challenge: how do agents coordinate without wasting tokens?

The frameworks in common use today (CrewAI, AutoGen, LangGraph, Swarm) leave it to the agent to ask for updates. In its simplest form, that is a polling loop:

while True:
    events = get_events(since=last_seen)  # LLM token cost
    if not events:
        continue                          # wasted tokens
    process(events)                       # actual work

Each poll cycle is a round-trip through the LLM. The agent spends tokens to ask “anything new?” and often the answer is “no.” Multiply by every agent, every iteration, every run.

Result: 15-25% of total tokens go to coordination, not work.
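To see how polling turns into a percentage like this, the arithmetic can be sketched as below. The function name and every number in it are illustrative assumptions, not measurements from any framework.

```python
def polling_overhead(poll_tokens: int, polls: int, work_tokens: int) -> float:
    """Return the fraction of total tokens spent on polling rather than work."""
    coordination = poll_tokens * polls
    return coordination / (coordination + work_tokens)


# Hypothetical run: 40 polls at 50 tokens each, next to 8,000 tokens of real work.
share = polling_overhead(poll_tokens=50, polls=40, work_tokens=8000)
print(f"{share:.0%}")  # 2,000 / 10,000 -> 20%
```

The share grows with the number of polls, so it rises with more agents and longer runs even if the useful work stays the same.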

ACP moves coordination out of the LLM context entirely. Agents don’t poll — they get told.

# Agent connects to ACP server
# Server pushes events via WebSocket
# Agent wakes only when something relevant happens
# Zero tokens spent on "anything new?"

The coordination server (incubator) handles event routing, state management, and phase transitions. The LLM only runs when there’s actual work to do.
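The push model can be sketched in plain Python, with an `asyncio.Queue` standing in for the WebSocket connection. This is a minimal illustration under that assumption, not ACP's actual client API. The names `agent`, `server`, and the event strings are hypothetical.

```python
import asyncio


async def agent(name: str, inbox: asyncio.Queue, handled: list) -> None:
    """Wait for pushed events; there is no polling loop."""
    while True:
        event = await inbox.get()      # suspends until the server pushes something
        if event is None:              # sentinel: the server asks the agent to stop
            break
        handled.append((name, event))  # stand-in for the LLM doing real work


async def server(inboxes: dict, events: list) -> None:
    """Route each event only to the agent it is addressed to."""
    for target, payload in events:
        await inboxes[target].put(payload)
    for inbox in inboxes.values():
        await inbox.put(None)          # shut every agent down cleanly


async def main() -> list:
    handled: list = []
    inboxes = {"reviewer": asyncio.Queue(), "author": asyncio.Queue()}
    tasks = [asyncio.create_task(agent(n, q, handled)) for n, q in inboxes.items()]
    await server(inboxes, [("reviewer", "diff_ready"), ("author", "feedback_posted")])
    await asyncio.gather(*tasks)
    return handled


handled = asyncio.run(main())
```

Each agent runs its handler exactly once per event addressed to it. The idle time between events costs nothing because the coroutine is suspended, not asking.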

| Aspect | Pull (polling) | Push (ACP) |
| --- | --- | --- |
| Event checking | LLM generates API call | WebSocket push (free) |
| Message delivery | LLM polls inbox | Push notification (free) |
| Halt/pause | LLM discovers on next poll | Hook intercept (immediate) |
| Phase transitions | Agent discovers eventually | Push + prompt refresh |
| State management | In LLM context (compaction-lossy) | External server (full fidelity) |
| Token cost | 15-25% overhead | ~0% |
| Latency | Poll interval | Real-time |

Multi-agent makes sense when:

  • Multiple roles with different capabilities — a reviewer shouldn’t have write access, a deployer shouldn’t edit code
  • Parallelizable work — frontend and backend can proceed simultaneously
  • Different model requirements — cheap Haiku for CSS, expensive Opus for architecture decisions
  • Governance requirements — budget caps, approval gates, audit trails per agent
  • Social dynamics — debate, voting, adversarial review
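Several of these criteria (separate capabilities, per-role models, budget caps) amount to per-agent configuration. A minimal sketch of what that could look like follows. The `AgentRole` class, its fields, and the tool names are hypothetical and do not reflect ACP's real schema.

```python
from dataclasses import dataclass


@dataclass
class AgentRole:
    """Hypothetical per-agent settings: model, tool allow-list, token cap."""
    name: str
    model: str
    allowed_tools: frozenset
    token_budget: int
    tokens_used: int = 0

    def can_use(self, tool: str) -> bool:
        """A role may only call tools on its allow-list."""
        return tool in self.allowed_tools

    def charge(self, tokens: int) -> None:
        """Record spend and refuse to exceed the role's budget cap."""
        if self.tokens_used + tokens > self.token_budget:
            raise RuntimeError(f"{self.name} would exceed its token budget")
        self.tokens_used += tokens


reviewer = AgentRole("reviewer", "opus", frozenset({"read_file", "comment"}), 20_000)
styler = AgentRole("styler", "haiku", frozenset({"read_file", "write_file"}), 5_000)

print(reviewer.can_use("write_file"))  # False: the reviewer has no write access
styler.charge(4_000)                   # within budget
```

Keeping these limits outside the prompt means they are enforced by code, not by asking the model to behave.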

Single-agent is better when:

  • One task, one agent — no coordination needed
  • Sequential work — each step depends on the previous, no parallelism benefit
  • Simple workflows — a chain of prompts with no branching or shared state
  • Token budget is tiny: when total spend is small, the tokens saved by removing coordination overhead are negligible

The test: if you can describe the workflow as “do A, then B, then C” with no branching and no shared state, use a single agent. If agents need to coordinate, claim resources, or react to each other’s work, use ACP.

| Framework | Approach | Coordination | Token overhead |
| --- | --- | --- | --- |
| CrewAI | Python orchestrator, sequential/parallel tasks | Message passing through orchestrator | Medium (orchestrator runs through LLM) |
| AutoGen | Conversation-based, agents talk to each other | Chat messages (all in LLM context) | High (full conversation history) |
| LangGraph | Directed graph of LLM calls | Graph edges, conditional routing | Low (structured) but rigid |
| Swarm | Handoff-based, agents pass control | Function calls between agents | Medium |
| ACP | Protocol-driven, push-based events | External server, WebSocket, hooks | ~0% (coordination is free) |

The key difference: ACP protocols are declarative YAML, not imperative code. You define what should happen (roles, phases, rules), not how to route messages. The runtime handles coordination. The protocol survives a model swap.
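The split between declaring a protocol and running it can be sketched as follows. To stay in one language, the protocol is written as a Python dict standing in for the YAML file. The keys (`roles`, `phases`, `active`, `advance_on`) and the `Runtime` class are assumptions made for illustration, not ACP's actual format.

```python
# Declarative part: what should happen, with no routing logic.
PROTOCOL = {
    "roles": ["reviewer", "author"],
    "phases": [
        {"name": "review", "active": "reviewer", "advance_on": "feedback_posted"},
        {"name": "revise", "active": "author", "advance_on": "changes_pushed"},
        {"name": "done", "active": None, "advance_on": None},
    ],
}


class Runtime:
    """Hypothetical runtime: interprets the protocol and tracks the phase."""

    def __init__(self, protocol: dict) -> None:
        self.phases = protocol["phases"]
        self.index = 0

    @property
    def phase(self) -> dict:
        return self.phases[self.index]

    def on_event(self, role: str, event: str) -> bool:
        """Advance only when the active role emits the expected event."""
        phase = self.phase
        if phase["advance_on"] is None:  # terminal phase: nothing to advance to
            return False
        if role == phase["active"] and event == phase["advance_on"]:
            self.index += 1
            return True
        return False


runtime = Runtime(PROTOCOL)
runtime.on_event("author", "feedback_posted")    # ignored: author is not active yet
runtime.on_event("reviewer", "feedback_posted")  # review -> revise
print(runtime.phase["name"])  # revise
```

Nothing in `PROTOCOL` mentions a model or a prompt, which is why a definition of this kind can outlive a model swap.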