Multi-Agent Systems

What are multi-agent systems

A multi-agent system is more than “multiple LLM calls.” It is a set of autonomous agents working toward a shared goal, each with its own role, tools, and decision-making. The agents must coordinate: who does what, when, and under what constraints.

Real examples:

Code review: A reviewer agent examines changes, an author agent addresses feedback. They take turns, share context, and agree on when it’s done.
Game AI: Six players in a social deduction game. Each has private information, makes independent decisions, and communicates through shared channels.
Dev team: A frontend agent and a backend agent work on different parts of a feature, coordinating through shared state and events. An infrastructure agent deploys once both are ready.

The coordination problem

The fundamental challenge: how do agents coordinate without wasting tokens?

Polling (the standard approach)

Today's common frameworks (CrewAI, AutoGen, LangGraph, Swarm) tend to follow the same pattern:

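while True:
    events = get_events(since=last_seen)   # each check costs LLM tokens
    if not events:
        continue                           # empty poll: tokens spent, nothing done
    process(events)                        # actual work
    last_seen = events[-1].id              # advance the cursor (assumes each event carries an id)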

Each poll cycle is a round-trip through the LLM. The agent spends tokens to ask “anything new?” and often the answer is “no.” Multiply by every agent, every iteration, every run.

Result: 15-25% of total tokens go to coordination, not work.
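
How large the overhead is in a given run depends on how many polls come back empty and how much each one costs. A back-of-the-envelope sketch of that arithmetic follows. The function name and every number in it are hypothetical placeholders, not measurements; substitute counts from your own runs.

```python
def coordination_overhead(polls: int, tokens_per_poll: int,
                          tasks: int, tokens_per_task: int) -> float:
    """Return the fraction of total tokens spent on polling rather than work."""
    coordination = polls * tokens_per_poll
    work = tasks * tokens_per_task
    total = coordination + work
    return coordination / total if total else 0.0


# Hypothetical run: 40 polls at 150 tokens each, 10 tasks at 2,400 tokens each.
overhead = coordination_overhead(polls=40, tokens_per_poll=150,
                                 tasks=10, tokens_per_task=2400)
print(f"{overhead:.0%} of tokens went to coordination")  # → 20% of tokens went to coordination
```

Because the poll count scales with the number of agents and iterations while the useful work does not, the ratio gets worse as more agents are added.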

Push (the ACP approach)

ACP moves coordination out of the LLM context entirely. Agents don’t poll — they get told.

# Agent connects to ACP server
# Server pushes events via WebSocket
# Agent wakes only when something relevant happens
# Zero tokens spent on "anything new?"

The coordination server (incubator) handles event routing, state management, and phase transitions. The LLM only runs when there’s actual work to do.
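
The push pattern can be sketched without a real server. The example below is an illustration only: an `asyncio.Queue` stands in for the WebSocket connection, `run_llm` stands in for the model call, and the event names (`task_assigned`, `phase_changed`, `halt`) are hypothetical. It is not ACP's client API, which this page does not show.

```python
import asyncio


async def run_llm(event: dict) -> str:
    """Stand-in for the real model call; the only step that would cost tokens."""
    return f"handled {event['type']}"


async def agent(inbox: asyncio.Queue, results: list) -> None:
    """Sleep until the server pushes an event, then do the work."""
    while True:
        event = await inbox.get()      # blocks here without spending tokens
        if event["type"] == "halt":    # server-initiated stop
            break
        results.append(await run_llm(event))


async def main() -> list:
    inbox: asyncio.Queue = asyncio.Queue()
    results: list = []
    worker = asyncio.create_task(agent(inbox, results))
    # The simulated server pushes two relevant events, then a halt.
    for event_type in ("task_assigned", "phase_changed", "halt"):
        await inbox.put({"type": event_type})
    await worker
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The agent never asks "anything new?"; `run_llm` is called exactly once per delivered event, and the halt arrives through the same channel rather than being discovered on a later poll.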

Push vs pull

Aspect	Pull (polling)	Push (ACP)
Event checking	LLM generates API call	WebSocket push (free)
Message delivery	LLM polls inbox	Push notification (free)
Halt/pause	LLM discovers on next poll	Hook intercept (immediate)
Phase transitions	Agent discovers eventually	Push + prompt refresh
State management	In LLM context (compaction-lossy)	External server (full fidelity)
Token cost	15-25% overhead	~0%
Latency	Poll interval	Real-time

When to use multi-agent

Multi-agent makes sense when:

Multiple roles with different capabilities — a reviewer shouldn’t have write access, a deployer shouldn’t edit code
Parallelizable work — frontend and backend can proceed simultaneously
Different model requirements — cheap Haiku for CSS, expensive Opus for architecture decisions
Governance requirements — budget caps, approval gates, audit trails per agent
Social dynamics — debate, voting, adversarial review
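
Several of these criteria come down to per-agent configuration. A minimal sketch of what that could look like, assuming a hypothetical `Role` type with a tool allow-list, a model tier, and a token cap (the field names, tool names, and budgets are illustrative, not an ACP schema):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    name: str
    model: str                                   # hypothetical model tier label
    tools: frozenset = field(default_factory=frozenset)
    token_budget: int = 0                        # hard cap for this agent

    def can_use(self, tool: str) -> bool:
        """An agent may only call tools on its own allow-list."""
        return tool in self.tools


reviewer = Role("reviewer", model="large",
                tools=frozenset({"read_file", "comment"}), token_budget=50_000)
deployer = Role("deployer", model="small",
                tools=frozenset({"deploy"}), token_budget=5_000)

print(reviewer.can_use("write_file"))  # → False
print(deployer.can_use("deploy"))      # → True
```

If every agent in a design would end up with the same tools, model, and budget, that is a hint the work may not need separate roles.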

When NOT to use multi-agent

Single-agent is better when:

One task, one agent: no coordination needed
Sequential work: each step depends on the previous one, so parallelism brings no benefit
Simple workflows: a chain of prompts with no branching or shared state
Token budget is tiny: with so few tokens in play, the absolute savings from removing coordination overhead are too small to justify the extra setup

The test: if you can describe the workflow as “do A, then B, then C” with no branching and no shared state, use a single agent. If agents need to coordinate, claim resources, or react to each other’s work, use ACP.

Comparison with frameworks

Framework	Approach	Coordination	Token overhead
CrewAI	Python orchestrator, sequential/parallel tasks	Message passing through orchestrator	Medium (orchestrator runs through LLM)
AutoGen	Conversation-based, agents talk to each other	Chat messages (all in LLM context)	High (full conversation history)
LangGraph	Directed graph of LLM calls	Graph edges, conditional routing	Low (structured) but rigid
Swarm	Handoff-based, agents pass control	Function calls between agents	Medium
ACP	Protocol-driven, push-based events	External server, WebSocket, hooks	~0% (coordination is free)

The key difference: ACP protocols are declarative YAML, not imperative code. You define what should happen (roles, phases, rules), not how to route messages. The runtime handles coordination. The protocol survives a model swap.
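
To illustrate the declarative idea, here is a sketch in which the protocol is plain data and a small generic function interprets it. The structure, phase names, and event names are hypothetical and written as a Python dictionary for illustration; this is not ACP's actual YAML schema, which this page does not show.

```python
# Hypothetical protocol definition: plain data, no routing logic.
PROTOCOL = {
    "roles": ["author", "reviewer"],
    "phases": [
        {"name": "review", "active": "reviewer", "next_on": "changes_requested"},
        {"name": "revise", "active": "author", "next_on": "changes_pushed"},
        {"name": "done", "active": None, "next_on": None},
    ],
}


def next_phase(protocol: dict, current: str, event: str) -> str:
    """Advance to the following phase when the expected event arrives."""
    phases = protocol["phases"]
    index = next(i for i, p in enumerate(phases) if p["name"] == current)
    if phases[index]["next_on"] == event and index + 1 < len(phases):
        return phases[index + 1]["name"]
    return current  # unrelated events leave the phase unchanged


print(next_phase(PROTOCOL, "review", "changes_requested"))  # → revise
print(next_phase(PROTOCOL, "review", "changes_pushed"))     # → review
```

Nothing in `PROTOCOL` names a model or a prompt, which is the sense in which such a definition could survive a model swap: only the runtime that interprets it needs to know which model backs each role.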