# Honeycomb
Honeycomb is a shared execution pool. Instead of one container per agent (80% idle), Honeycomb runs N always-warm workers that pick jobs from a queue. Same infrastructure for execution, training, builds, and scans.
## The problem with per-agent containers

Traditional approach:
- Agent A needs to run `npm test` → spin up container, run, wait, tear down
- Agent B needs to run `cargo build` → spin up container, run, wait, tear down
- Agent C is thinking (no shell needed) → container sits idle, burning money
Each container: ~$0.05/hr. Most of that time it’s idle — waiting for the agent to think, waiting for LLM responses, doing nothing.
Average utilization: ~20%. You’re paying for 5x what you use.
## How Honeycomb works

```
Job Queue
├── Worker A [node22, python3]     ← warm, picks matching jobs
├── Worker B [node22, rust]        ← warm
├── Worker C [gpu, python3]        ← warm (training jobs)
└── Worker D [node22, large-repo]  ← warm
```

- Agent needs to run a shell command
- Command + filesystem overlay queued to Honeycomb
- Worker with matching tags picks the job
- Apply overlay to temp directory
- Run command, stream results back to agent
- Wipe workspace, pick next job (sketched below)
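A minimal sketch of that loop in TypeScript. `Job`, `Overlay`, `queue.pull`, `applyOverlay`, and `runStreaming` are illustrative names, not Honeycomb's actual API:

```ts
// Hypothetical shape of a Honeycomb worker loop.
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface Overlay {
  writes: Record<string, string>; // path → new contents
  deletes: string[];              // paths removed by the agent
}

interface Job {
  id: string;
  tags: string[];    // capabilities the job needs, e.g. ["node22"]
  command: string[]; // e.g. ["npm", "test"]
  overlay: Overlay;  // the agent's filesystem changes
}

// Sketched further down this page; declared here so the loop type-checks.
declare function applyOverlay(dir: string, overlay: Overlay): Promise<void>;
declare function runStreaming(job: Job, cwd: string): Promise<void>;

const WORKER_TAGS = ["node22", "python3"];

async function workerLoop(queue: { pull(tags: string[]): Promise<Job> }) {
  for (;;) {
    // Block until a job this worker's tags can satisfy is available.
    const job = await queue.pull(WORKER_TAGS);

    // Materialize the agent's virtual-FS changes in a throwaway dir.
    const workspace = await mkdtemp(join(tmpdir(), "honeycomb-"));
    await applyOverlay(workspace, job.overlay);

    // Run the command, streaming stdout/stderr back to the agent.
    await runStreaming(job, workspace);

    // Wipe the workspace; the worker stays warm for the next job.
    await rm(workspace, { recursive: true, force: true });
  }
}
```

Everything job-specific arrives in the payload, so the worker holds no state between jobs.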
Average utilization: ~90%. Workers are always busy.
## Same cells, different honey

The pool handles any job type with the right payload:
| Job type | Payload | Example |
|---|---|---|
| Execution | overlay + commands → stdout/stderr | Agent runs `npm test` |
| Training | job.yaml + data → fine-tuned model | LoRA fine-tune on user’s codebase |
| Build | source + Dockerfile → image | CI pipeline |
| Scan | codebase snapshot → security report | Deep carapace analysis |
Tag-based routing ensures jobs land on workers with the right capabilities (`node22`, `python3`, `gpu`, `large-memory`).
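That matching rule is plain set containment; a sketch, with the helper name invented for illustration:

```ts
// A worker can take a job only if it advertises every capability
// the job asks for (hypothetical helper, not a documented API).
function canRun(workerTags: string[], jobTags: string[]): boolean {
  const have = new Set(workerTags);
  return jobTags.every((tag) => have.has(tag));
}

canRun(["node22", "python3"], ["node22"]);      // true  → Worker A
canRun(["node22", "rust"], ["gpu", "python3"]); // false → not Worker B
canRun(["gpu", "python3"], ["gpu", "python3"]); // true  → Worker C
```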
## Economics

```
Traditional: ~$0.05/hr per agent (full container, 80% idle)
Honeycomb:   ~$0.001/hr + ~$0.001/job
Savings:     ~90%
```

The cost drop comes from three factors:
- No idle time: Workers always have work. No paying for containers that wait.
- Lightweight isolation: `unshare` for mount/PID/network namespaces (sketched below). No container overhead.
- Shared base images: Workers pre-loaded with common runtimes. No cold starts.
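A sketch of that isolation step, assuming a Linux host with util-linux `unshare` on the PATH; the flag set and wrapper are illustrative, not Honeycomb's exact invocation:

```ts
import { spawn } from "node:child_process";

// Run a job's command inside fresh mount/PID/network namespaces via
// util-linux unshare: -m (mount), -p (PID), -n (net), -f (fork so the
// command becomes PID 1 of the new PID namespace), --map-root-user so
// an unprivileged worker can create the namespaces.
function runIsolated(command: string[], cwd: string): Promise<number> {
  const child = spawn(
    "unshare",
    ["-m", "-p", "-n", "-f", "--map-root-user", "--", ...command],
    { cwd, stdio: ["ignore", "pipe", "pipe"] },
  );
  child.stdout?.pipe(process.stdout); // stream output back to the agent
  child.stderr?.pipe(process.stderr);
  return new Promise((resolve) => child.on("close", (code) => resolve(code ?? 1)));
}
```

Namespaces give per-job isolation without paying for a container image, runtime, or cold start.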
## Overlay model

Agents work on virtual filesystems (memfs or FUSE). When they need real execution, their filesystem changes are packaged as an overlay:
```
Agent's virtual FS changes:
  + src/api.ts      (new file)
  ~ package.json    (modified)
  - old-config.yaml (deleted)

→ Overlay sent to Honeycomb worker
→ Worker applies overlay to tmpdir
→ Runs: npm test
→ Streams stdout/stderr back
→ rm -rf tmpdir
→ Ready for next job
```

The agent never knows it’s not running locally. The overlay model is transparent.
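A sketch of the worker-side apply step, with the overlay modeled as writes plus deletes as in the worker loop above (a real overlay likely carries more metadata, e.g. file modes):

```ts
import { mkdir, rm, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

interface Overlay {
  writes: Record<string, string>; // path → contents (new or modified files)
  deletes: string[];              // paths the agent removed
}

// Materialize the agent's virtual-FS changes on top of whatever
// checkout is already present in `workspace`.
async function applyOverlay(workspace: string, overlay: Overlay) {
  for (const [path, contents] of Object.entries(overlay.writes)) {
    const target = join(workspace, path);
    await mkdir(dirname(target), { recursive: true }); // ensure parent dirs
    await writeFile(target, contents);
  }
  for (const path of overlay.deletes) {
    await rm(join(workspace, path), { force: true });
  }
}
```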
## Scaling

- Queue depth > threshold → add workers
- Idle workers > timeout → remove workers
- GPU jobs in queue → wake GPU workers

Auto-scale based on demand. No manual capacity planning.
Workers are stateless — any worker can handle any job with matching tags. Adding capacity is instant.
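Those three rules reduce to a periodic check; a sketch with invented thresholds and a hypothetical pool interface:

```ts
// Hypothetical autoscaler tick. Thresholds and interfaces are
// illustrative, not Honeycomb's actual configuration.
interface Pool {
  queueDepth(tag?: string): number; // jobs waiting (optionally by tag)
  idleWorkers(): { id: string; idleMs: number }[];
  addWorker(tags: string[]): void;
  removeWorker(id: string): void;
  wakeGpuWorkers(): void;
}

const MAX_QUEUE_DEPTH = 10;         // queue depth > threshold → add workers
const IDLE_TIMEOUT_MS = 5 * 60_000; // idle > timeout → remove workers

function autoscaleTick(pool: Pool) {
  if (pool.queueDepth() > MAX_QUEUE_DEPTH) {
    pool.addWorker(["node22", "python3"]); // grow with a common base image
  }
  for (const w of pool.idleWorkers()) {
    if (w.idleMs > IDLE_TIMEOUT_MS) pool.removeWorker(w.id); // shrink when quiet
  }
  if (pool.queueDepth("gpu") > 0) pool.wakeGpuWorkers(); // GPU jobs → wake GPU workers
}
```

Because workers are stateless, the tick never has to drain or migrate anything: removing a worker just stops its loop after the current job.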