# LLM Providers
Honeybee supports multiple LLM providers. Mix providers within a single brood — use fast inference for simple tasks and capable models for complex reasoning.
## Supported providers

| Provider | Alias | Default Model | Best for |
|---|---|---|---|
| Cerebras | fast | llama-3.3-70b | Fast inference, low latency |
| Groq | — | llama-3.3-70b | Fast inference, OpenAI-compatible |
| Anthropic | smart | claude-sonnet | Complex reasoning, tool use |
| OpenAI | — | gpt-4o | Broad ecosystem, vision |
| Ollama | local | llama3.3 | Local inference, no API costs |
## Setting provider keys
### Local development

Provider keys are sourced from `~/.secrets/*.env` when running `wgl up`:

```sh
# ~/.secrets/cerebras.env
CEREBRAS_API_KEY=csk-...

# ~/.secrets/groq.env
GROQ_API_KEY=gsk-...

# ~/.secrets/anthropic.env
ANTHROPIC_API_KEY=sk-ant-...

# ~/.secrets/openai.env
OPENAI_API_KEY=sk-...
```
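If you are setting up a machine from scratch, the pattern above can be created with plain shell commands (the file name and key value here are illustrative; any `*.env` file under `~/.secrets/` follows the same pattern):

```bash
# Create the secrets directory and a provider key file,
# then start the brood so wgl up sources ~/.secrets/*.env
mkdir -p ~/.secrets
echo 'CEREBRAS_API_KEY=csk-...' > ~/.secrets/cerebras.env
wgl up
```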
### Cloud (Colony)

```bash
# Set a user-level key (used by all hives)
wgl secret set cerebras

# Set a hive-specific key
wgl secret set cerebras --hive my-project

# List stored keys
wgl secret list
```

## Configuration in brood.yaml
### Default provider

```yaml
# Full format
provider: cerebras/llama-3.3-70b

# Alias
provider: fast
```

### Per-agent override
```yaml
hives:
  main:
    agents:
      - role: architect
        provider: anthropic/claude-opus    # Complex decisions
      - role: developer
        provider: cerebras/llama-3.3-70b   # Fast execution
      - role: reviewer
        provider: fast                     # Alias for cerebras
```

## Provider details
### Cerebras

Fastest inference available. TCP warming reduces time-to-first-token.

```yaml
provider: cerebras/llama-3.3-70b
# or
provider: cerebras/llama-3.1-8b
```

- TCP warming enabled by default in the incubator (long-running)
- Disabled in CF Workers (`warmTCPConnection: false`)
- 2000+ tokens/second for streaming
### Groq

Fast inference with an OpenAI-compatible API.
```yaml
provider: groq/llama-3.3-70b
```

- OpenAI SDK fork (identical API surface)
- CF Workers: requires the `nodejs_compat` flag (see the check below)
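The flag itself typically lives in the Worker's Wrangler config rather than in brood.yaml. A quick pre-deploy check, sketched under the assumption of Cloudflare's standard `wrangler.toml` layout:

```bash
# Verify nodejs_compat is enabled; print the standard Cloudflare
# config line if it is missing from wrangler.toml
grep -q nodejs_compat wrangler.toml \
  || echo 'add: compatibility_flags = ["nodejs_compat"]'
```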
### Anthropic

Most capable models. Best for Queens and complex reasoning.

```yaml
provider: anthropic/claude-opus
provider: anthropic/claude-sonnet
provider: anthropic/claude-haiku
```

- Native tool use support
- Vision capabilities
- Input caching for repeated system prompts
### OpenAI

Broad model selection and ecosystem.

```yaml
provider: openai/gpt-4o
provider: openai/gpt-4o-mini
```

### Ollama (local)
Run models locally with zero API costs. Requires an Ollama server reachable on the network.

```yaml
provider: ollama/llama3.3
provider: ollama/qwen2.5:72b
provider: ollama/deepseek-r1:70b
```

The default endpoint is `http://localhost:11434`. Override it with the `OLLAMA_HOST` environment variable.
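For example, to run against an Ollama server on another machine (the hostname is illustrative; `OLLAMA_HOST` and `ollama pull` are standard Ollama tooling):

```bash
# Point Ollama clients at a remote server instead of localhost
export OLLAMA_HOST=http://gpu-box.local:11434

# Make sure the model referenced in brood.yaml exists on that server
ollama pull llama3.3
```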
## Cost optimization patterns

```yaml
# Cheap: drone + fast provider for review/voting
- role: reviewer
  type: drone
  provider: fast

# Medium: worker + mid-tier for implementation
- role: developer
  type: worker
  provider: cerebras/llama-3.3-70b

# Premium: claude type + opus for architecture
- role: architect
  type: claude
  provider: anthropic/claude-opus

# Free: local ollama for development/testing
- role: tester
  type: worker
  provider: local
```

Use the cheapest provider that meets the task requirements. Simple CSS → Haiku. API design → Sonnet. Architecture → Opus.