# LLM Providers

Honeybee supports multiple LLM providers. Mix providers within a single brood — use fast inference for simple tasks and capable models for complex reasoning.

| Provider | Alias | Default Model | Best for |
| --- | --- | --- | --- |
| Cerebras | `fast` | llama-3.3-70b | Fast inference, low latency |
| Groq | (none) | llama-3.3-70b | Fast inference, OpenAI-compatible |
| Anthropic | `smart` | claude-sonnet | Complex reasoning, tool use |
| OpenAI | (none) | gpt-4o | Broad ecosystem, vision |
| Ollama | `local` | llama3.3 | Local inference, no API costs |

## API Keys

Provider keys are sourced from `~/.secrets/*.env` when running `wgl up`:

```sh
# ~/.secrets/cerebras.env
CEREBRAS_API_KEY=csk-...
# ~/.secrets/groq.env
GROQ_API_KEY=gsk-...
# ~/.secrets/anthropic.env
ANTHROPIC_API_KEY=sk-ant-...
# ~/.secrets/openai.env
OPENAI_API_KEY=sk-...
```
Keys can also be managed with the `wgl secret` command:

```sh
# Set a user-level key (used by all hives)
wgl secret set cerebras

# Set a hive-specific key
wgl secret set cerebras --hive my-project

# List stored keys
wgl secret list
```
## Selecting a Provider

Reference a provider by its full `provider/model` path or by alias:

```yaml
# Full format
provider: cerebras/llama-3.3-70b

# Alias
provider: fast
```
Mix providers within a single hive:

```yaml
hives:
  main:
    agents:
      - role: architect
        provider: anthropic/claude-opus # Complex decisions
      - role: developer
        provider: cerebras/llama-3.3-70b # Fast execution
      - role: reviewer
        provider: fast # Alias for cerebras
```

## Cerebras

Fastest inference available. TCP warming reduces time-to-first-token.

```yaml
provider: cerebras/llama-3.3-70b
# or
provider: cerebras/llama-3.1-8b
```
- TCP warming enabled by default in the incubator (long-running)
- Disabled in CF Workers (`warmTCPConnection: false`)
- 2000+ tokens/second for streaming

## Groq

Fast inference with an OpenAI-compatible API.

```yaml
provider: groq/llama-3.3-70b
```
- OpenAI SDK fork (identical API surface)
- CF Workers: requires the `nodejs_compat` flag
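If you deploy to Cloudflare Workers, the flag is set in `wrangler.toml`. This is standard Workers configuration, not Honeybee-specific; the date value is only an example:

```toml
# wrangler.toml: Node.js compatibility for the OpenAI-style SDK
compatibility_date = "2024-09-23"
compatibility_flags = ["nodejs_compat"]
```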

## Anthropic

Most capable models. Best for Queens and complex reasoning.

```yaml
provider: anthropic/claude-opus
provider: anthropic/claude-sonnet
provider: anthropic/claude-haiku
```
- Native tool use support
- Vision capabilities
- Input caching for repeated system prompts
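Input caching pays off when many agents share one long system prompt. A sketch of a common split, using only the config syntax shown earlier (role names here are illustrative, not built-ins):

```yaml
hives:
  main:
    agents:
      - role: queen # coordinates the hive; worth the Opus premium
        provider: anthropic/claude-opus
      - role: summarizer # high-volume, repeated calls; Haiku is enough
        provider: anthropic/claude-haiku
```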

## OpenAI

Broad model selection and ecosystem.

```yaml
provider: openai/gpt-4o
provider: openai/gpt-4o-mini
```

## Ollama

Run models locally with zero API costs. Requires a running Ollama instance reachable from the hive.

```yaml
provider: ollama/llama3.3
provider: ollama/qwen2.5:72b
provider: ollama/deepseek-r1:70b
```

Default endpoint: `http://localhost:11434`. Override with the `OLLAMA_HOST` environment variable.
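For example, to point a hive at an Ollama server on another machine (hostname is illustrative):

```sh
# Use a remote Ollama instance instead of localhost
export OLLAMA_HOST=http://gpu-box.local:11434
wgl up
```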

## Cost Optimization

```yaml
# Cheap: drone + fast provider for review/voting
- role: reviewer
  type: drone
  provider: fast

# Medium: worker + mid-tier for implementation
- role: developer
  type: worker
  provider: cerebras/llama-3.3-70b

# Premium: claude type + opus for architecture
- role: architect
  type: claude
  provider: anthropic/claude-opus

# Free: local ollama for development/testing
- role: tester
  type: worker
  provider: local
```

Use the cheapest provider that meets the task's requirements: simple CSS → Haiku, API design → Sonnet, architecture → Opus.