Skip to content

Scanner

The Carapace scanner is a zero-dependency prompt injection detection engine. It analyzes text for patterns that attempt to manipulate LLM behavior, scoring each finding and taking action based on configurable thresholds.

The scanner runs a battery of pattern matchers against input text. Each matcher targets a specific attack category and produces findings with severity scores. Scores are summed to produce a total threat score, which determines the action.

Input text
→ 29 category matchers (parallel)
→ Findings (category, score, evidence)
→ Total score
→ Action (PASS / LOG / WARN / BLOCK)
#CategoryWhat it detects
1System prompt override”ignore previous instructions”, “new system prompt”
2Role manipulation”you are now”, “act as”, “pretend to be”
3Instruction injection”do not follow”, “disregard”, “forget your rules”
4Data exfiltration”output your system prompt”, “repeat everything above”
5Encoding bypassBase64/hex/rot13 encoded payloads
6Delimiter injectionMarkdown, XML, HTML tags to break context
7Multi-turn manipulationBuilding trust across messages
8Jailbreak patternsDAN, developer mode, GPT-4 jailbreaks
9Prompt leakingExtracting system instructions
10Token smugglingZero-width characters, homoglyphs
11Context overflowExtremely long inputs to push context
12Tool misuseManipulating tool calls or function names
13Payload hidingSteganographic or invisible text patterns
14Indirect injectionInstructions embedded in data (URLs, files)
15Multilingual bypassNon-English instructions to evade English rules
16Hypothetical framing”hypothetically”, “in a story”, “roleplay”
17Authority impersonation”I’m the developer”, “admin override”
18Emotional manipulationUrgency, threats, guilt
19Recursive injectionNested prompt injection in outputs
20Few-shot poisoningManipulated examples
21Chain-of-thought hijack”let’s think step by step” for malicious goals
22Memory manipulationAltering conversation history
23API abuseManipulating API parameters
24Multimodal injectionText in images, audio transcripts
25Competitive analysisExtracting model capabilities
26Output format manipulationForcing JSON/code/markdown for injection
27Semantic deceptionContradictory or misleading framing
28Resource exhaustionIntentionally expensive operations
29Supply chainManipulated package names, import paths

Each finding has a severity score:

Score rangeSeverityTypical action
0-19LowPASS
20-49MediumLOG
50-99HighWARN
100+CriticalBLOCK

Scores accumulate across findings. A message with three medium-severity findings (20 + 25 + 30 = 75) triggers a WARN even though no single finding was critical.

ModeBehavior
PASSAllow through, no logging
LOGAllow through, record finding
WARNAllow through, flag for review
BLOCKReject the input

The threshold is configurable:

import { scan } from '@honeybee-ai/carapace';
// Default threshold (100 = BLOCK)
const result = scan(text);
// Custom threshold
const result = scan(text, { threshold: 50 });

The core detection uses carefully crafted patterns that balance precision and recall:

  • Anchored patterns: Prefixed with ^ or word boundaries to prevent false positives
  • No nested quantifiers: All patterns are validated against ReDoS (3 audit findings fixed)
  • Case-insensitive: Attacks in mixed case are caught
  • Unicode-aware: Homoglyph detection, zero-width character stripping

In the incubator, Carapace scans at multiple points:

PointWhat’s scannedAction on BLOCK
Write guardState values, event data, messagesReject the write
Read guardState values returned to agentsStrip or flag
Snapshot guardFull state snapshotsSanitize findings
Protocol guardRole descriptions in ACP specsReject spec load
MCP proxyTool call arguments and resultsBlock the call

Beyond detection, Carapace can strip injection patterns while preserving content:

import { sanitize } from '@honeybee-ai/carapace';
const result = sanitize(userInput);
// result.sanitized — cleaned text
// result.removed — what was stripped
// result.modified — boolean

The scanner is designed for inline use (not async batch processing):

  • Synchronous: scan() and isSafe() are synchronous calls
  • Sub-millisecond: Typical scan time is < 1ms for messages under 10KB
  • No I/O: Zero network calls, zero file reads — pure computation
  • No dependencies: Nothing to download, nothing to audit
  • Not an LLM: The scanner uses pattern matching, not language understanding. Sufficiently novel attacks may evade detection.
  • False positives: Legitimate technical discussions about prompt injection may trigger findings. Use LOG mode for educational contexts.
  • Post-delivery scanning: Streaming responses are scanned after delivery (buffer-all-before-forward mode available but adds latency).