Engine
The compression, tagging, and retrieval machinery.
Compactor
Two-tier compaction converts raw conversation turns into compressed segments:
Summary compaction fires when the context window fill level crosses the soft threshold (default 70%). It selects uncompacted turns, groups them by tag overlap, and calls the summarization LLM to produce condensed segment summaries.
Deep compaction fires at the hard threshold (default 85%). It re-compresses existing summaries into even shorter forms.
Compaction is incremental: a watermark tracks which turns have already been processed. Protected recent turns (default 6) are never compacted. Compaction runs in a background thread and never blocks the response path.
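A minimal sketch of the trigger and turn-selection logic. The names `compaction_tier` and `eligible_turns` are illustrative, not the actual API:

```python
from typing import Optional

def compaction_tier(fill_fraction: float,
                    soft_threshold: float = 0.70,
                    hard_threshold: float = 0.85) -> Optional[str]:
    """Decide which compaction tier, if any, should fire for the current fill level."""
    if fill_fraction >= hard_threshold:
        return "deep"     # re-compress existing summaries into shorter forms
    if fill_fraction >= soft_threshold:
        return "summary"  # compress raw uncompacted turns into segment summaries
    return None

def eligible_turns(turns, watermark, protected_recent=6):
    """Turns past the watermark, excluding the protected recent window."""
    cutoff = max(watermark, 0)
    end = max(len(turns) - protected_recent, cutoff)
    return turns[cutoff:end]
```

Note how the watermark makes repeated runs cheap: already-processed turns are skipped, so each pass only pays for new material.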
```yaml
compaction:
  soft_threshold: 0.70
  hard_threshold: 0.85
  protected_recent_turns: 6
  min_summary_tokens: 100
  max_summary_tokens: 500
```

Tagging Pipeline
Every turn is tagged twice, by two independent systems running in parallel:
Inbound Embedding Tagger: Runs on the user message before the LLM responds. Uses a local embedding model to compute vector similarity against existing tags. Fast, deterministic, cannot hallucinate novel tags.
LLM Response Tagger: Runs after on_turn_complete with the full turn. Calls the configured tagger LLM to assign semantic tags. Produces richer vocabulary and catches nuances the embedding tagger misses.
The two tag sets are merged. The union provides both retrieval reliability (embeddings) and vocabulary richness (LLM).
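The merge can be sketched as a union that also records provenance, so downstream consumers can tell a deterministic embedding tag from an LLM-assigned one (a sketch; the real merge may carry more metadata):

```python
def merge_tags(embedding_tags, llm_tags):
    """Union the two tag sets, remembering which tagger produced each tag."""
    merged = {}
    for tag in embedding_tags:
        merged.setdefault(tag, set()).add("embedding")
    for tag in llm_tags:
        merged.setdefault(tag, set()).add("llm")
    return merged
```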
Context Bleed Gate
When the conversation shifts topics abruptly, the inbound tagger may carry forward tags from the previous topic. The context bleed gate detects sharp topic shifts by measuring overlap between current inbound tags and recent tag history. When overlap drops below a threshold, it suppresses carryover tags.
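One way to express the gate, using Jaccard overlap as the shift detector. The 0.5 threshold is an assumed default for illustration:

```python
def gate_carryover(inbound_tags, recent_tags, min_overlap=0.5):
    """Suppress tags carried over from the previous topic when a sharp
    topic shift is detected (low Jaccard overlap with recent history)."""
    inbound, recent = set(inbound_tags), set(recent_tags)
    if not inbound or not recent:
        return inbound
    overlap = len(inbound & recent) / len(inbound | recent)
    if overlap < min_overlap:   # sharp shift: drop tags shared with the old topic
        return inbound - recent
    return inbound
```

When the topics are fully disjoint, nothing is suppressed (there was no carryover to begin with); the gate only fires on the mixed case where a few stale tags bled into a new topic.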
Tag Splitting and Aliases
When a tag grows too large (too many segments), the engine splits it into subtags and registers aliases. Queries against the original tag name still find segments under the split subtags.
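The alias mechanism amounts to expanding a queried tag into itself plus its registered subtags (a sketch; `TagAliases` is a hypothetical name):

```python
class TagAliases:
    """Maps an original tag name to the subtags it was split into."""

    def __init__(self):
        self.aliases = {}  # original tag -> list of subtags

    def register_split(self, tag, subtags):
        self.aliases[tag] = list(subtags)

    def resolve(self, tag):
        """Expand a queried tag into all tags that should match it."""
        return [tag] + self.aliases.get(tag, [])
```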
Retrieval
3-signal Reciprocal Rank Fusion (RRF):
- Signal 1: IDF Tag Overlap – Compares query tags against segment tag sets, weighted by inverse document frequency.
- Signal 2: BM25 Keyword – Standard BM25 scoring of query text against segment text.
- Signal 3: Embedding Cosine – Vector similarity between query and segment embeddings.
Post-fusion: Gravity dampening penalizes distant segments. Hub dampening penalizes segments that match too many queries.
Tags from the most recent N turns are skipped during retrieval (their content is already in the raw history).
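RRF combines the three per-signal rankings without needing their raw scores to be comparable: each segment earns 1/(k + rank) from every ranking it appears in, with k = 60 as the conventional constant. A minimal fusion sketch:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: combine per-signal rankings of segment ids.
    Each ranking is a list of segment ids, best first."""
    scores = {}
    for ranking in rankings:  # one ranking per signal (tag, BM25, embedding)
        for rank, seg_id in enumerate(ranking, start=1):
            scores[seg_id] = scores.get(seg_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A segment that places moderately well in all three signals beats one that tops a single signal, which is the property that makes the fusion robust to any one signal misfiring.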
```yaml
retrieval:
  active_tag_lookback: 4
  strategy_config:
    default:
      max_results: 10
      max_budget_fraction: 0.25
    broad:
      max_results: 15
      max_budget_fraction: 0.35
    temporal:
      max_results: 8
      max_budget_fraction: 0.20
```

Assembly
The assembler constructs the `<virtual-context>` block in two passes:
Priority Pass (Tag Rules): Must-include content defined by tag rules is included first.
Fill Pass (Greedy Set Cover): Remaining budget is filled by retrieval results in score order until the budget is exhausted.
The assembly budget is a fraction of the total context window (default 25%). After compaction, the assembler can inject a topic list hint: a brief enumeration of all available tags with segment counts.
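The fill pass can be sketched as a budget-greedy loop over retrieval results (shown here after the priority pass has already consumed part of the budget; the segment dict shape is illustrative):

```python
def fill_pass(candidates, budget_tokens):
    """Greedy fill: take retrieval results in score order, skipping any
    segment that would overflow the remaining token budget."""
    chosen, used = [], 0
    for seg in sorted(candidates, key=lambda s: s["score"], reverse=True):
        if used + seg["tokens"] <= budget_tokens:
            chosen.append(seg["id"])
            used += seg["tokens"]
    return chosen, used
```

Note that a too-large high-scoring segment is skipped rather than truncated, letting smaller lower-scoring segments still make use of the remaining budget.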
Token Counter
| Mode | Method | Speed | Accuracy |
|---|---|---|---|
| anthropic | Anthropic’s tokenizer library | Slow | Exact for Claude models |
| tiktoken | OpenAI’s tiktoken library | Fast | Exact for GPT models, close for others |
| estimate | len(text) / 4 | Instant | ~10-20% variance |
Image-aware: for base64 images, uses dimension-based token costing rather than counting the base64 string characters. Fallback chain: anthropic -> tiktoken -> estimate.
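A sketch of the estimate mode and the fallback chain. The pixels-per-token constant in `image_tokens` is a hypothetical stand-in for the real dimension-based costing formula:

```python
def estimate_tokens(text):
    """The 'estimate' mode: roughly four characters per token."""
    return len(text) // 4

def image_tokens(width, height):
    """Dimension-based image costing (illustrative constant: ~750 pixels
    per token, modeled loosely on common vision-API pricing)."""
    return (width * height) // 750

def count_tokens(text, counters):
    """Fallback chain: try each counter in order (e.g. anthropic, then
    tiktoken); fall back to the character estimate if all fail."""
    for counter in counters:
        try:
            return counter(text)
        except Exception:
            continue
    return estimate_tokens(text)
```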
Fact Extraction
Facts are structured triples: subject | verb | object with metadata (status, date, location, type).
Supersession
“User moved from NYC to LA” invalidates “User lives in NYC.” Supersession runs during the compaction LLM pass.
Querying
```python
vc_query_facts(subject="user", verb="visited", status="completed")
```

Verb matching includes morphological expansion: querying “led” also matches “leads” and “leading”.
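A naive sketch of suffix-based expansion for regular verbs. Irregular forms like “led” → “leads” would need a lemma table or proper lemmatizer, which the real pipeline presumably has:

```python
def expand_verb(verb):
    """Derive related verb forms for matching (regular '-ed' verbs only;
    this is a sketch, not the engine's actual morphology)."""
    forms = {verb}
    if verb.endswith("ed"):
        stem = verb[:-2]
        forms |= {stem + "s", stem + "ing"}  # "visited" -> "visits", "visiting"
    return forms
```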
Fact Graph
Six relationship types: SUPERSEDES, CAUSED_BY, PART_OF, CONTRADICTS, SAME_AS, RELATED_TO. Detected during the supersession LLM pass (zero additional calls). 1-hop traversal on query results.
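The 1-hop traversal expands a query's result set along any edge touching a seed fact, without chaining further (a sketch over `(src, relation, dst)` edge tuples):

```python
def one_hop(edges, seed_ids):
    """Expand a set of fact ids by exactly one hop along graph edges."""
    seeds = set(seed_ids)
    expanded = set(seeds)
    for src, _relation, dst in edges:
        if src in seeds:
            expanded.add(dst)
        if dst in seeds:
            expanded.add(src)
    return expanded
```

Membership is checked against the original seed set, not the growing result, which is what keeps the traversal at exactly one hop.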
Chain Collapse
Before (3 messages, ~18K tokens):

```
assistant: [tool_use: Read file.py]
user: [tool_result: <full 500-line file>]
assistant: "Bug on line 42..."
```

After (2 messages, ~200 tokens):

```
user: [compacted turn: Read(file.py)]
assistant: "Bug on line 42..."
```

Lossless compression: nothing is discarded; content is moved to cheaper storage with a pointer left in the conversation. The vc_restore_tool(ref) tool lets the model recover any collapsed chain at full fidelity.
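The collapse can be sketched as merging each (tool_use, tool_result) pair into one compact pointer message, with the full result moved to a side store keyed by ref (message shape and store are illustrative):

```python
def collapse_chain(messages, store):
    """Merge each (tool_use, tool_result) pair into one pointer message.
    The full result moves into `store`, recoverable later by its ref."""
    out, i = [], 0
    while i < len(messages):
        msg = messages[i]
        if (msg["role"] == "assistant" and msg.get("tool_use")
                and i + 1 < len(messages)
                and messages[i + 1].get("tool_result") is not None):
            ref = f"ref-{len(store)}"
            store[ref] = messages[i + 1]["tool_result"]  # moved, not discarded
            out.append({"role": "user",
                        "content": f"[compacted turn: {msg['tool_use']}] ({ref})"})
            i += 2
        else:
            out.append(msg)
            i += 1
    return out
```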
Media Compression
Base64 images are decoded, resized to reduce dimensions, and the compressed version replaces the original. A 391KB screenshot becomes ~40KB, cutting payload size by ~90%. Originals stored to disk for recovery.
Tool Loop
| Tool | Purpose |
|---|---|
| vc_expand_topic | Load full text for a topic tag (with optional collapse of other tags) |
| vc_find_quote | Full-text search across all stored conversation text |
| vc_query_facts | Structured fact lookup with filters |
| vc_remember_when | Time-scoped recall (date ranges + query) |
| vc_recall_all | Load all topic summaries at once |
| vc_restore_tool | Recover a collapsed tool chain at full fidelity |
Anti-repetition tracking suppresses duplicate retrievals. Empty streak detection suggests alternative query strategies.
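Both behaviors fit in a small tracker (a sketch; the class name and streak limit are illustrative):

```python
class RetrievalTracker:
    """Suppress duplicate retrievals and notice empty-result streaks."""

    def __init__(self, empty_streak_limit=3):
        self.seen = set()
        self.empty_streak = 0
        self.empty_streak_limit = empty_streak_limit

    def filter_new(self, segment_ids):
        """Drop segments already surfaced this session; track empty streaks."""
        fresh = [s for s in segment_ids if s not in self.seen]
        self.seen.update(fresh)
        self.empty_streak = 0 if fresh else self.empty_streak + 1
        return fresh

    def should_suggest_alternatives(self):
        """True once consecutive retrievals have produced nothing new."""
        return self.empty_streak >= self.empty_streak_limit
```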