Engine

The compression, tagging, and retrieval machinery.

Compactor

Two-tier compaction converts raw conversation turns into compressed segments:

Summary compaction fires when the context window fill level crosses the soft threshold (default 70%). It selects uncompacted turns, groups them by tag overlap, and calls the summarization LLM to produce condensed segment summaries.

Deep compaction fires at the hard threshold (default 85%). It re-compresses existing summaries into even shorter forms.

Compaction is incremental: a watermark tracks which turns have already been processed, and protected recent turns (default 6) are never compacted. Compaction runs on a background thread and never blocks the response path.

compaction:
  soft_threshold: 0.70
  hard_threshold: 0.85
  protected_recent_turns: 6
  min_summary_tokens: 100
  max_summary_tokens: 500
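The trigger and selection logic can be sketched as follows; `compaction_tier`, `compactable_turns`, and the turn-list shape are illustrative names under the config defaults above, not the engine's actual API:

```python
# Sketch of the two-tier compaction trigger and incremental turn selection.
SOFT_THRESHOLD = 0.70          # summary compaction
HARD_THRESHOLD = 0.85          # deep compaction
PROTECTED_RECENT_TURNS = 6     # never compacted

def compaction_tier(used_tokens, window_tokens):
    """Return which compaction tier should fire, if any."""
    fill = used_tokens / window_tokens
    if fill >= HARD_THRESHOLD:
        return "deep"          # re-compress existing summaries
    if fill >= SOFT_THRESHOLD:
        return "summary"       # compact raw turns into segment summaries
    return None

def compactable_turns(turns, watermark):
    """Incremental selection: only turns past the watermark, minus the protected tail."""
    candidates = turns[watermark:]
    if len(candidates) <= PROTECTED_RECENT_TURNS:
        return []
    return candidates[:-PROTECTED_RECENT_TURNS]
```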

Tagging Pipeline

Every turn is tagged twice, by two independent systems running in parallel:

Inbound Embedding Tagger: Runs on the user message before the LLM responds. Uses a local embedding model to compute vector similarity against existing tags. Fast, deterministic, cannot hallucinate novel tags.

LLM Response Tagger: Runs after on_turn_complete with the full turn. Calls the configured tagger LLM to assign semantic tags. Produces richer vocabulary and catches nuances the embedding tagger misses.

The two tag sets are merged. The union provides both retrieval reliability (embeddings) and vocabulary richness (LLM).
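The inbound embedding tagger's core step can be sketched as cosine similarity against stored per-tag centroid vectors; the centroid structure and the similarity threshold are illustrative assumptions:

```python
# Sketch of the inbound embedding tagger: compare the user-message embedding
# against per-tag centroids; only existing tags can ever be emitted.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def inbound_tags(message_vec, tag_centroids, threshold=0.45):
    """Deterministic tagging: tags whose centroid is close enough to the message."""
    return {tag for tag, vec in tag_centroids.items()
            if cosine(message_vec, vec) >= threshold}
```

Because the candidate set is exactly the existing tag vocabulary, this pass cannot hallucinate a novel tag; the LLM response tagger supplies the new vocabulary instead.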

Context Bleed Gate

When the conversation shifts topics abruptly, the inbound tagger may carry forward tags from the previous topic. The context bleed gate detects sharp topic shifts by measuring overlap between current inbound tags and recent tag history. When overlap drops below a threshold, it suppresses carryover tags.
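A minimal sketch of the gate, assuming Jaccard overlap between the current inbound tags and recent tag history; the threshold value is an illustrative assumption:

```python
# Sketch of the context bleed gate: on a sharp topic shift, drop tags
# that only appear because they were carried forward from recent turns.
def gate_carryover(inbound_tags, recent_tags, min_overlap=0.2):
    if not inbound_tags or not recent_tags:
        return inbound_tags
    overlap = len(inbound_tags & recent_tags) / len(inbound_tags | recent_tags)
    if overlap < min_overlap:
        # Sharp shift detected: suppress the carryover tags.
        return inbound_tags - recent_tags
    return inbound_tags
```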

Tag Splitting and Aliases

When a tag grows too large (too many segments), the engine splits it into subtags and registers aliases. Queries against the original tag name still find segments under the split subtags.
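Alias resolution can be sketched as follows, assuming an alias table mapping an original tag name to its split subtags (the table shape is illustrative):

```python
# Sketch of tag alias resolution: expand a queried tag through the alias
# table so queries against the original name still reach the split subtags.
def resolve_tag(name, aliases):
    seen, leaves, stack = set(), set(), [name]
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue
        seen.add(tag)
        subtags = aliases.get(tag)
        if subtags:
            stack.extend(subtags)   # follow the split chain (handles re-splits)
        else:
            leaves.add(tag)         # leaf tags hold segments directly
    return leaves
```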

Retrieval

Retrieval fuses three signals with Reciprocal Rank Fusion (RRF):

  • Signal 1: IDF Tag Overlap – Compares query tags against segment tag sets, weighted by inverse document frequency.
  • Signal 2: BM25 Keyword – Standard BM25 scoring of query text against segment text.
  • Signal 3: Embedding Cosine – Vector similarity between query and segment embeddings.
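The fusion step over the three ranked lists can be sketched as standard RRF; the constant k=60 is the conventional default, assumed here rather than taken from the engine:

```python
# Sketch of Reciprocal Rank Fusion: each signal contributes a ranked list
# of segment ids; score(seg) = sum over signals of 1 / (k + rank).
def rrf_fuse(rankings, k=60):
    scores = {}
    for ranked in rankings:
        for rank, seg in enumerate(ranked, start=1):
            scores[seg] = scores.get(seg, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A segment ranked by all three signals outscores one ranked highly by only a single signal, which is what makes the fusion robust to any one signal misfiring.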

Post-fusion: Gravity dampening penalizes distant segments. Hub dampening penalizes segments that match too many queries.

Tags from the most recent N turns are skipped during retrieval (their content is already in the raw history).

retrieval:
  active_tag_lookback: 4
  strategy_config:
    default:
      max_results: 10
      max_budget_fraction: 0.25
    broad:
      max_results: 15
      max_budget_fraction: 0.35
    temporal:
      max_results: 8
      max_budget_fraction: 0.20

Assembly

The assembler constructs the <virtual-context> block in two passes:

Priority Pass (Tag Rules): Must-include content defined by tag rules is included first.

Fill Pass (Greedy Set Cover): Remaining budget is filled by retrieval results in score order until the budget is exhausted.

The assembly budget is a fraction of the total context window (default 25%). After compaction, the assembler can inject a topic list hint: a brief enumeration of all available tags with segment counts.
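The two-pass structure can be sketched as below; the segment shape and the `count_tokens` stand-in (character count here) are illustrative assumptions:

```python
# Sketch of the two-pass assembler: priority (tag-rule) content first,
# then a greedy fill of retrieval results in score order under the budget.
def assemble(priority, retrieved, budget_tokens, count_tokens=len):
    block, used = [], 0
    for seg in priority:                # Pass 1: must-include tag rules
        block.append(seg)
        used += count_tokens(seg)
    for seg in retrieved:               # Pass 2: greedy fill, score order
        cost = count_tokens(seg)
        if used + cost > budget_tokens:
            continue                    # skip segments that would overflow
        block.append(seg)
        used += cost
    return block
```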

Token Counter

Mode        Method                          Speed    Accuracy
anthropic   Anthropic’s tokenizer library   Slow     Exact for Claude models
tiktoken    OpenAI’s tiktoken library       Fast     Exact for GPT models, close for others
estimate    len(text) / 4                   Instant  ~10-20% variance

Image-aware: for base64 images, uses dimension-based token costing rather than counting the base64 string characters. Fallback chain: anthropic -> tiktoken -> estimate.
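The fallback chain and the estimate mode can be sketched as follows; the image cost formula (pixels / 750) is an illustrative approximation, not the engine's exact constant:

```python
# Sketch of the token counter: try each exact counter in order
# (anthropic -> tiktoken), then fall back to the cheap estimate.
def estimate_tokens(text):
    return len(text) // 4              # "estimate" mode: ~10-20% variance

def image_tokens(width, height):
    # Dimension-based costing instead of counting base64 characters.
    return (width * height) // 750

def count_tokens(text, counters=()):
    for counter in counters:
        try:
            return counter(text)
        except Exception:
            continue                   # counter unavailable: try the next one
    return estimate_tokens(text)
```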

Fact Extraction

Facts are structured triples: subject | verb | object with metadata (status, date, location, type).

Supersession

“User moved from NYC to LA” invalidates “User lives in NYC.” Supersession runs during the compaction LLM pass.
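Applying the supersession edges the LLM pass produces can be sketched as below; the fact-store shape and edge format are illustrative assumptions:

```python
# Sketch of applying supersession: edges are (new_fact_id, old_fact_id)
# pairs emitted by the compaction LLM pass; old facts are marked, not deleted.
def apply_supersession(facts, edges):
    superseded = {old for _new, old in edges}
    for fact_id, fact in facts.items():
        if fact_id in superseded:
            fact["status"] = "superseded"
    return facts
```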

Querying

vc_query_facts(subject="user", verb="visited", status="completed")

Verb matching includes morphological expansion: querying “led” also matches “leads”, “leading”.
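The expansion step can be sketched with simple suffix rules plus a small irregular-verb map; both the rules and the map entries are illustrative assumptions, not the engine's actual morphology tables:

```python
# Sketch of morphological verb expansion: expand a query verb to common
# inflected forms of its base so e.g. "led" also matches "leads", "leading".
IRREGULAR_BASE = {"led": "lead", "went": "go", "ran": "run"}

def expand_verb(verb):
    base = IRREGULAR_BASE.get(verb, verb)
    stem = base[:-1] if base.endswith("e") else base   # "move" -> "mov"
    return {verb, base, base + "s", stem + "ing", stem + "ed"}
```

Over-generation (e.g. a rare form like "leaded") is harmless here: the expanded set is only used for matching stored facts, so spurious forms simply never match.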

Fact Graph

Six relationship types: SUPERSEDES, CAUSED_BY, PART_OF, CONTRADICTS, SAME_AS, RELATED_TO. Detected during the supersession LLM pass (zero additional calls). 1-hop traversal on query results.
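The 1-hop traversal can be sketched as follows, assuming edges stored as (source, relation, destination) triples over fact ids:

```python
# Sketch of 1-hop fact-graph traversal: expand query hits with directly
# connected facts in either direction, and no further.
RELATIONS = {"SUPERSEDES", "CAUSED_BY", "PART_OF",
             "CONTRADICTS", "SAME_AS", "RELATED_TO"}

def one_hop(fact_ids, edges):
    seeds = set(fact_ids)
    neighbors = set()
    for src, rel, dst in edges:       # rel is one of RELATIONS
        if src in seeds:
            neighbors.add(dst)
        if dst in seeds:
            neighbors.add(src)
    return seeds | neighbors          # seeds untouched: traversal is 1 hop only
```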

Chain Collapse

Before (3 messages, ~18K tokens):
  assistant: [tool_use: Read file.py]
  user:      [tool_result: <full 500-line file>]
  assistant: "Bug on line 42..."

After (2 messages, ~200 tokens):
  user:      [compacted turn: Read(file.py)]
  assistant: "Bug on line 42..."

Lossless compression: nothing is discarded, just moved to cheaper storage with a pointer left in the conversation. vc_restore_tool(ref) lets the model recover any collapsed chain at full fidelity.
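The collapse step can be sketched as below; the message shape, `store` mapping, and ref format are illustrative assumptions:

```python
# Sketch of chain collapse: each (tool_use, tool_result) pair becomes one
# compacted turn; the bulky result moves to a store behind a pointer.
def collapse_chains(messages, store):
    out, i = [], 0
    while i < len(messages):
        cur = messages[i]
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        if cur.get("type") == "tool_use" and nxt and nxt.get("type") == "tool_result":
            ref = f"vc-ref-{len(store)}"
            store[ref] = nxt["content"]      # nothing discarded, just relocated
            out.append({"role": "user",
                        "content": f"[compacted turn: {cur['name']} (restore: {ref})]"})
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```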

Media Compression

Base64 images are decoded, downscaled, and re-encoded; the compressed version replaces the original. A 391KB screenshot becomes ~40KB, cutting payload size by ~90%. Originals are stored to disk for recovery.

Tool Loop

Tool              Purpose
vc_expand_topic   Load full text for a topic tag (with optional collapse of other tags)
vc_find_quote     Full-text search across all stored conversation text
vc_query_facts    Structured fact lookup with filters
vc_remember_when  Time-scoped recall (date ranges + query)
vc_recall_all     Load all topic summaries at once
vc_restore_tool   Recover a collapsed tool chain at full fidelity

Anti-repetition tracking suppresses duplicate retrievals. Empty-streak detection notices when consecutive retrievals return nothing and suggests alternative query strategies.
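Both guards can be sketched in one small tracker; the class name, normalization, and streak threshold are illustrative assumptions:

```python
# Sketch of the tool-loop guards: duplicate-retrieval suppression plus
# empty-streak detection that signals when to change query strategy.
class RetrievalTracker:
    def __init__(self, empty_limit=3):
        self.seen = set()
        self.empty_streak = 0
        self.empty_limit = empty_limit

    def should_skip(self, query):
        """Anti-repetition: suppress a retrieval already issued this loop."""
        key = query.strip().lower()
        if key in self.seen:
            return True
        self.seen.add(key)
        return False

    def record(self, results):
        """Track consecutive empty results; True means suggest a new strategy."""
        self.empty_streak = 0 if results else self.empty_streak + 1
        return self.empty_streak >= self.empty_limit
```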