Design Decisions

Why virtual-context is built the way it is.

Compression Improves Reasoning

The central thesis: compressed, structured context produces better model reasoning than raw conversation dumps. When a model receives 60K tokens of curated summaries organized by topic, it performs better than when it receives 60K tokens of raw chat history that includes noise, repetition, and irrelevant tangents.

Conversation text has extremely low information density. Most turns contain phatic exchanges, restated context, debugging dead ends, and scaffolding that served a purpose in the moment but adds noise later. Compaction strips this while preserving the semantic core.

The benchmarks confirm it: virtual-context achieves 95% accuracy on LoCoMo memory questions vs. 33% for full-history baselines.

Two-Tagger Rationale

The embedding tagger is safe. It runs locally, costs nothing per call, executes in milliseconds, and produces deterministic results. If the LLM tagger fails or hallucinates tags, the embedding tagger still anchors retrieval.

The LLM tagger is rich. It understands context, catches implicit topics (“I’m worried about the deadline” → project-timeline), and generates natural vocabulary.

Running both in parallel costs one Haiku call per turn (~0.01 cents) but provides both retrieval reliability and vocabulary richness.
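The merge of the two taggers can be sketched as a simple union. This is a minimal illustration, not the actual API: the helper names and the stand-in tagger logic below are assumptions.

```python
def embedding_tags(text: str) -> set[str]:
    # Stand-in for the local embedding tagger: deterministic, zero-cost,
    # anchored to a fixed vocabulary.
    vocab = {"deadline": "project-timeline", "deploy": "deployment"}
    return {tag for word, tag in vocab.items() if word in text.lower()}

def llm_tags(text: str) -> set[str]:
    # Stand-in for the Haiku call: may surface implicit topics the
    # embedding tagger misses, but may also fail or hallucinate.
    return {"project-timeline"} if "worried" in text.lower() else set()

def tag_turn(text: str) -> set[str]:
    # Union of both taggers: embedding tags keep retrieval reliable even
    # if the LLM tagger fails; LLM tags add vocabulary richness.
    return embedding_tags(text) | llm_tags(text)
```

Because the union degrades gracefully, a failed or empty LLM response never removes the embedding tagger's anchors.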

Sync-First Processing

Inbound must be synchronous. The model needs context before it can respond. Tagging, retrieval, and assembly must complete before the request is forwarded.

Completion can be asynchronous. After the response streams back, the background thread handles response tagging, index updates, compaction, and fact extraction. The user is already reading the response.

Each new request waits for the previous turn’s completion to finish. In practice, completion takes 200-500ms, and users take seconds between turns.
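The ordering above can be sketched with a thread per turn. The class and method names here are illustrative, not the engine's real interfaces.

```python
import threading

class TurnPipeline:
    def __init__(self):
        self._prev_completion = None  # background work from the previous turn

    def handle_request(self, request: str) -> str:
        # Each new request waits for the previous turn's completion to finish,
        # so index updates are visible before the next retrieval.
        if self._prev_completion is not None:
            self._prev_completion.join()
        context = self._assemble(request)          # synchronous: tag, retrieve, assemble
        response = self._forward(request, context)  # forward upstream
        # Completion runs in the background while the user reads the response.
        self._prev_completion = threading.Thread(
            target=self._complete, args=(request, response))
        self._prev_completion.start()
        return response

    def _assemble(self, request):  return f"context for {request}"
    def _forward(self, request, context):  return f"response to {request}"
    def _complete(self, request, response):
        pass  # stand-in for response tagging, index updates, compaction
```

The join at the top of `handle_request` is what makes the 200-500ms completion window invisible: it only blocks if the user responds faster than the background work finishes.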

Tag Preservation Through Compaction

When segments are compacted, their tag assignments are preserved: the summary inherits the tags of the original turns. Deep compaction preserves tags again, so the tag set carries forward unchanged across compaction tiers.

This is why tag quality matters at assignment time: tags are the permanent index.
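Tag inheritance can be sketched as a union over the compacted segments. This assumes the summary takes every tag of its sources; the real summarization step is replaced here with a trivial join.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    text: str
    tags: set[str] = field(default_factory=set)

def compact(segments: list[Segment]) -> Segment:
    # The summary inherits every tag of the originals, so the tag index
    # keeps resolving to the compacted content across tiers.
    summary_text = " / ".join(s.text for s in segments)  # stand-in for real summarization
    tags = set().union(*(s.tags for s in segments)) if segments else set()
    return Segment(summary_text, tags)
```

Applying `compact` to already-compacted summaries preserves the same invariant, which is what makes the tag set stable through deep compaction.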

Chain Collapse Over Truncation

Many systems handle tool-heavy conversations by truncating old tool results. This is lossy and unpredictable.

Chain collapse replaces tool exchanges with compact stubs that include a restore reference. The model can see that information exists and recover it on demand via vc_restore_tool. Nothing is lost; it’s just paged out.

This mirrors virtual memory: pages are swapped to disk and faulted back in on access.
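The collapse-and-restore cycle can be sketched as paging against a store. The stub format and store shape are assumptions; only the `vc_restore_tool` name comes from the document.

```python
STORE: dict[str, str] = {}  # stand-in for the segment store holding full tool results

def collapse_tool_exchange(exchange_id: str, tool_output: str) -> str:
    # Page the full result out and leave a compact stub with a restore
    # reference, so the model can see that the information exists.
    STORE[exchange_id] = tool_output
    return f"[tool result elided; restore via vc_restore_tool id={exchange_id}]"

def restore_tool_exchange(exchange_id: str) -> str:
    # The "page fault": fault the full result back in on demand.
    return STORE[exchange_id]
```

Unlike truncation, nothing here is destroyed: the stub is a pointer, and the cost of recovery is one tool call.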

No SDK Dependencies

Virtual-context operates as a proxy, not a library. Point your API calls at localhost:8100 instead of api.anthropic.com, and everything works. No client changes, no framework lock-in.
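The redirect can be sketched as a one-line URL swap; the request body and headers pass through untouched. The request body below is a placeholder, and only the localhost:8100 address comes from the document.

```python
import urllib.request

UPSTREAM = "https://api.anthropic.com/v1/messages"
PROXY = "http://localhost:8100/v1/messages"

def build_request(url: str, body: bytes) -> urllib.request.Request:
    # Only the base URL changes; the client's payload is untouched.
    return urllib.request.Request(
        url, data=body, headers={"content-type": "application/json"})

req = build_request(PROXY, b"{}")
```

Because the proxy is transparent at the HTTP level, any client that lets you override the base URL works without code changes.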

Format Detection Over Configuration

The proxy auto-detects whether a request uses the Anthropic, OpenAI Chat, OpenAI Responses, or Gemini API format. Detection uses structural signals (field names, URL paths, model name prefixes) rather than content heuristics.
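Structural detection can be sketched as a cascade of checks on the path and field names. The specific signals below are assumptions for illustration, not the proxy's documented rules.

```python
def detect_format(path: str, body: dict) -> str:
    # Structural signals only: URL path shape, field names, model name
    # prefixes. No inspection of message content.
    if ":generateContent" in path:
        return "gemini"
    if path.endswith("/responses"):
        return "openai-responses"
    if path.endswith("/chat/completions"):
        return "openai-chat"
    if path.endswith("/messages") or str(body.get("model", "")).startswith("claude"):
        return "anthropic"
    return "unknown"
```

Structural signals are cheap and stable across SDK versions, which is why they beat content heuristics here.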

Greedy Set Cover for Assembly

Segments are sorted by retrieval score and added in order until the budget is full. This greedy pass is near-optimal in practice and fast: one sort plus a single linear scan.
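The assembly pass can be sketched in a few lines. The field names are illustrative; this variant skips segments that would overflow the budget rather than stopping at the first miss, which is an assumption about the real behavior.

```python
def assemble(segments: list[dict], budget: int) -> list[dict]:
    # Greedy: highest-scoring segments first, packed until the token
    # budget is exhausted; oversized segments are skipped.
    chosen, used = [], 0
    for seg in sorted(segments, key=lambda s: s["score"], reverse=True):
        if used + seg["tokens"] <= budget:
            chosen.append(seg)
            used += seg["tokens"]
    return chosen
```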

Demand-Paged Context

OS Virtual Memory                    virtual-context
-----------------                    ---------------
Physical RAM            =  Context window
Disk / swap             =  SQLite (segments, facts, summaries)
Page tables             =  TurnTagIndex (per-turn topic tracking)
Page faults             =  vc_expand_topic (demand paging)
Page eviction (LRU)     =  Compaction (topic-aware eviction)
Working set             =  Active paging depths per tag
Address space           =  Full conversation history (unbounded)
Memory protection       =  Bleed gating (topic-shift isolation)

The model sees recent turns at full fidelity, retrieved summaries for relevant topics, topic hints listing what else is available, and tool definitions for paging in more. The engine manages the page table, page frames, and eviction.

Fact Supersession Over Versioning

Facts use supersession (new fact invalidates old) rather than versioning (keep all versions). “User lives in LA” replaces “User lives in NYC.”

For cases where history matters, the fact’s when field and underlying conversation segments preserve the timeline. Supersession cleans up the active fact set, not the historical record.
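The split between the active fact set and the historical record can be sketched as a keyed map plus an append-only log. The key scheme and field names are assumptions; the real schema may differ.

```python
def assert_fact(facts: dict, history: list,
                key: str, value: str, when: str) -> None:
    # The new fact supersedes the old active value for the same key,
    # but the append-only history keeps every assertion with its "when".
    history.append((key, value, when))
    facts[key] = value

facts, history = {}, []
assert_fact(facts, history, "user.location", "NYC", "2023-01")
assert_fact(facts, history, "user.location", "LA", "2024-06")
```

The active set answers "where does the user live now?" in one lookup, while the history answers "where did they live before?" without any versioning machinery.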