Architecture
How virtual-context works under the hood.
Pipeline
Every request flows through a fixed sequence:
Client request
|
v
Format detection (Anthropic / OpenAI Chat / OpenAI Responses / Gemini)
|
v
Envelope stripping (OpenClaw channel metadata removed)
|
v
Wait for previous on_turn_complete
|
v
History ingestion (first request only: bootstrap TurnTagIndex)
|
v
on_message_inbound
|-- Inbound tagging (embedding tagger + LLM tagger in parallel)
|-- Retrieval (3-signal RRF: IDF tag overlap + BM25 keyword + embedding cosine)
|-- Assembly (budget-aware context block construction)
|
v
Inject <virtual-context> block into system prompt
|
v
Forward to upstream provider
|
v
Stream/return response to client
|
v
on_turn_complete (background thread)
|-- Response tagging (full turn: user + assistant)
|-- TurnTagIndex update
|-- Compaction check (soft/hard thresholds)
|-- Tag summary (re)build
|-- Fact extraction and supersessionComponent Map
Core (virtual_context/core/)
| Module | Responsibility |
|---|---|
engine.py | Top-level orchestrator. Owns the event lifecycle: on_message_inbound, on_turn_complete, ingest_history |
compactor.py | Two-tier compaction (summary and deep). Selects segments, calls the summarization LLM, writes compressed segments back to storage |
segmenter.py | Splits raw conversation turns into semantic segments with tag assignments |
tagging_pipeline.py | Two-tagger architecture: parallel inbound embedding tagger + LLM response tagger. Context bleed gate on topic shifts |
assembler.py | Budget-aware context block construction. Tag rules priority pass, then greedy fill pass |
monitor.py | Tracks context window fill level, triggers compaction when soft/hard thresholds are crossed |
tool_loop.py | Tool catalogue, execution dispatch for vc_* tools, SSE parsing, continuation rounds, anti-repetition |
tool_query.py | Handles individual vc_* tool calls, manages presented segment deduplication |
retrieval.py | 3-signal ranked retrieval with gravity/hub dampening |
fact_extractor.py | Structured fact extraction with supersession detection |
telemetry.py | TelemetryEvent, TelemetryRollup, TelemetryLedger for operational metrics |
temporal_resolver.py | Time-bounded recall: parses relative dates to absolute date ranges |
Proxy (virtual_context/proxy/)
| Module | Responsibility |
|---|---|
handlers.py | HTTP request handler. Format detection, request enrichment, streaming SSE forwarding, paging path with tool interception |
helpers.py | Pure functions: format detection, envelope stripping, context injection, payload construction |
provider_adapters.py | Adapter layer for Anthropic, OpenAI Chat, OpenAI Responses, and Gemini API formats |
metrics.py | Thread-safe event collector with ring buffer, snapshot aggregation, cursor-based SSE streaming |
server.py | ASGI application setup, route registration, dashboard serving |
dashboard_html.py | Self-contained single-page dashboard (all CSS/JS inlined) |
Infrastructure
| Module | Responsibility |
|---|---|
token_counter.py | Three-mode token counting: anthropic (exact), tiktoken (fast), estimate (len/4 fallback). Image-aware |
config.py | YAML config loading with validation, preset system, multi-instance support |
types.py | Dataclasses for the entire system: VCConfig, TagEntry, TurnTagEntry, Segment, Fact |
storage/sqlite.py | SQLite storage backend (default). Segments, facts, tag summaries, session state |
storage/postgres.py | PostgreSQL backend for multi-worker deployments |
storage/neo4j_store.py | Neo4j/FalkorDB backend for graph-based fact queries |
Storage Backends
All backends implement the same Store protocol:
- SQLite (default): Single-file, zero-config. Suitable for single-user and development.
- PostgreSQL: Multi-worker safe. Used when running multiple proxy instances against the same conversation store.
- Neo4j / FalkorDB: Graph-backed storage for fact relationships and traversal queries.
storage:
backend: "sqlite" # "sqlite", "postgres", or "neo4j"
sqlite:
path: ".virtualcontext/store.db"
postgres:
dsn: "postgresql://user:pass@host:5432/vc"
neo4j:
uri: "bolt://localhost:7687"Provider Adapters
The proxy supports four API formats with auto-detection:
| Format | Detection Signal | Context Injection Point |
|---|---|---|
| Anthropic | “system” field or model starts with “claude” | system field |
| OpenAI Chat | /v1/chat/completions path | messages[0] with role: “system” |
| OpenAI Responses | /v1/responses path | instructions field |
| Gemini | /v1beta/models path pattern | system_instruction field |
Detection is automatic. No configuration needed.
Session Management
Sessions track conversation continuity across requests. The proxy uses Redis-backed session state for multi-worker consistency (single-worker falls back to in-memory state).
Session identity is derived from: conversation ID embedded in HTML comments, API key + model combination as fallback, or explicit session headers when available.
Threading Model
- The main request path is synchronous within the async ASGI handler
on_turn_completeruns in aThreadPoolExecutor(max_workers=1)background thread- Each new request calls
wait_for_complete()to block until the previous turn finishes - Compaction and tag summary rebuilds happen in the background thread, never blocking the response path