Architecture

How virtual-context works under the hood.

Pipeline

Every request flows through a fixed sequence:

Client request
    |
    v
Format detection (Anthropic / OpenAI Chat / OpenAI Responses / Gemini)
    |
    v
Envelope stripping (OpenClaw channel metadata removed)
    |
    v
Wait for previous on_turn_complete
    |
    v
History ingestion (first request only: bootstrap TurnTagIndex)
    |
    v
on_message_inbound
    |-- Inbound tagging (embedding tagger + LLM tagger in parallel)
    |-- Retrieval (3-signal RRF: IDF tag overlap + BM25 keyword + embedding cosine)
    |-- Assembly (budget-aware context block construction)
    |
    v
Inject <virtual-context> block into system prompt
    |
    v
Forward to upstream provider
    |
    v
Stream/return response to client
    |
    v
on_turn_complete (background thread)
    |-- Response tagging (full turn: user + assistant)
    |-- TurnTagIndex update
    |-- Compaction check (soft/hard thresholds)
    |-- Tag summary (re)build
    |-- Fact extraction and supersession
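
The retrieval step above fuses three independent signal rankings (IDF tag overlap, BM25 keyword, embedding cosine) with reciprocal rank fusion. A minimal sketch of the fusion itself — the function name and the `k` smoothing constant are illustrative, not the project's actual code:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: combine several ranked lists of segment IDs.

    Each signal contributes 1 / (k + rank) for every item it ranks;
    scores are summed across signals, and items are returned best-first.
    """
    scores = {}
    for ranking in rankings:
        for rank, seg_id in enumerate(ranking, start=1):
            scores[seg_id] = scores.get(seg_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three signals rank candidate segments independently:
fused = rrf_fuse([
    ["s3", "s1", "s2"],   # IDF tag overlap
    ["s1", "s3", "s4"],   # BM25 keyword
    ["s1", "s2", "s3"],   # embedding cosine
])
```

RRF's appeal here is that it needs no score normalization across the three signals — only their ranks — which makes heterogeneous scorers easy to combine.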

Component Map

Core (virtual_context/core/)

  • engine.py: Top-level orchestrator. Owns the event lifecycle: on_message_inbound, on_turn_complete, ingest_history.
  • compactor.py: Two-tier compaction (summary and deep). Selects segments, calls the summarization LLM, writes compressed segments back to storage.
  • segmenter.py: Splits raw conversation turns into semantic segments with tag assignments.
  • tagging_pipeline.py: Two-tagger architecture: parallel inbound embedding tagger plus LLM response tagger, with a context-bleed gate on topic shifts.
  • assembler.py: Budget-aware context block construction: a tag-rules priority pass, then a greedy fill pass.
  • monitor.py: Tracks context window fill level and triggers compaction when soft/hard thresholds are crossed.
  • tool_loop.py: Tool catalogue, execution dispatch for vc_* tools, SSE parsing, continuation rounds, anti-repetition.
  • tool_query.py: Handles individual vc_* tool calls and manages presented-segment deduplication.
  • retrieval.py: 3-signal ranked retrieval with gravity/hub dampening.
  • fact_extractor.py: Structured fact extraction with supersession detection.
  • telemetry.py: TelemetryEvent, TelemetryRollup, and TelemetryLedger for operational metrics.
  • temporal_resolver.py: Time-bounded recall; parses relative dates into absolute date ranges.
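
The assembler's greedy fill pass can be sketched as a straightforward loop over ranked segments against a token budget. This is a hypothetical helper, not the project's code; the ranking and the token counter are assumed inputs:

```python
def greedy_fill(ranked_segments, budget_tokens, count_tokens):
    """Greedy pass: take segments in rank order until the token budget is spent.

    Segments that would overflow the budget are skipped, so a smaller
    lower-ranked segment can still use the remaining headroom.
    """
    chosen, used = [], 0
    for seg in ranked_segments:
        cost = count_tokens(seg)
        if used + cost > budget_tokens:
            continue
        chosen.append(seg)
        used += cost
    return chosen
```

In the real pipeline this pass would run after the tag-rules priority pass, filling whatever budget those mandatory inclusions leave over.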

Proxy (virtual_context/proxy/)

  • handlers.py: HTTP request handler. Format detection, request enrichment, streaming SSE forwarding, and the paging path with tool interception.
  • helpers.py: Pure functions: format detection, envelope stripping, context injection, payload construction.
  • provider_adapters.py: Adapter layer for the Anthropic, OpenAI Chat, OpenAI Responses, and Gemini API formats.
  • metrics.py: Thread-safe event collector with a ring buffer, snapshot aggregation, and cursor-based SSE streaming.
  • server.py: ASGI application setup, route registration, dashboard serving.
  • dashboard_html.py: Self-contained single-page dashboard (all CSS/JS inlined).

Infrastructure

  • token_counter.py: Three-mode token counting: anthropic (exact), tiktoken (fast), estimate (len/4 fallback). Image-aware.
  • config.py: YAML config loading with validation, a preset system, and multi-instance support.
  • types.py: Dataclasses for the entire system: VCConfig, TagEntry, TurnTagEntry, Segment, Fact.
  • storage/sqlite.py: SQLite storage backend (default). Segments, facts, tag summaries, session state.
  • storage/postgres.py: PostgreSQL backend for multi-worker deployments.
  • storage/neo4j_store.py: Neo4j/FalkorDB backend for graph-based fact queries.
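
The three counting modes can be sketched as a single dispatcher. Only the len/4 estimate is implemented here; the other two modes depend on optional libraries, so this is an illustrative shape rather than the real token_counter.py:

```python
def count_tokens(text: str, mode: str = "estimate") -> int:
    """Sketch of three-mode token counting.

    "anthropic" would call the provider's count-tokens endpoint (exact) and
    "tiktoken" would use the tiktoken encoder (fast); both need optional
    dependencies, so only the "estimate" fallback is shown.
    """
    if mode == "estimate":
        # len/4 heuristic: roughly one token per four characters of text.
        return max(1, len(text) // 4)
    raise NotImplementedError(f"mode {mode!r} requires an optional dependency")
```

The fallback keeps the proxy functional with zero extra dependencies, trading accuracy for availability.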

Storage Backends

All backends implement the same Store protocol:

  • SQLite (default): Single-file, zero-config. Suitable for single-user and development.
  • PostgreSQL: Multi-worker safe. Used when running multiple proxy instances against the same conversation store.
  • Neo4j / FalkorDB: Graph-backed storage for fact relationships and traversal queries.

The backend is selected in the storage section of the YAML config:

storage:
  backend: "sqlite"   # "sqlite", "postgres", or "neo4j"
  sqlite:
    path: ".virtualcontext/store.db"
  postgres:
    dsn: "postgresql://user:pass@host:5432/vc"
  neo4j:
    uri: "bolt://localhost:7687"
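
A structural sketch of that shared protocol, using typing.Protocol with illustrative method names (the real Store interface covers segments, facts, tag summaries, and session state, and is larger than shown):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Store(Protocol):
    """Sketch of the shared backend interface; method names are illustrative."""

    def save_segment(self, segment) -> None: ...
    def load_segments(self, session_id: str) -> list: ...
    def save_fact(self, fact) -> None: ...


class InMemoryStore:
    """Toy backend that satisfies the protocol structurally (no protocol import needed)."""

    def __init__(self):
        self.segments, self.facts = [], []

    def save_segment(self, segment) -> None:
        self.segments.append(segment)

    def load_segments(self, session_id: str) -> list:
        return list(self.segments)

    def save_fact(self, fact) -> None:
        self.facts.append(fact)
```

Structural typing means each backend module only has to match the method set — no shared base class ties SQLite, PostgreSQL, and Neo4j together.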

Provider Adapters

The proxy supports four API formats with auto-detection:

  • Anthropic: detected by a top-level "system" field or a model name starting with "claude"; context is injected into the system field.
  • OpenAI Chat: detected by the /v1/chat/completions path; injected as messages[0] with role "system".
  • OpenAI Responses: detected by the /v1/responses path; injected into the instructions field.
  • Gemini: detected by the /v1beta/models path pattern; injected into the system_instruction field.

Detection is automatic. No configuration needed.
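
The detection signals above reduce to a short heuristic. This is a simplified sketch, not the project's helpers.py; the returned format labels are illustrative:

```python
def detect_format(path: str, payload: dict) -> str:
    """Heuristic API-format detection: path patterns first, then body shape."""
    if "/v1/chat/completions" in path:
        return "openai-chat"
    if "/v1/responses" in path:
        return "openai-responses"
    if "/v1beta/models" in path:
        return "gemini"
    # Anthropic: a top-level "system" field, or a model name starting with "claude".
    if "system" in payload or str(payload.get("model", "")).startswith("claude"):
        return "anthropic"
    raise ValueError("unrecognized API format")
```

Path checks come first because they are unambiguous; body-shape checks only break the tie for paths shared across providers.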

Session Management

Sessions track conversation continuity across requests. The proxy uses Redis-backed session state for multi-worker consistency; single-worker deployments fall back to in-memory state.

Session identity is derived from a conversation ID embedded in HTML comments, from the API key + model combination as a fallback, or from explicit session headers when available.
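
That fallback chain can be sketched as follows. The header name, the HTML-comment marker, and the hash truncation are all assumptions for illustration; the ordering is one plausible priority, not necessarily the proxy's:

```python
import hashlib
import re


def derive_session_id(body_text: str, headers: dict, api_key: str, model: str) -> str:
    """Resolve a stable session ID via a chain of fallbacks (sketch)."""
    # 1. Explicit session header, when the client supplies one (name assumed).
    if headers.get("x-vc-session"):
        return headers["x-vc-session"]
    # 2. Conversation ID embedded in an HTML comment (marker format assumed).
    m = re.search(r"<!--\s*vc-conversation:\s*([\w-]+)\s*-->", body_text)
    if m:
        return m.group(1)
    # 3. Final fallback: stable hash of API key + model.
    return hashlib.sha256(f"{api_key}:{model}".encode()).hexdigest()[:16]
```

The hash fallback means two clients sharing a key and model collapse into one session, which is why the explicit signals are preferred when present.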

Threading Model

  • The main request path is synchronous within the async ASGI handler
  • on_turn_complete runs in a ThreadPoolExecutor(max_workers=1) background thread
  • Each new request calls wait_for_complete() to block until the previous turn finishes
  • Compaction and tag summary rebuilds happen in the background thread, never blocking the response path
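
The single-worker background pattern described above can be sketched with a small wrapper around ThreadPoolExecutor. Class and method names here are hypothetical; only the max_workers=1 / wait-before-next-turn shape comes from the description:

```python
from concurrent.futures import ThreadPoolExecutor


class TurnSerializer:
    """Run per-turn background work strictly in order, off the response path."""

    def __init__(self):
        # A single worker guarantees on_turn_complete jobs never overlap.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def submit_turn_complete(self, fn, *args):
        # Compaction, tag summaries, and fact extraction run here, in the
        # background, while the response streams back to the client.
        self._pending = self._pool.submit(fn, *args)

    def wait_for_complete(self):
        # Called at the start of each request: block until the previous
        # turn's background work has finished, so indexes are consistent.
        if self._pending is not None:
            self._pending.result()
            self._pending = None
```

Blocking at request start rather than at response end keeps the client's perceived latency low: the background work overlaps with the user's think time between turns.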