The Virtual Context proxy sits between your application and any OpenAI-compatible LLM provider, transparently managing conversation memory, session continuity, and context assembly. Integration requires changing one base URL — no SDK rewrite, no new message format.

This page covers how the proxy handles conversation routing, session management via Redis, streaming passthrough with SSE event forwarding, and error-resilient fallback to unmodified passthrough. The proxy is the deployment layer that connects the Virtual Context engine to production environments, supporting Anthropic, OpenAI, Gemini, Groq, Mistral, Together, and any other provider that exposes a chat completions endpoint.

For teams evaluating Virtual Context, the proxy is the primary integration point. It preserves conversation history across sessions, handles the ingestion of existing conversation transcripts, and routes requests through the compaction and retrieval pipeline described in the architecture documentation. The proxy can run as a standalone process for development or behind a load balancer for multi-tenant production deployments with PostgreSQL and Redis backends.

Proxy Deep Dive

Everything the HTTP proxy does under the hood.

Conversation Continuity

The proxy injects an invisible conversation marker into every assistant response. On each subsequent request, it extracts the marker, routes the request to the matching conversation, and strips all markers before forwarding the payload upstream.
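The inject/extract/strip cycle can be sketched as follows. This is a minimal illustration, not the proxy's implementation: the HTML-comment marker syntax `<!-- vc:conv=... -->` and the function names are hypothetical, and only plain-string message content is handled.

```python
import re
from typing import Optional

# Hypothetical marker format: an HTML comment carrying the conversation ID.
# The proxy's actual marker syntax is not documented here.
MARKER_RE = re.compile(r"<!-- vc:conv=([A-Za-z0-9_-]+) -->")


def inject_marker(response_text: str, conversation_id: str) -> str:
    """Append a routing marker to an assistant response."""
    return f"{response_text}<!-- vc:conv={conversation_id} -->"


def extract_conversation_id(messages: list) -> Optional[str]:
    """Return the ID carried by the most recent marked assistant message, if any."""
    for msg in reversed(messages):
        content = msg.get("content")
        # This sketch only inspects plain-string assistant content.
        if msg.get("role") != "assistant" or not isinstance(content, str):
            continue
        match = MARKER_RE.search(content)
        if match:
            return match.group(1)
    return None


def strip_markers(messages: list) -> list:
    """Return a new list with markers removed from string contents; the input is not modified."""
    cleaned = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str):
            msg = {**msg, "content": MARKER_RE.sub("", content)}
        cleaned.append(msg)
    return cleaned
```

Routing happens on the marked history the client sends back; only the stripped copy goes upstream, so the provider never sees the markers.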

Conversation identity is stable: it is derived from a hash of the system prompt plus the early messages, so the same client session always routes to the same conversation, even across proxy restarts.
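A deterministic identity can be derived with an ordinary content hash. The sketch below assumes SHA-256 over the system prompt and, as a simplification of "early messages", only the first user message; hashing the first *N* messages would change the ID on the second turn, once an assistant reply appears. The 16-character truncation and the function name are hypothetical.

```python
import hashlib
import json


def conversation_identity(system_prompt: str, messages: list) -> str:
    """Derive a stable conversation ID from the system prompt and the first user message."""
    digest = hashlib.sha256()
    digest.update(system_prompt.encode("utf-8"))
    digest.update(b"\x00")  # separator so prompt/message boundaries cannot collide
    # Only the first user message is hashed, so the ID stays fixed as the history grows.
    first_user = next((m for m in messages if m.get("role") == "user"), None)
    if first_user is not None:
        # Canonical JSON handles both string content and structured content blocks.
        canonical = json.dumps(first_user.get("content"), sort_keys=True, separators=(",", ":"))
        digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()[:16]
```

Because the inputs are the client's own payload, a restarted proxy recomputes the same ID without any persisted lookup table.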

Redis Session Cache

A write-through Redis cache persists conversation history and engine state across container restarts. The full history is restored on startup, so there is no cold-start re-ingestion. If Redis is unavailable, the proxy falls back gracefully to the persistent store alone.
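The write-through pattern with store-only fallback can be sketched with a small key-value interface. Everything here is hypothetical: `KeyValue`, `SessionCache`, and the in-memory `DictKV`/`UnavailableKV` stand-ins replace the real Redis client and PostgreSQL store, and the cache is disabled permanently after its first failure, which is a simplification.

```python
import json
from typing import Optional, Protocol


class KeyValue(Protocol):
    """Minimal get/set interface; a hypothetical stand-in for Redis and the durable store."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class DictKV:
    """In-memory KeyValue used here for demonstration."""

    def __init__(self) -> None:
        self.data: dict = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


class UnavailableKV:
    """Simulates a Redis server that cannot be reached."""

    def get(self, key: str) -> Optional[str]:
        raise ConnectionError("cache unreachable")

    def set(self, key: str, value: str) -> None:
        raise ConnectionError("cache unreachable")


class SessionCache:
    """Write-through cache: writes hit the durable store, then Redis on a best-effort basis."""

    def __init__(self, store: KeyValue, redis: Optional[KeyValue]) -> None:
        self.store = store
        self.redis = redis

    def _disable_cache(self) -> None:
        # Store-only mode: stop using Redis after the first failure (a simplification).
        self.redis = None

    def save(self, conversation_id: str, history: list) -> None:
        payload = json.dumps(history)
        self.store.set(conversation_id, payload)  # durable write always happens first
        if self.redis is not None:
            try:
                self.redis.set(conversation_id, payload)
            except Exception:  # any cache failure degrades to store-only
                self._disable_cache()

    def load(self, conversation_id: str) -> list:
        payload = None
        if self.redis is not None:
            try:
                payload = self.redis.get(conversation_id)
            except Exception:
                self._disable_cache()
        if payload is None:  # cache miss or cache down: read the store
            payload = self.store.get(conversation_id)
        return json.loads(payload) if payload is not None else []
```

Writing the store before the cache means a Redis outage can cost speed but never data.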

Four-Format Support

The proxy auto-detects four request formats: Anthropic, OpenAI Chat, OpenAI Responses, and Gemini. Every pipeline stage is format-aware through PayloadFormat, so a single port handles all four.
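Format detection can be approximated from each provider's public endpoint paths (`/v1/messages`, `/v1/chat/completions`, `/v1/responses`, and Gemini's `:generateContent` / `:streamGenerateContent`), falling back to distinctive body keys. Only the name PayloadFormat comes from this page; its members and the `detect_format` heuristics are assumptions for illustration.

```python
from enum import Enum


class PayloadFormat(Enum):
    # Only the PayloadFormat name comes from the docs; these members are illustrative.
    ANTHROPIC = "anthropic"
    OPENAI_CHAT = "openai_chat"
    OPENAI_RESPONSES = "openai_responses"
    GEMINI = "gemini"


def detect_format(path: str, body: dict) -> PayloadFormat:
    """Guess the wire format from the request path, then from the body shape."""
    path = path.split("?", 1)[0]  # ignore query strings such as ?alt=sse
    if path.endswith("/chat/completions"):
        return PayloadFormat.OPENAI_CHAT
    if path.endswith("/responses"):
        return PayloadFormat.OPENAI_RESPONSES
    if path.endswith("/messages"):
        return PayloadFormat.ANTHROPIC
    if path.endswith((":generateContent", ":streamGenerateContent")):
        return PayloadFormat.GEMINI
    # Fallback on distinctive top-level keys when the path is not recognized.
    if "contents" in body:
        return PayloadFormat.GEMINI
    if "input" in body:
        return PayloadFormat.OPENAI_RESPONSES
    if "system" in body:  # Anthropic takes the system prompt as a top-level field
        return PayloadFormat.ANTHROPIC
    if "messages" in body:
        return PayloadFormat.OPENAI_CHAT
    raise ValueError(f"unrecognized payload format for path {path!r}")
```

Detecting once per request and passing the result down lets every later stage branch on one value instead of re-inspecting the payload.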

History Ingestion

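On the first request in a conversation, the proxy extracts the user/assistant pairs from the existing history and tags each one to bootstrap the TurnTagIndex, so there is no cold start. Ingestion is conversation-scoped: each conversation's index is fully isolated from the others.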
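The pairing and bootstrap steps might look like the sketch below. The real TurnTagIndex API is not described here, so a plain `dict` keyed by conversation ID stands in for it, and the `tagger` callable is a hypothetical placeholder for the engine's tagging step.

```python
from typing import Callable


def extract_turn_pairs(messages: list) -> list:
    """Pair each user message with the assistant reply that immediately follows it."""
    pairs = []
    pending_user = None
    for msg in messages:
        role, content = msg.get("role"), msg.get("content")
        if not isinstance(content, str):
            continue  # this sketch skips structured content blocks
        if role == "user":
            pending_user = content
        elif role == "assistant" and pending_user is not None:
            pairs.append((pending_user, content))
            pending_user = None
    return pairs


def bootstrap_index(index: dict, conversation_id: str, messages: list,
                    tagger: Callable[[str], list]) -> None:
    """Tag every historical pair into a per-conversation map (stand-in for TurnTagIndex)."""
    conv_index = index.setdefault(conversation_id, {})  # isolated per conversation
    for turn, (user, assistant) in enumerate(extract_turn_pairs(messages)):
        conv_index[turn] = tagger(user + "\n" + assistant)
```

A trailing user message with no reply yet is left out, since it is the turn the current request is about to answer.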

Streaming

SSE events are forwarded byte-for-byte with no added latency, while the response text is accumulated in the background for tagging. When there is no compacted data to inject, the pipeline is suppressed entirely and the request is a pure passthrough.
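The forward-first, accumulate-second idea can be sketched as a generator that yields each upstream chunk untouched before parsing it. This assumes OpenAI Chat streaming events (`choices[].delta.content`) and does the accumulation inline after each yield rather than on a separate background task; `tee_sse` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def tee_sse(chunks: Iterable[bytes], sink: list) -> Iterator[bytes]:
    """Yield upstream SSE chunks unchanged while collecting assistant text into sink."""
    buffer = b""
    for chunk in chunks:
        yield chunk  # forward first: the client receives the bytes exactly as sent
        buffer += chunk
        # Events may be split across chunks, so only complete lines are parsed.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            line = line.rstrip(b"\r")
            if not line.startswith(b"data:"):
                continue
            data = line[len(b"data:"):].strip()
            if not data or data == b"[DONE]":
                continue
            event = json.loads(data)
            for choice in event.get("choices", []):
                text = choice.get("delta", {}).get("content")
                if text:
                    sink.append(text)
```

Once the stream ends, `"".join(sink)` holds the full response text for tagging, and the client never waited on any of the parsing.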

Error Resilience

If the engine fails, the request is forwarded upstream unmodified. A bloat fallback also applies: if Virtual Context enrichment would make the payload larger than the original, the proxy reverts to pure passthrough. Either way, the proxy never blocks your LLM calls.
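Both fallbacks reduce to one guard around the enrichment step. In this sketch, size is measured as the length of the JSON-serialized payload, which is an assumption (the proxy may compare tokens or bytes instead), and `enrich` is a hypothetical callable standing in for the engine; it must not mutate its input.

```python
import json
import logging
from typing import Callable

log = logging.getLogger("vc.proxy.sketch")


def prepare_upstream_payload(original: dict, enrich: Callable[[dict], dict]) -> dict:
    """Return the enriched payload, or the original if enrichment fails or bloats it."""
    try:
        enriched = enrich(original)
    except Exception:
        # Engine failure: never block the call, forward the request unmodified.
        log.exception("engine failure; forwarding request unmodified")
        return original
    # Bloat fallback: enrichment is only worth sending if it does not grow the payload.
    if len(json.dumps(enriched)) > len(json.dumps(original)):
        log.warning("enriched payload larger than original; using passthrough")
        return original
    return enriched
```

Every path through the function returns a sendable payload, which is what guarantees the upstream call always goes out.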

Envelope Stripping + Metadata Extraction

The proxy strips client metadata envelopes from messages while extracting the sender identity and timestamp. In group chats, participants appear under their real names, and the timestamps give segments accurate chronological ordering.
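Envelope parsing can be sketched with a regular expression. The envelope layout `[from: <name> | <ISO-8601 timestamp>] <body>` is entirely hypothetical, since this page does not specify what client envelopes look like; `CleanMessage` and `strip_envelope` are illustrative names.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical envelope layout: "[from: <name> | <ISO-8601 timestamp>] <body>".
ENVELOPE_RE = re.compile(
    r"^\[from: (?P<sender>[^|\]]+?) \| (?P<ts>[^\]]+)\]\s*(?P<body>.*)$", re.DOTALL
)


@dataclass
class CleanMessage:
    text: str
    sender: Optional[str] = None
    timestamp: Optional[datetime] = None


def strip_envelope(raw: str) -> CleanMessage:
    """Remove the envelope, keeping the sender name and timestamp as metadata."""
    match = ENVELOPE_RE.match(raw)
    if match is None:
        return CleanMessage(text=raw)  # no envelope: pass the text through unchanged
    return CleanMessage(
        text=match.group("body"),
        sender=match.group("sender").strip(),
        timestamp=datetime.fromisoformat(match.group("ts").strip()),
    )
```

With sender and timestamp lifted out, messages can be attributed by name and ordered by `timestamp` rather than by arrival order.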

Multi-Instance Configuration

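Each instance listens on its own port and forwards to its own upstream. An instance that points to its own config file gets an isolated engine and storage; an instance without one shares the master engine.

```yaml
proxy:
  instances:
    - port: 5757
      upstream: https://api.anthropic.com
      label: anthropic
      config: ./vc-anthropic.yaml    # isolated engine + storage
    - port: 5758
      upstream: https://api.openai.com
      label: openai                   # shares master engine
```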
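The isolated-versus-shared rule can be sketched as a lookup keyed by config path, so every instance without its own config resolves to the same master engine object. `Instance`, `build_engines`, the master config path, and the `make_engine` factory are hypothetical; the real proxy's startup code is not shown on this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Instance:
    port: int
    upstream: str
    label: str
    config: Optional[str] = None  # own config file => isolated engine + storage


def build_engines(instances: list, master_config: str,
                  make_engine: Callable[[str], Any]) -> dict:
    """Map each listening port to an engine, creating one engine per distinct config."""
    engines: dict = {}  # config path -> engine; instances without a config share the master
    routing: dict = {}
    for inst in instances:
        path = inst.config or master_config
        if path not in engines:
            engines[path] = make_engine(path)
        routing[inst.port] = engines[path]
    return routing
```

Keying by config path keeps the rule simple: two instances share state exactly when they resolve to the same configuration.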

Live Dashboard

Real-time monitoring is available at http://localhost:5757/dashboard. It provides a request grid (tags, tokens, latency), a turn inspector, ingestion history, session stats, request capture of the last 50 raw payloads, a telemetry panel, live updates over SSE, and JSON export. Authentication uses X-VC-Dashboard-Token.
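A scripted client could attach the token with the standard library. This sketch assumes X-VC-Dashboard-Token is sent as an HTTP request header, which the name suggests but this page does not state; `dashboard_request` is a hypothetical helper.

```python
import urllib.request

DASHBOARD_URL = "http://localhost:5757/dashboard"


def dashboard_request(token: str, url: str = DASHBOARD_URL) -> urllib.request.Request:
    """Build a dashboard request carrying the token (assumed to be a request header)."""
    return urllib.request.Request(url, headers={"X-VC-Dashboard-Token": token})
```

Pass the result to `urllib.request.urlopen` to fetch the page once the proxy is running.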

OpenClaw Plugin

The plugin integrates through OpenClaw lifecycle hooks: synchronous retrieval on message.pre and fire-and-forget compaction on agent.post. No bridge server is needed.
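The difference between the two hooks is whether the host waits. The sketch below shows that pattern with asyncio; `HookBus` is a hypothetical stand-in, not the OpenClaw plugin API, and `retrieve_context` / `compact_turn` are placeholder handlers for the real retrieval and compaction work.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class HookBus:
    """Hypothetical stand-in for a host's lifecycle hook registry (not the OpenClaw API)."""

    def __init__(self) -> None:
        self.handlers: dict = {}
        self.background: set = set()

    def on(self, event: str, handler: Handler) -> None:
        self.handlers[event] = handler

    async def emit_sync(self, event: str, payload: dict) -> None:
        # Awaited: the host does not continue until the handler finishes (message.pre).
        await self.handlers[event](payload)

    def emit_detached(self, event: str, payload: dict) -> None:
        # Fire-and-forget: schedule the handler and return immediately (agent.post).
        task = asyncio.create_task(self.handlers[event](payload))
        self.background.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(self.background.discard)


async def retrieve_context(payload: dict) -> None:
    payload["context"] = f"recalled context for: {payload['text']}"  # placeholder retrieval


async def compact_turn(payload: dict) -> None:
    await asyncio.sleep(0)  # placeholder for slow compaction work
    payload["compacted"] = True


async def demo() -> dict:
    bus = HookBus()
    bus.on("message.pre", retrieve_context)
    bus.on("agent.post", compact_turn)
    msg = {"text": "What did we decide about Rome?"}
    await bus.emit_sync("message.pre", msg)  # context is ready before the model call
    turn = {"id": 1}
    bus.emit_detached("agent.post", turn)  # returns before compaction has run
    compacted_immediately = "compacted" in turn
    await asyncio.gather(*bus.background)  # only for the demo: let compaction finish
    return {"msg": msg, "turn": turn, "compacted_immediately": compacted_immediately}
```

Retrieval sits on the request path because the model needs its output; compaction does not, so it is detached to keep agent turns fast.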