The Virtual Context proxy sits between your application and any OpenAI-compatible LLM provider, transparently managing conversation memory, session continuity, and context assembly. Integration requires changing one base URL — no SDK rewrite, no new message format.
This page covers how the proxy handles conversation routing, session management via Redis, streaming passthrough with SSE event forwarding, and error-resilient failover. The proxy is the deployment layer that connects the Virtual Context engine to production environments, supporting Anthropic, OpenAI, Gemini, Groq, Mistral, Together, and any other provider that exposes a chat completions endpoint.
For teams evaluating Virtual Context, the proxy is the primary integration point. It preserves conversation history across sessions, handles the ingestion of existing conversation transcripts, and routes requests through the compaction and retrieval pipeline described in the architecture documentation. The proxy can run as a standalone process for development or behind a load balancer for multi-tenant production deployments with PostgreSQL and Redis backends.
Proxy Deep Dive
Everything the HTTP proxy does under the hood.
Conversation Continuity
The proxy injects an invisible <!-- vc:conversation=UUID --> marker into every assistant response. On subsequent requests, the proxy extracts the marker, routes to the correct conversation, and strips markers before forwarding upstream.
Stable identity derived from system prompt hash + early messages. Same client session always routes to the same conversation across restarts.
Redis Session Cache
Write-through Redis cache persists conversation history and engine state across container restarts. Full history restored on startup, eliminating cold-start re-ingestion. Falls back gracefully to store-only if Redis is unavailable.
Four-Format Support
Auto-detects Anthropic, OpenAI Chat, OpenAI Responses, and Gemini. Every pipeline stage is format-aware through PayloadFormat. One port handles all formats.
History Ingestion
First request: extract user+assistant pairs from existing history, tag each to bootstrap TurnTagIndex. No cold-start. Conversation-scoped, fully isolated.
Streaming
SSE forwarded byte-for-byte, zero added latency. Text accumulated in background for response tagging. Pipeline suppression: no compacted data means pure passthrough.
Error Resilience
Engine failure: request forwarded upstream unmodified. Bloat fallback: if VC enrichment exceeds original payload size, revert to pure passthrough. The proxy never blocks your LLM calls.
Envelope Stripping + Metadata Extraction
Strips client metadata while extracting sender identity and timestamps. Group chat participants appear as real names. Timestamps give segments accurate chronological ordering.
Multi-Instance Configuration
proxy:
instances:
- port: 5757
upstream: https://api.anthropic.com
label: anthropic
config: ./vc-anthropic.yaml # isolated engine + storage
- port: 5758
upstream: https://api.openai.com
label: openai # shares master engineLive Dashboard
Real-time monitoring at http://localhost:5757/dashboard: request grid with tags/tokens/latency, turn inspector, ingestion history, session stats, request capture (last 50 raw payloads), telemetry panel, SSE live updates, JSON export. Auth via X-VC-Dashboard-Token.
OpenClaw Plugin
Lifecycle hooks: sync retrieval (message.pre) and fire-and-forget compaction (agent.post). No bridge server needed.