Documentation

OS-level memory management for LLM context.

Quick Start (Cloud)

Add your VC token as a query parameter to the base URL. Your provider API key stays in the header as usual.

Anthropic SDK

import anthropic
client = anthropic.Anthropic(
  base_url="https://anthropic.virtual-context.com?vckey=vc-YOUR_KEY"
  # api_key uses your normal Anthropic key
)

OpenAI SDK

from openai import OpenAI
client = OpenAI(
  base_url="https://openai.virtual-context.com/v1?vckey=vc-YOUR_KEY",
  # api_key uses your normal OpenAI key
)

curl

curl "https://anthropic.virtual-context.com/v1/messages?vckey=vc-YOUR_KEY" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'

The ?vckey= parameter identifies your VC tenant. VC assembles each request to fit your target context window before forwarding it to the LLM provider.

Self-Hosted Install

# Full install with all extras
pip install "virtual-context[all]"

# Or minimal + specific extras
pip install virtual-context                  # core only (pyyaml + httpx)
pip install "virtual-context[bridge]"        # HTTP proxy
pip install "virtual-context[mcp]"           # MCP server
pip install "virtual-context[embeddings]"    # sentence-transformers
pip install "virtual-context[tui]"           # interactive chat
pip install "virtual-context[ingest]"        # document ingestion (PDF, DOCX, XLSX)

Requires Python 3.11+. Runs on macOS, Linux, and WSL. To launch the setup wizard:

virtual-context onboard --wizard

HTTP Proxy

The proxy sits between any LLM client and the upstream provider. It auto-detects Anthropic, OpenAI, and Gemini request formats.

# Start proxy (defaults to localhost:8021)
virtual-context proxy --upstream https://api.anthropic.com

# Multi-instance: separate ports per provider
virtual-context proxy \
  --upstream https://api.anthropic.com \
  --port 8021 \
  --config ./anthropic.yaml &
virtual-context proxy \
  --upstream https://api.openai.com \
  --port 8022 \
  --config ./openai.yaml &

Session continuity: sessions resume across proxy restarts via SQLite persistence.

Tool interception: large tool results (git diff, test output) are auto-truncated to head+tail and indexed into FTS5 for full-text search.
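
Once the proxy is up, point any SDK at it instead of the provider. A minimal sketch with the Anthropic Python SDK, assuming the default localhost:8021 port from above:

import anthropic

# The proxy forwards to whatever --upstream was configured at startup;
# the SDK still reads ANTHROPIC_API_KEY from the environment as usual.
client = anthropic.Anthropic(base_url="http://localhost:8021")

resp = client.messages.create(
  model="claude-sonnet-4-20250514",
  max_tokens=1024,
  messages=[{"role": "user", "content": "Hello"}],
)
print(resp.content[0].text)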

MCP Server

Works with Claude Desktop, Cursor, and any MCP-compatible client.

# Add to claude_desktop_config.json
{
  "mcpServers": {
    "virtual-context": {
      "command": "virtual-context",
      "args": ["mcp"]
    }
  }
}

Tools: recall_context, recall_all, find_quote, query_facts, expand_topic, collapse_topic

Resources: virtual-context://status, virtual-context://domains

Prompts: summarize_session, recall_topic
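
Any MCP-compatible client can also drive the server directly over stdio. A minimal sketch using the official mcp Python SDK; the recall_context argument shape here is an assumption, so check the tool's declared input schema first:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
  # Spawn `virtual-context mcp` as a subprocess and speak MCP over stdio.
  params = StdioServerParameters(command="virtual-context", args=["mcp"])
  async with stdio_client(params) as (read, write):
    async with ClientSession(read, write) as session:
      await session.initialize()
      tools = await session.list_tools()
      print([t.name for t in tools.tools])  # recall_context, find_quote, ...
      # Hypothetical arguments -- verify against the tool's schema.
      result = await session.call_tool("recall_context", {"query": "auth refactor"})
      print(result)

asyncio.run(main())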

Python SDK

from virtual_context import VirtualContextEngine

engine = VirtualContextEngine(config_path="./virtual-context.yaml")

# Before sending to LLM: enrich with memory
assembled = engine.on_message_inbound(user_message, history)

# After LLM response: tag, segment, compact
report = engine.on_turn_complete(full_history)

# Ingest external documents
engine.ingest_document("meeting-notes.pdf")
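
A minimal chat loop tying the two hooks together; the shape of assembled is an assumption here (treat it as whatever messages payload your provider client expects):

import anthropic
from virtual_context import VirtualContextEngine

engine = VirtualContextEngine(config_path="./virtual-context.yaml")
llm = anthropic.Anthropic()
history = []

def chat(user_message: str) -> str:
  # Enrich the inbound message with recalled memory before the LLM sees it.
  # Assumes `assembled` is a messages-style list; adapt to the actual return type.
  assembled = engine.on_message_inbound(user_message, history)
  history.append({"role": "user", "content": user_message})
  resp = llm.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=assembled,
  )
  answer = resp.content[0].text
  history.append({"role": "assistant", "content": answer})
  engine.on_turn_complete(history)  # tag, segment, compact the finished turn
  return answer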

CLI Reference

Command                      Description
virtual-context proxy        Start the HTTP proxy (main integration point)
virtual-context mcp          Start the MCP server
virtual-context tui          Interactive terminal chat
virtual-context status       Show session state, tags, storage stats
virtual-context recall       Retrieve context for a query
virtual-context compact      Force compaction of current session
virtual-context domains      List all tags/domains with metadata
virtual-context retrieve     Tag + fetch summaries for a query
virtual-context transform    Tag + fetch + assemble (full pipeline)
virtual-context aliases      Show detected tag aliases
virtual-context ingest       Ingest a document (PDF, DOCX, XLSX, TXT)
virtual-context onboard      Interactive setup wizard
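
A few of these end to end; the positional query argument is illustrative, not documented syntax:

# Inspect session state, tags, and storage stats
virtual-context status

# Fetch summaries relevant to a query
virtual-context retrieve "payment service refactor"

# Full pipeline: tag the query, fetch summaries, assemble context
virtual-context transform "payment service refactor"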

Configuration

Config is loaded from virtual-context.yaml in the working directory, or from a file passed via the --config flag.

# virtual-context.yaml
storage:
  backend: sqlite
  path: ./vc-store.db

compaction:
  watermark: 0.7          # compact when 70% of budget used
  protected_turns: 6      # never compact the last N turns
  budget_tokens: 128000

tagging:
  provider: ollama         # or openai, anthropic
  model: nomic-embed-text  # embedding model for inbound tagger
  max_tags: 5

retrieval:
  tools: true              # expose vc_* tools to the LLM
  max_context_tokens: 32000
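
With the values above, compaction triggers once the live context passes watermark × budget_tokens = 0.7 × 128,000 = 89,600 tokens, and the 6 most recent turns are always left intact.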

Provider Routing (Cloud)

Use subdomains to route to different LLM providers. Append ?vckey=vc-YOUR_KEY to any URL below.

Provider      Base URL
Anthropic     https://anthropic.virtual-context.com
OpenAI        https://openai.virtual-context.com/v1
Gemini        https://gemini.virtual-context.com
Groq          https://groq.virtual-context.com
Mistral       https://mistral.virtual-context.com
Together      https://together.virtual-context.com
OpenRouter    https://openrouter.virtual-context.com
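
For example, an OpenAI-style request through the cloud endpoint (chat/completions is OpenAI's standard path; the model name is just an example):

curl "https://openai.virtual-context.com/v1/chat/completions?vckey=vc-YOUR_KEY" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"gpt-4o","max_tokens":1024,"messages":[{"role":"user","content":"Hello"}]}'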
