100M Context Window. Virtualized.
virtual-context compresses, organizes, and retrieves — so your model reasons sharper, costs less, and never forgets.
virtual-context lives within the conversation
Other memory systems are external to the conversation. They store facts in a database and retrieve them at query time, hoping the right thing surfaces. virtual-context is different. It sits inside your conversation flow, sees every turn, manages the context window in real time, and gives the model tools to navigate its own memory. Nothing in your code changes except a base URL.
Swap your base URL. virtual-context handles the rest.
Every turn is tagged, compressed, and indexed. The model gets retrieval tools to navigate its own memory within a managed token budget.
expand_topic, collapse_topic, find_quote, query_facts, remember_when, recall_all. The model navigates its own memory: drilling into topics, searching for specific quotes, querying structured facts. Up to 10 tool rounds run transparently within a single user-visible response.
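A toy sketch of the expand/collapse mechanic (hypothetical data structures and tokenizer, not the real engine): each topic exists at summary and full depth, and expanding one topic can force another back to its summary so the rendered context stays under budget.

```python
# Toy sketch of budget-managed expand/collapse. Everything here is illustrative:
# the real engine uses a proper tokenizer, layered summaries, and indexed storage.

TOKEN_BUDGET = 200

topics = {
    "auth-design":  {"summary": "Chose JWT auth with short-lived tokens.",
                     "full": "detail " * 150},
    "db-migration": {"summary": "Migrated the database to Postgres.",
                     "full": "detail " * 180},
}
expanded: set[str] = set()

def tokens(text: str) -> int:
    return len(text.split())        # crude stand-in for a real tokenizer

def rendered_cost() -> int:
    return sum(tokens(t["full"] if name in expanded else t["summary"])
               for name, t in topics.items())

def expand_topic(name: str) -> None:
    expanded.add(name)
    for other in list(expanded):    # collapse others until we fit the budget
        if rendered_cost() <= TOKEN_BUDGET:
            break
        if other != name:
            expanded.discard(other)

expand_topic("auth-design")
expand_topic("db-migration")        # forces "auth-design" back to its summary
print(sorted(expanded), rendered_cost())
# → ['db-migration'] 186
```

The point of the sketch: expansion is never free. Every expand is checked against the budget, so the model sees a context that stays trim no matter how deep it drills.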
One line. Any provider.
# Add this alias to your shell profile:
alias claude-vc='ANTHROPIC_BASE_URL="https://anthropic.virtual-context.com/?vckey=vc-YOUR_KEY" claude'
# Then launch Claude Code with virtual context:
claude-vc
One alias. Infinite memory for every Claude Code session.
// ~/.openclaw/openclaw.json
// In models.providers, change the baseUrl for your provider:
"anthropic-apikey": {
  "baseUrl": "https://anthropic.virtual-context.com?vckey=vc-YOUR_KEY",
  "api": "anthropic-messages",
  "apiKey": "sk-ant-...", // your normal Anthropic key
  "models": [...]         // keep your existing models
}
// For OpenAI models, use a path-based vckey with /v1 at the end:
"openai": {
  "baseUrl": "https://openai.virtual-context.com/vc-YOUR_KEY/v1",
  "api": "openai-responses",
  "apiKey": "sk-...",
  "models": [...]
}
// OpenClaw appends /chat/completions or /responses depending on the api setting
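As a quick illustration of that path logic (the mapping below is an assumption based on the note above; `openai-chat` is a hypothetical api value, not a documented OpenClaw setting):

```python
# Sketch: how an OpenClaw-style client derives the request path from the api
# setting. Covers only the two suffixes mentioned above; names are illustrative.

API_SUFFIX = {
    "openai-chat": "/chat/completions",   # hypothetical api value
    "openai-responses": "/responses",
}

def full_endpoint(base_url: str, api: str) -> str:
    return base_url.rstrip("/") + API_SUFFIX[api]

print(full_endpoint("https://openai.virtual-context.com/vc-YOUR_KEY/v1",
                    "openai-responses"))
# → https://openai.virtual-context.com/vc-YOUR_KEY/v1/responses
```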
Works with Anthropic, OpenAI, and all supported providers.
pip install "virtual-context[all]"
virtual-context onboard --wizard
virtual-context proxy \
--upstream https://api.anthropic.com
Local Ollama for tagging. SQLite storage. Zero external dependencies. AGPL-3.0.
curl "https://anthropic.virtual-context.com/v1/messages?vckey=vc-YOUR_KEY" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "content-type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
"messages":[{"role":"user","content":"Hello"}]}'
Raw HTTP. Works with any language or tool.
HTTP Bridge
Sits between any LLM client and upstream provider. Auto-detects Anthropic, OpenAI, Gemini formats. Zero client changes.
MCP Server
9 tools for Claude Desktop, Cursor, or any MCP client. recall_context, find_quote, query_facts, and more.
Python SDK
Two calls: on_message_inbound() before the LLM, on_turn_complete() after. Plus ingest_document().
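In a chat loop, the two hooks bracket the LLM call. This sketch stubs `VirtualContext` so it runs standalone; the real class name, constructor, and return shapes live in the SDK and may differ:

```python
# Sketch of where the two SDK hooks sit in a chat loop.
# VirtualContext here is a stand-in stub; real signatures may differ.

class VirtualContext:                      # stub for illustration only
    def __init__(self):
        self.history = []
    def on_message_inbound(self, message: str) -> list[dict]:
        # Real SDK: returns the managed context to send to the LLM.
        return self.history + [{"role": "user", "content": message}]
    def on_turn_complete(self, user: str, assistant: str) -> None:
        # Real SDK: tags, compresses, and indexes the finished turn.
        self.history += [
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]

def call_llm(messages: list[dict]) -> str:  # placeholder for your provider call
    return f"(reply to {messages[-1]['content']!r})"

vc = VirtualContext()
user_msg = "Summarize our auth decisions."
managed_context = vc.on_message_inbound(user_msg)   # 1) before the LLM
reply = call_llm(managed_context)
vc.on_turn_complete(user_msg, reply)                # 2) after the turn
print(len(vc.history))
# → 2
```

The design point: your loop never touches storage or compression directly; it hands turns in and receives a managed context back.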
Cloud
Managed infrastructure at *.virtual-context.com. Seven provider subdomains. Same API as self-hosted.
How it compares
| | KB Retrieval | RAG | Context Reduction | Virtual Context |
|---|---|---|---|---|
| What is stored | Isolated facts (“likes pizza”) | Document chunks | Compressed history blob | Layered: summaries + original text + structured facts (nothing is discarded) |
| Context management | None (active session grows unchecked) | Append chunks, never free space | Compress to fit, can’t undo | Automatic compression keeps context trim and relevant. Model can also expand or collapse topics on demand |
| Recall precision | Re-search vector DB, hope for a match | Depends on chunk boundaries | Lost after summarization | Relevant context surfaces automatically by topic. Full-text search, structured fact lookup, and time-scoped recall available when needed |
| What the model knows about its memory | Nothing (retrieval is external) | Nothing (retrieval is external) | Knows it was summarized, can’t act on it | Sees all available topics, token costs, and depth levels. Can navigate its own memory |
| Cost at scale | Grows with corpus size | Grows with corpus size | Grows with conversation (1M+ tokens) | Configurable ceiling (50K stays flat) |
| Tool-heavy agents | No handling (tool outputs fill context unchecked) | N/A | No handling | Tool outputs automatically intercepted, truncated, and indexed. Full content searchable on demand |
| Best fit | Simple preference lookup | Doc retrieval | Long-chat cost reduction | All of the above, with coherent reasoning at turn 500 |
Answer quality doesn’t degrade
Compression concentrates attention on signal: less noise, better reasoning. The model recalls decisions from turn 12 at turn 480 because the context window is managed, not accumulated.
50K managed window vs 1M raw context
Run a 1M-token model at a 50K managed ceiling. Compression fires early and often, keeping only curated context. You pay for 50K tokens per request, not 1M.
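Back-of-envelope, with an assumed $3 per million input tokens (illustrative pricing only, not a quote):

```python
# Per-request input cost at a 50K managed ceiling vs a full 1M-token context.
# PRICE_PER_M_INPUT is an assumed figure for illustration.

PRICE_PER_M_INPUT = 3.00          # assumed $/1M input tokens

raw_cost     = 1_000_000 / 1_000_000 * PRICE_PER_M_INPUT   # full 1M context
managed_cost =    50_000 / 1_000_000 * PRICE_PER_M_INPUT   # 50K managed ceiling

print(f"raw: ${raw_cost:.2f}/request, managed: ${managed_cost:.2f}/request "
      f"({raw_cost / managed_cost:.0f}x cheaper)")
# → raw: $3.00/request, managed: $0.15/request (20x cheaper)
```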
Tool results don’t blow up your context
Tool outputs fill context fast. A single code search can return thousands of tokens. VC intercepts tool outputs, truncates what’s shown, indexes the full content for on-demand search. Coding, legal doc review, data analysis: anything with interleaved tool chains.
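The interception idea in miniature (illustrative only; the real pipeline's truncation rules and index are more sophisticated): show a short preview in context, keep the full output searchable.

```python
# Toy sketch of tool-output interception: truncate what the model sees,
# index the full content for on-demand lookup. Names and limits are made up.

SHOWN_TOKENS = 25

tool_index: dict[str, str] = {}       # call_id -> full output

def intercept(call_id: str, output: str) -> str:
    tool_index[call_id] = output                      # index full content
    words = output.split()
    if len(words) <= SHOWN_TOKENS:
        return output
    return " ".join(words[:SHOWN_TOKENS]) + f" … [truncated; search {call_id}]"

def search_tool_output(call_id: str, needle: str) -> bool:
    return needle in tool_index[call_id]              # on-demand lookup

big_result = "match_line " * 1000                     # e.g. a code-search result
preview = intercept("call_1", big_result)
print(len(preview.split()) < 40, search_tool_output("call_1", "match_line"))
# → True True
```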
Supported providers
Structured context beats raw context at every tier.
LongMemEval (100 random questions)
vs 33% full-context baseline using the same mid-tier model. ICLR 2025 dataset.
| Category | VC | Baseline | Delta |
|---|---|---|---|
| Knowledge-update | 100% | 29.4% | +70.6pp |
| Multi-session | 88.5% | 15.4% | +73.1pp |
| Temporal-reasoning | 92.9% | 32.1% | +60.8pp |
| Single-session (user) | 100% | 46.2% | +53.8pp |
| Single-session (assistant) | 100% | 72.7% | +27.3pp |
| Single-session (preference) | 100% | 20.0% | +80.0pp |
Token reduction
52K managed window vs 118K raw context. $0.16/question vs $0.36. Same accuracy, less than half the cost.
Answered in 1-2 tool calls
8 emergent retrieval patterns. The reader model learns to navigate memory efficiently without hand-crafted retrieval strategies.
Start free. Ship memory in minutes.
Cloud or self-hosted. Same engine, same API, full control of your provider keys.