100M Context Window. Virtualized.
virtual-context compresses, organizes, and retrieves — so your model reasons sharper, costs less, and never forgets.
virtual-context lives within the conversation
Other memory systems are external to the conversation. They store facts in a database and retrieve them at query time, hoping the right thing surfaces. virtual-context is different. It sits inside your conversation flow, sees every turn, manages the context window in real time, and gives the model tools to navigate its own memory. Nothing changes in your code except a base URL.
Swap your base URL. virtual-context handles the rest.
Every turn is tagged, compressed, and indexed. The model gets retrieval tools to navigate its own memory within a managed token budget.
expand_topic, collapse_topic, find_quote, query_facts, remember_when, recall_all. The model navigates its own memory: drilling into topics, searching for specific quotes, querying structured facts. Up to 10 tool rounds run transparently within a single user-visible response.
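As an illustration, the transparent tool-round loop can be sketched in Python. This is a stub, not the real implementation: the dispatch logic and 10-round cap come from this page, while the tool body and model behavior are hypothetical stand-ins.

```python
# Minimal sketch of the transparent tool-round loop described above.
# The retrieval tool is a stub; the real one queries virtual-context's
# index. The 10-round cap matches the page's claim.

MAX_TOOL_ROUNDS = 10

def expand_topic(topic):
    # stub: the real tool returns the stored full text for a topic
    return f"[full text of topic '{topic}']"

TOOLS = {"expand_topic": expand_topic}

def fake_model(messages):
    # Stand-in for the LLM: asks for one expansion, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "expand_topic", "args": {"topic": "auth-design"}}
    return {"content": "final answer using expanded context"}

def run_turn(messages):
    for _ in range(MAX_TOOL_ROUNDS):
        reply = fake_model(messages)
        if "tool" not in reply:
            return reply["content"]            # user-visible response
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "tool budget exhausted"
```

The point is that all tool rounds happen inside `run_turn`: the caller sees one request in, one response out.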
One line. Any provider.
# Add this alias to your shell profile:
alias claude-vc='ANTHROPIC_BASE_URL="https://anthropic.virtual-context.com/?vckey=vc-YOUR_KEY" claude'
# Then launch Claude Code with virtual context:
claude-vc
One alias. Infinite memory for every Claude Code session.
// ~/.openclaw/openclaw.json
// In models.providers, change the baseUrl for your provider:
"anthropic-apikey": {
"baseUrl": "https://anthropic.virtual-context.com/?vckey=vc-YOUR_KEY",
"api": "anthropic-messages",
"apiKey": "sk-ant-...", // your normal Anthropic key
"models": [...] // keep your existing models
}
// For OpenAI models, use path-based vckey with /v1 at the end:
"openai": {
"baseUrl": "https://openai.virtual-context.com/vc-YOUR_KEY/v1",
"api": "openai-responses",
"apiKey": "sk-...",
"models": [...]
}
// OpenClaw appends /chat/completions or /responses depending on the api setting
Works with Anthropic, OpenAI, and all supported providers.
pip install "virtual-context[all]"
virtual-context onboard --wizard
virtual-context proxy \
--upstream https://api.anthropic.com
Local Ollama for tagging. SQLite storage. Zero external dependencies. AGPL-3.0.
curl "https://anthropic.virtual-context.com/v1/messages?vckey=vc-YOUR_KEY" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
-d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
"messages":[{"role":"user","content":"Hello"}]}'
Raw HTTP. Works with any language or tool.
HTTP Bridge
Sits between any LLM client and upstream provider. Auto-detects Anthropic, OpenAI, Gemini formats. Zero client changes.
MCP Server
9 tools for Claude Desktop, Cursor, or any MCP client. recall_context, find_quote, query_facts, and more.
Python SDK
Two calls: on_message_inbound() before the LLM, on_turn_complete() after. Plus ingest_document().
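The two-call flow could look roughly like this. The hook names come from this page; everything else is illustrative, with a stub in place of the real engine ("compression" here is just a sliding window, where the real engine summarizes and indexes):

```python
# Sketch of the two-call SDK flow: on_message_inbound() before the LLM,
# on_turn_complete() after. StubVC stands in for the real engine.

class StubVC:
    def __init__(self, window=4):
        self.window = window
        self.index = []                      # stands in for the topic index

    def on_message_inbound(self, history):
        return history[-self.window:]        # curated window, not raw history

    def on_turn_complete(self, user_msg, reply):
        self.index.append((user_msg["content"], reply["content"]))

def chat_turn(vc, llm, history, user_text):
    history.append({"role": "user", "content": user_text})
    managed = vc.on_message_inbound(history)   # 1) before the LLM
    reply = llm(managed)
    history.append(reply)
    vc.on_turn_complete(history[-2], reply)    # 2) after the turn
    return reply
```

Your application loop stays the same; only the managed-prompt assembly and post-turn indexing are added around the LLM call.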
Cloud
Managed infrastructure at *.virtual-context.com. Seven provider subdomains. Same API as self-hosted.
How it compares
| | KB Retrieval | RAG | Context Reduction | Virtual Context |
|---|---|---|---|---|
| What is stored | Isolated facts (“likes pizza”) | Document chunks | Compressed history blob | Layered: summaries + original text + structured facts (nothing is discarded) |
| Context management | None (active session grows unchecked) | Append chunks, never free space | Compress to fit, can’t undo | Automatic compression keeps context trim and relevant. Model can also expand or collapse topics on demand |
| Recall precision | Re-search vector DB, hope for a match | Depends on chunk boundaries | Lost after summarization | Relevant context surfaces automatically by topic. Full-text search, structured fact lookup, and time-scoped recall available when needed |
| What the model knows about its memory | Nothing (retrieval is external) | Nothing (retrieval is external) | Knows it was summarized, can’t act on it | Sees all available topics, token costs, and depth levels. Can navigate its own memory |
| Cost at scale | Grows with corpus size | Grows with corpus size | Grows with conversation (1M+ tokens) | Configurable ceiling (50K stays flat) |
| Tool-heavy agents | No handling (tool outputs fill context unchecked) | N/A | No handling | Tool outputs automatically intercepted, truncated, and indexed. Full content searchable on demand |
| Best fit | Simple preference lookup | Doc retrieval | Long-chat cost reduction | All of the above, with coherent reasoning at turn 500 |
Answer quality doesn’t degrade
Compression concentrates attention on signal: less noise, better reasoning. The model recalls decisions from turn 12 at turn 480 because the context window is managed, not accumulated.
50K managed window vs 1M raw context
Run a 1M-token model at a 50K managed ceiling. Compression fires early and often, keeping only curated context. You pay for 50K tokens per request, not 1M.
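Back-of-envelope math, at an assumed $3 per million input tokens (substitute your model's actual rate):

```python
# Illustrative per-request input cost at an assumed $3 / 1M input tokens.
PRICE_PER_TOKEN = 3.00 / 1_000_000

raw_cost = 1_000_000 * PRICE_PER_TOKEN      # full 1M-token window resent
managed_cost = 50_000 * PRICE_PER_TOKEN     # 50K managed ceiling

print(f"raw: ${raw_cost:.2f}/request, managed: ${managed_cost:.2f}/request")
```

A 50K ceiling is a flat 5% of the 1M-window input cost, on every request, and it stays flat as the conversation grows.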
Tool results don’t blow up your context
Tool outputs fill context fast. A single code search can return thousands of tokens. VC intercepts tool outputs, truncates what’s shown, indexes the full content for on-demand search. Coding, legal doc review, data analysis: anything with interleaved tool chains.
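The interception pattern can be sketched with stdlib pieces. The preview limit and in-memory store are illustrative; per this page, the real system indexes full content (into SQLite when self-hosted):

```python
# Sketch of tool-output interception: surface a truncated preview into
# the context window, keep the full output in a searchable store.

SHOWN_LIMIT = 200          # chars surfaced into the context window

full_outputs = {}          # stands in for the on-disk index

def intercept(call_id, output):
    full_outputs[call_id] = output                    # index full content
    if len(output) <= SHOWN_LIMIT:
        return output
    return output[:SHOWN_LIMIT] + f"… [truncated; search id={call_id}]"

def search_full(call_id, needle):
    # on-demand lookup into the stored full output
    return needle in full_outputs.get(call_id, "")
```

The model sees the truncated preview inline, and can pull the rest through the retrieval tools only when it actually needs it.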
Supported providers
Structured context beats raw context at every tier.
LongMemEval (100 random questions)
vs 33% full-context baseline using the same mid-tier model. ICLR 2025 dataset.
| Category | VC | Baseline | Delta |
|---|---|---|---|
| Knowledge-update | 100% | 29.4% | +70.6pp |
| Multi-session | 88.5% | 15.4% | +73.1pp |
| Temporal-reasoning | 92.9% | 32.1% | +60.8pp |
| Single-session (user) | 100% | 46.2% | +53.8pp |
| Single-session (assistant) | 100% | 72.7% | +27.3pp |
| Single-session (preference) | 100% | 20.0% | +80.0pp |
Token reduction
52K managed window vs 118K raw context. $0.16/question vs $0.36. Same accuracy, less than half the cost.
Answered in 1-2 tool calls
8 emergent retrieval patterns. The reader model learns to navigate memory efficiently without hand-crafted retrieval strategies.
Common questions about context management and memory for LLMs
Virtual Context is built for teams that need persistent memory, lower token overhead, and better long-session recall without forcing the model to reread the whole transcript every turn.
What makes Virtual Context a context management system instead of just a larger context window?
Virtual Context does not try to push an ever-larger raw transcript into the model. It manages conversation state outside the active prompt, compacts older material by topic, and retrieves the most relevant memory when the model needs it. That makes it a context management system, not just a bigger prompt budget.
How does memory stay available without replaying the full conversation every turn?
The system segments conversation history, maintains summaries and structured facts, and uses retrieval to reassemble only the parts that matter for the next response. Older material stays recoverable, but it does not have to be resent on every request. This keeps memory available while controlling cost and prompt size.
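A toy version of that segment-summarize-retrieve cycle, with keyword overlap standing in for the real topic tagging and structured-fact retrieval:

```python
# Toy sketch of segment -> summarize -> retrieve. The real system tags
# by topic and keeps structured facts; keyword overlap stands in here.

def summarize(turns):
    # stand-in summary: first word of each turn
    return " / ".join(t.split()[0] for t in turns)

class Memory:
    def __init__(self):
        self.segments = []                 # (summary, full_turns)

    def archive(self, turns):
        self.segments.append((summarize(turns), turns))

    def assemble(self, query, budget=2):
        # reassemble only the segments relevant to the next response
        scored = [(sum(w in " ".join(full) for w in query.split()), full)
                  for _, full in self.segments]
        scored.sort(key=lambda s: -s[0])
        return [full for score, full in scored[:budget] if score > 0]
```

Archived segments never have to be resent wholesale; `assemble` pulls back only what the next turn needs, which is what keeps prompt size and cost bounded.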
Does Virtual Context work with existing LLM providers and SDKs?
Yes. The core product sits in front of provider APIs as a proxy, so the usual integration is a base URL change rather than an SDK rewrite. It is built to work with Anthropic, OpenAI, Gemini, Groq, Mistral, Together, and similar providers.
Can I self-host Virtual Context, or is it only a hosted product?
You can do either. The core engine is open source under AGPL-3.0 for self-hosted deployments, and the managed product adds hosted infrastructure, tenant provisioning, billing, and dashboard tooling around the same engine.
When should I use Virtual Context instead of relying on plain long-context prompting?
It is most useful when conversations run long, tools produce large outputs, or accuracy depends on recalling decisions made far earlier in the session. In those cases, sending the full raw history tends to get expensive and noisy. Virtual Context is built to preserve recall while keeping the active prompt curated.
Start free. Ship memory in minutes.
Cloud or self-hosted. Same engine, same API, full control of your provider keys.