OS-style memory for LLMs

100M Context Window. Virtualized.

virtual-context compresses, organizes, and retrieves — so your model reasons sharper, costs less, and never forgets.

95% LongMemEval accuracy
2.2x fewer tokens per request
100% knowledge-update recall
1 line of code to integrate

virtual-context lives within the conversation

Other memory systems are external to the conversation: they store facts in a database and retrieve them at query time, hoping the right thing surfaces. virtual-context is different. It sits inside your conversation flow, sees every turn, manages the context window in real time, and gives the model tools to navigate its own memory. Nothing changes in your code except a base URL.

Swap your base URL. virtual-context handles the rest.

Every turn is tagged, compressed, and indexed. The model gets retrieval tools to navigate its own memory within a managed token budget.

Turn 1
Normal conversation
virtual-context forwards transparently. Zero overhead, zero changes. The model doesn’t know it’s there.
client → virtual-context → LLM
Turn 10
Topics emerge
Every turn is automatically tagged by topic. A vocabulary builds through conversation with no predefined categories and no configuration. The model still doesn’t know.
auth-middleware react-migration deploy-pipeline
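How an emergent tag vocabulary might accumulate can be sketched in a few lines. This is illustrative, not the actual implementation: `tagger` is any callable standing in for VC's local tagging model, and no categories are predefined.

```python
def tag_turn(turn_text, tagger, vocabulary):
    """Tag one turn, growing the vocabulary as new topics appear.

    `tagger` returns topic labels for a piece of text; the vocabulary
    is whatever emerges across the conversation, with usage counts.
    """
    tags = tagger(turn_text)
    for tag in tags:
        vocabulary[tag] = vocabulary.get(tag, 0) + 1  # track usage counts
    return tags
```

The model never sees this step; tagging happens transparently as turns flow through the proxy.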
Turn 50
Context window fills. Compression fires.
Stale turns become layered summaries: raw turns, segment summaries, tag summaries. Facts are extracted and indexed with full conversational context. The model’s context gets smaller and denser. A 50K ceiling means compression fires early and often, keeping the window curated.
1M-token model → 50K managed window = 95% less per call
Turn 60
The model gets tools
expand_topic, collapse_topic, find_quote, query_facts, remember_when, recall_all. The model navigates its own memory: drilling into topics, searching for specific quotes, querying structured facts. Up to 10 tool rounds run transparently within a single user-visible response.
expand_topic collapse_topic find_quote query_facts
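The transparent tool rounds can be sketched as a bounded loop. This is an illustrative sketch, not the actual implementation; `call_llm` and `execute_tool` are placeholders for your model call and VC's memory-tool dispatch.

```python
# Sketch: resolve up to 10 retrieval rounds inside one user-visible response.
MAX_TOOL_ROUNDS = 10

def run_with_memory_tools(call_llm, execute_tool, messages):
    """Run the model, resolving memory-tool calls transparently."""
    for _ in range(MAX_TOOL_ROUNDS):
        reply = call_llm(messages)
        if reply.get("tool_call") is None:
            return reply  # final user-visible answer
        result = execute_tool(reply["tool_call"])
        messages = messages + [reply, {"role": "tool", "content": result}]
    return call_llm(messages)  # round budget exhausted: force a final answer
```

The client only ever sees the final reply; the intermediate rounds stay inside the proxy.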
Turn 100
Structured facts are verified
Per-turn fact signals from earlier turns are consolidated against full multi-turn segments at compaction. Two chances to get it right, each with progressively more context. Facts carry provenance: subject, verb, what, temporal status, source turns.
subject–verb–object · status tracking · queryable
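The provenance fields above can be sketched as a small record type. Field names follow the page (subject, verb, what, temporal status, source turns); the exact schema and the supersede logic are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    """One structured fact with provenance."""
    subject: str
    verb: str
    what: str
    status: str = "current"                    # temporal status
    source_turns: list = field(default_factory=list)

def update_fact(facts, new):
    """Knowledge update: supersede older facts on the same subject+verb."""
    for f in facts:
        if f.subject == new.subject and f.verb == new.verb and f.status == "current":
            f.status = "superseded"
    facts.append(new)
    return facts
```

Because superseded facts are kept rather than overwritten, a query can still answer "what did the user prefer before turn 80?"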
Turn 500
Full coherence
The model recalls a decision from turn 12 by expanding that topic. A managed 50K window outperforms a raw 1M window because every token carries signal. Cross-vocabulary retrieval bridges “caching trick” to “materialized view” across 400 turns of drift. The conversation has been compressed dozens of times. Nothing was lost; it was reorganized.
turn 12 decision recalled at turn 480 · 95% cheaper

One line. Any provider.

# Add this alias to your shell profile:
alias claude-vc='ANTHROPIC_BASE_URL="https://anthropic.virtual-context.com/?vckey=vc-YOUR_KEY" claude'

# Then launch Claude Code with virtual context:
claude-vc

One alias. Infinite memory for every Claude Code session.

// ~/.openclaw/openclaw.json
// In models.providers, change the baseUrl for your provider:

"anthropic-apikey": {
  "baseUrl": "https://anthropic.virtual-context.com?vckey=vc-YOUR_KEY",
  "api": "anthropic-messages",
  "apiKey": "sk-ant-...",   // your normal Anthropic key
  "models": [...]              // keep your existing models
}

// For OpenAI models, use path-based vckey with /v1 at the end:
"openai": {
  "baseUrl": "https://openai.virtual-context.com/vc-YOUR_KEY/v1",
  "api": "openai-responses",
  "apiKey": "sk-...",
  "models": [...]
}
// OpenClaw appends /chat/completions or /responses depending on the api setting

Works with Anthropic, OpenAI, and all supported providers.

pip install virtual-context[all]
virtual-context onboard --wizard
virtual-context proxy \
  --upstream https://api.anthropic.com

Local Ollama for tagging. SQLite storage. Zero external dependencies. AGPL-3.0.

curl "https://anthropic.virtual-context.com/v1/messages?vckey=vc-YOUR_KEY" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":1024,
       "messages":[{"role":"user","content":"Hello"}]}'

Raw HTTP. Works with any language or tool.

HTTP Bridge

Sits between any LLM client and upstream provider. Auto-detects Anthropic, OpenAI, Gemini formats. Zero client changes.

MCP Server

9 tools for Claude Desktop, Cursor, or any MCP client. recall_context, find_quote, query_facts, and more.

Python SDK

Two calls: on_message_inbound() before the LLM, on_turn_complete() after. Plus ingest_document().
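A sketch of how those two calls might wrap a chat turn. Only the three method names come from the SDK description above; the object, signatures, and return shapes are assumptions.

```python
def chat_turn(vc, call_llm, user_message):
    """Wrap one chat turn with virtual-context's two SDK hooks."""
    # 1. Before the LLM: let virtual-context assemble the managed window.
    context = vc.on_message_inbound(user_message)
    # 2. Call your model with the curated context instead of raw history.
    reply = call_llm(context)
    # 3. After the turn: hand the exchange back for tagging/compression.
    vc.on_turn_complete(user_message, reply)
    return reply
```

`ingest_document()` would slot in the same way for one-off document loads outside the turn loop.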

Cloud

Managed infrastructure at *.virtual-context.com. Seven provider subdomains. Same API as self-hosted.

How it compares

What is stored
  KB Retrieval: isolated facts (“likes pizza”)
  RAG: document chunks
  Context Reduction: compressed history blob
  Virtual Context: layered summaries + original text + structured facts (nothing is discarded)

Context management
  KB Retrieval: none (active session grows unchecked)
  RAG: append chunks, never free space
  Context Reduction: compress to fit, can’t undo
  Virtual Context: automatic compression keeps context trim and relevant; the model can also expand or collapse topics on demand

Recall precision
  KB Retrieval: re-search vector DB, hope for a match
  RAG: depends on chunk boundaries
  Context Reduction: lost after summarization
  Virtual Context: relevant context surfaces automatically by topic; full-text search, structured fact lookup, and time-scoped recall available when needed

What the model knows about its memory
  KB Retrieval: nothing (retrieval is external)
  RAG: nothing (retrieval is external)
  Context Reduction: knows it was summarized, can’t act on it
  Virtual Context: sees all available topics, token costs, and depth levels; can navigate its own memory

Cost at scale
  KB Retrieval: grows with corpus size
  RAG: grows with corpus size
  Context Reduction: grows with conversation (1M+ tokens)
  Virtual Context: configurable ceiling (50K stays flat)

Tool-heavy agents
  KB Retrieval: no handling (tool outputs fill context unchecked)
  RAG: N/A
  Context Reduction: no handling
  Virtual Context: tool outputs automatically intercepted, truncated, and indexed; full content searchable on demand

Best fit
  KB Retrieval: simple preference lookup
  RAG: doc retrieval
  Context Reduction: long-chat cost reduction
  Virtual Context: all of the above, with coherent reasoning at turn 500
Turn 500 = Turn 5

Answer quality doesn’t degrade

Compression concentrates attention on signal: less noise, better reasoning. The model recalls decisions from turn 12 at turn 480 because the context window is managed, not accumulated.

~95% fewer tokens

50K managed window vs 1M raw context

Run a 1M-token model at a 50K managed ceiling. Compression fires early and often, keeping only curated context. You pay for 50K tokens per request, not 1M.
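A toy sketch of how a managed ceiling might behave. The 50K figure comes from the page; the `summarize` step is a stand-in for VC's layered compression, and the fold-oldest-first policy is an assumption.

```python
# Toy sketch of a managed token ceiling.
CEILING = 50_000

def maybe_compress(turns, count_tokens, summarize):
    """Compress oldest turns whenever the window exceeds the ceiling."""
    while sum(count_tokens(t) for t in turns) > CEILING and len(turns) > 1:
        # Fold the two oldest entries into one denser summary layer.
        turns = [summarize(turns[0], turns[1])] + turns[2:]
    return turns
```

The key property: recent turns stay verbatim while older material gets progressively denser, so each request is billed at the ceiling rather than the raw history size.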

Built for tool-heavy agents

Tool results don’t blow up your context

Tool outputs fill context fast. A single code search can return thousands of tokens. VC intercepts tool outputs, truncates what’s shown, indexes the full content for on-demand search. Coding, legal doc review, data analysis: anything with interleaved tool chains.
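One way the interception described above could look. This is an illustrative sketch: the `max_chars` cutoff and the index API are assumptions, not VC's actual interface.

```python
def intercept_tool_output(output, index, max_chars=2_000):
    """Truncate what the model sees; keep the full text searchable."""
    doc_id = index.add(output)  # full content stays retrievable
    if len(output) <= max_chars:
        return output
    return output[:max_chars] + f"\n[truncated; full output searchable as {doc_id}]"
```

The model can then pull the untruncated content back on demand via its retrieval tools instead of paying for it on every turn.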

Open source core. Self-host everything.

AGPL-3.0 licensed. Run locally with Ollama for tagging, so there are zero API costs for the memory layer. SQLite storage, no external services, no background workers. The cloud product uses the exact same engine.

License: AGPL-3.0
Dependencies: 2 (pyyaml + httpx)
Python 3.11+, no binary deps

Supported providers

Anthropic: anthropic.virtual-context.com
OpenAI: openai.virtual-context.com
Gemini: gemini.virtual-context.com
Groq: groq.virtual-context.com
Mistral: mistral.virtual-context.com
Together: together.virtual-context.com

Structured context beats raw context at every tier.

95%

LongMemEval (100 random questions)

vs 33% full-context baseline using the same mid-tier model. ICLR 2025 dataset.

Category                      VC      Baseline  Delta
Knowledge-update              100%    29.4%     +70.6pp
Multi-session                 88.5%   15.4%     +73.1pp
Temporal-reasoning            92.9%   32.1%     +60.8pp
Single-session (user)         100%    46.2%     +53.8pp
Single-session (assistant)    100%    72.7%     +27.3pp
Single-session (preference)   100%    20.0%     +80.0pp
2.2x

Token reduction

52K managed window vs 118K raw context. $0.16/question vs $0.36. Same accuracy, less than half the cost.

Answered in 1-2 tool calls

8 emergent retrieval patterns. The reader model learns to navigate memory efficiently without hand-crafted retrieval strategies.

Start free. Ship memory in minutes.

Cloud or self-hosted. Same engine, same API, full control of your provider keys.