Virtual Context can be configured through environment variables, configuration files, or programmatic options when using the Python API. This page documents all available configuration parameters for storage backends, embedding models, compaction models, retrieval tuning, and proxy behavior.
The most common configuration choices are the storage backend (SQLite for single-instance development, PostgreSQL for production multi-tenant setups), the embedding model used for semantic retrieval (defaults to text-embedding-3-small), and the compaction model that generates summaries and extracts structured facts during the hierarchical compression pipeline.
For a broader understanding of how these configuration options affect the system pipeline, see the architecture documentation. For benchmark results showing how different configurations affect recall quality and token efficiency, see the benchmark results.
Configuration
Full YAML config reference with defaults.
Minimal Config
Virtual-context is configured via a YAML file, typically virtual-context.yaml or ~/.virtualcontext/config.yaml.
    version: "0.2"
    context_window: 120000

    tag_generator:
      type: "llm"
      provider: "anthropic"
      model: "claude-haiku-4-5-20251001"

    summarization:
      provider: "anthropic"
      model: "claude-haiku-4-5-20251001"

    storage:
      backend: "sqlite"

Tag Generator
    tag_generator:
      type: "llm"   # "llm", "embedding", or "keyword"
      provider: "anthropic"
      model: "claude-haiku-4-5-20251001"
      max_tags: 10
      min_tags: 5
      embedding_model: "text-embedding-3-small"

llm: Full LLM-based tagging. Both embedding and LLM taggers run in parallel (two-tagger architecture). Best quality, ~200ms latency.
embedding: Embedding-only. Faster, deterministic, limited vocabulary.
keyword: Regex-based keyword extraction. Fastest, lowest quality.
Compaction
    compaction:
      soft_threshold: 0.70        # start compaction at 70% fill
      hard_threshold: 0.85        # force deep compaction at 85%
      protected_recent_turns: 6   # recent turns exempt from compaction
      min_summary_tokens: 100
      max_summary_tokens: 500

Thresholds are fractions of the context window. At 70% fill with a 120K window, compaction starts at ~84K tokens in use.
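To make the threshold arithmetic concrete, the Python sketch below converts the fractional thresholds into absolute token counts. The helper name compaction_thresholds and the rounding rule are assumptions made for illustration; they are not part of Virtual Context's API.

```python
def compaction_thresholds(context_window: int,
                          soft_threshold: float = 0.70,
                          hard_threshold: float = 0.85) -> tuple[int, int]:
    """Convert fractional thresholds into absolute token counts.

    Hypothetical helper for illustration; rounding to the nearest token
    is an assumption, not documented behavior.
    """
    if not 0.0 < soft_threshold < hard_threshold <= 1.0:
        raise ValueError("expected 0 < soft_threshold < hard_threshold <= 1")
    soft_tokens = round(context_window * soft_threshold)
    hard_tokens = round(context_window * hard_threshold)
    return soft_tokens, hard_tokens


soft, hard = compaction_thresholds(120_000)
print(soft, hard)  # 84000 102000
```

With the defaults shown above, a 120K window would begin compaction at about 84K tokens and force deep compaction at about 102K.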
Summarization
    summarization:
      provider: "anthropic"
      model: "claude-haiku-4-5-20251001"
      temperature: 0.3

The summarization LLM is separate from the upstream provider. Use a cheap, fast model for summarization even if your upstream is a frontier model.
Storage
    storage:
      backend: "sqlite"   # "sqlite", "postgres", or "neo4j"
      sqlite:
        path: ".virtualcontext/store.db"
      postgres:
        dsn: "postgresql://user:pass@host:5432/vc"
      neo4j:
        uri: "bolt://localhost:7687"

SQLite is the default and requires no setup. Use PostgreSQL for multi-worker deployments and Neo4j for graph-based fact traversal.
Retrieval
    retrieval:
      active_tag_lookback: 4
      strategy_config:
        default:
          max_results: 10
          max_budget_fraction: 0.25
          include_related: true
        broad:
          max_results: 15
          max_budget_fraction: 0.35
        temporal:
          max_results: 8
          max_budget_fraction: 0.20

active_tag_lookback: Tags from the last N turns are excluded from retrieval (their content is already in the raw history).
max_budget_fraction: The ceiling for injected context as a fraction of the total context window.
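As a rough illustration of these two settings, the Python sketch below computes the injected-context ceiling for a strategy and filters out tags that appeared in the last N turns. The function names, the list-of-sets representation of per-turn tags, and the rounding rule are all hypothetical; the actual retrieval code may differ.

```python
def retrieval_token_budget(context_window: int, max_budget_fraction: float) -> int:
    """Ceiling on injected context, in tokens (rounding is an assumption)."""
    return round(context_window * max_budget_fraction)


def tags_eligible_for_retrieval(tags_by_turn: list[set[str]],
                                active_tag_lookback: int) -> set[str]:
    """Return tags from older turns that do not appear in the last N turns.

    tags_by_turn is ordered oldest to newest; this data shape is hypothetical.
    """
    if active_tag_lookback <= 0:
        recent, older = [], tags_by_turn
    else:
        recent = tags_by_turn[-active_tag_lookback:]
        older = tags_by_turn[:-active_tag_lookback]
    active = set().union(*recent)          # tags still covered by raw history
    return set().union(*older) - active    # only these are worth retrieving


turns = [{"auth", "db"}, {"db"}, {"deploy"}, {"auth"}, {"ui"}, {"ui", "css"}]
print(sorted(tags_eligible_for_retrieval(turns, 4)))  # ['db']
print(retrieval_token_budget(120_000, 0.25))          # 30000
```

In this example "auth" is skipped because it reappears within the last four turns, while "db" only occurs in older turns and so remains a retrieval candidate. With a 120K window, the default, broad, and temporal strategies shown above would cap injected context at about 30K, 42K, and 24K tokens respectively.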
Assembly
    assembly:
      tag_context_max_tokens: 2000
      recent_turns_kept: 4
      context_hint_enabled: true
      context_hint_max_tokens: 500

context_hint_enabled: Injects a brief list of all available tags with segment counts, giving the model topic awareness.
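The sketch below shows one way such a hint could be rendered within a token limit. The output format, the function name build_context_hint, and the four-characters-per-token estimate are assumptions for illustration only; the source does not specify how the hint is actually formatted or measured.

```python
def build_context_hint(segment_counts: dict[str, int], max_tokens: int = 500) -> str:
    """Render "tag: count" entries until a rough token budget is exhausted.

    Hypothetical format; tokens are estimated as len(text) // 4 + 1, which is
    a crude stand-in for a real tokenizer.
    """
    header = "Available topics:"
    lines = [header]
    used = len(header) // 4 + 1
    # Most populated tags first, ties broken alphabetically.
    ordered = sorted(segment_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for tag, count in ordered:
        entry = f"- {tag}: {count}"
        cost = len(entry) // 4 + 1
        if used + cost > max_tokens:
            break  # stop before exceeding the hint budget
        lines.append(entry)
        used += cost
    return "\n".join(lines)


print(build_context_hint({"auth": 3, "deploy": 1, "db": 5}))
```

Listing the most populated tags first is a design choice of this sketch: if the budget runs out, the topics with the most stored segments are the ones that survive.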
Proxy
    proxy:
      host: "0.0.0.0"
      port: 8100
      upstream: "https://api.anthropic.com"

      # Multi-instance mode
      instances:
        - port: 5757
          upstream: "https://api.anthropic.com"
          label: "anthropic"
          config: "./vc-anthropic.yaml"
        - port: 5758
          upstream: "https://api.openai.com/v1"
          label: "openai"
          config: "./vc-openai.yaml"

Each instance can have its own config file with isolated storage, tagger, and summarizer settings.
Environment Variables
| Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | API key for Anthropic provider |
| OPENAI_API_KEY | API key for OpenAI provider |
| VIRTUAL_CONTEXT_CONFIG | Override config file path |
Use virtual-context config validate to check for missing fields or invalid settings.
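For scripts that need to locate the config file themselves, the Python sketch below applies the VIRTUAL_CONTEXT_CONFIG override and then falls back to the two typical locations mentioned under Minimal Config. The function name resolve_config_path and the fallback order (working directory before home directory) are assumptions; the source only states that the variable overrides the config file path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(env: Optional[Mapping[str, str]] = None,
                        cwd: Optional[Path] = None,
                        home: Optional[Path] = None) -> Optional[Path]:
    """Pick a config file: env override first, then the typical locations.

    Hypothetical helper; the search order after the override is an assumption.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    override = env.get("VIRTUAL_CONTEXT_CONFIG")
    if override:
        # An explicit override wins, even if the file does not exist yet.
        return Path(override).expanduser()

    for candidate in (cwd / "virtual-context.yaml",
                      home / ".virtualcontext" / "config.yaml"):
        if candidate.is_file():
            return candidate
    return None


print(resolve_config_path(env={"VIRTUAL_CONTEXT_CONFIG": "./vc-anthropic.yaml"}))
```

The env, cwd, and home parameters exist only so the lookup can be exercised without touching the real environment; calling resolve_config_path() with no arguments uses the current process settings.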