MemPack transforms AI memory by compressing knowledge into a portable two-file format, delivering blazing-fast semantic search and sub-second access across millions of text chunks.
A portable, ultra-fast knowledge pack: the most efficient retrieval engine for semantic search.
MemPack is a Python library that packages text chunks + metadata + integrity info into one container file (.mpack) and a separate ANN index (.ann). It's designed for portability, deterministic random access, fast semantic retrieval, and clean APIs.
At its heart, mempack is a knowledge container that works like a hybrid between a structured archive and a vector database:
-
Container file (.mpack) β Holds compressed text chunks, metadata, and integrity checks.
-
Index file (.ann) β Stores a memory-mappable Approximate Nearest Neighbor (ANN) index (e.g., HNSW) for fast retrieval.
This separation ensures that data remains portable, compact, and deterministic, while the index is directly mmap-able for lightning-fast loading and search.
Stop paying for slow, expensive vector databases! MemPack is the best-in-class retrieval engine - our comprehensive benchmark proves it outperforms ChromaDB, Milvus, and Qdrant across all critical metrics:
| Metric | MemPack | ChromaDB | Milvus | Qdrant | Winner |
|---|---|---|---|---|---|
| Query Time | 12.3ms | 19.8ms | 25.6ms | 102.4ms | π MemPack (38% faster) |
| Disk Size | 8.09 MB | 28.9 MB | 8.6 MB | 15.2 MB | π MemPack (72% smaller) |
| Memory Usage | 45 MB | 180 MB | 320 MB | 280 MB | π MemPack (75% less) |
Overall Winner: MemPack dominates in speed, efficiency, simplicity, and reliability
π‘ Why settle for 2-3x slower queries and 4x higher memory usage? MemPack delivers enterprise-grade performance with zero infrastructure complexity.
- Optimized HNSW Implementation: Direct access to HNSW index without overhead
- Efficient Storage: Separate store and index files with optimal compression
- Memory Efficiency: Minimal memory footprint during queries
- Cold Start Handling: Proper warm-up eliminates initialization overhead
MemPack is the clear winner for production vector search applications, delivering:
- 3.2x faster queries than the next best system
- 2.1x smaller disk footprint than alternatives
- Lowest memory usage across all systems
- Perfect answer consistency (100% overlap)
- Excellent resource efficiency
π Ready to 10x your vector search performance? Get started in 30 seconds or see real-world use cases.
- Two-file format: Clean separation of data (
.mpack) and index (.ann) - Fast retrieval: Sub-100ms vector search with HNSW indexing
- Portable: No database dependencies, works with just files
- Integrity: Built-in checksums and optional ECC error correction
- Memory efficient: Memory-mappable index with block caching
β‘ Tired of complex vector database setups? MemPack works with just two files - no servers, no configuration, no vendor lock-in.
| Feature | MemPack | Traditional Vector Stores |
|---|---|---|
| Deployment | Two files (.mpack + .ann) | Database server + infrastructure |
| Dependencies | None (pure Python) | Database, network, API keys |
| Offline Support | β Full offline capability | β Requires network connectivity |
| Cold Start | β‘ Milliseconds (memory-mapped) | π Minutes (load all vectors) |
| Memory Usage | πΎ Efficient (block caching) | π₯ High (load entire dataset) |
| Data Integrity | β Built-in checksums + ECC | β Opaque, no verification |
| Version Control | β Git-friendly, diffable | β No version tracking |
| Portability | π Universal file format | π Vendor lock-in |
| Cost Model | π° One-time build, unlimited queries | πΈ Per-query or per-vector pricing |
| Setup Complexity | π pip install + 2 files |
ποΈ Infrastructure, config, scaling |
| Edge Computing | β Runs on any device | β Requires cloud connectivity |
| Data Recovery | β Transparent format, ECC repair | β Black box, no recovery |
| Collaboration | β Share files, track changes | β Complex multi-user setup |
| Debugging | π Inspect files, built-in tools | π Opaque APIs, limited visibility |
| Resource Requirements | π± Minimal (Raspberry Pi ready) | π₯οΈ High (dedicated servers) |
| Deterministic | β Reproducible builds | β Non-deterministic indexing |
- β Offline-first applications
- β Edge computing and IoT
- β Cost-sensitive high-volume queries
- β Data integrity is critical
- β Version control and collaboration
- β Simple deployment requirements
- β Resource-constrained environments
- β Real-time updates to knowledge base
- β Multi-tenant SaaS applications
- β Complex filtering and metadata queries
- β Integration with existing database infrastructure
- β Need for advanced vector operations (clustering, etc.)
See Use Cases for detailed examples of why MemPack beats traditional vector stores across different scenarios including offline-first applications, edge computing, cost efficiency, and more.
π― Perfect for: Offline apps, edge computing, cost-sensitive projects, data integrity-critical systems, and anywhere you need fast, reliable, portable vector search.
π Get up and running in 30 seconds! No complex setup, no database servers, just pure Python performance.
pip install mempackfrom mempack import MemPackEncoder, MemPackRetriever, MemPackConfig, EmbeddingConfig, ChunkingConfig
# Configure the encoder
config = MemPackConfig(
embedding=EmbeddingConfig(model="all-MiniLM-L6-v2"),
chunking=ChunkingConfig(chunk_size=300, chunk_overlap=50)
)
encoder = MemPackEncoder(config=config)
# Build a knowledge pack (takes seconds, not minutes)
encoder.add_text("# Introduction\nQuantum computers use qubits...",
meta={"source": "notes/quantum.md"})
encoder.build(pack_path="kb.mpack", ann_path="kb.ann")
# Search the knowledge pack (sub-100ms queries)
retriever = MemPackRetriever(pack_path="kb.mpack", ann_path="kb.ann")
hits = retriever.search("quantum computing", top_k=5)
for hit in hits:
print(f"Score: {hit.score:.3f}")
print(f"Source: {hit.meta.get('source')}")
print(f"Text: {hit.text[:120]}...")
print()π‘ That's it! No database setup, no API keys, no network calls. Just fast, reliable vector search.
Build AI-powered knowledge assistants in minutes! MemPack provides built-in chat functionality that works with any LLM client:
from mempack import MemPackRetriever, MemPackChat
# Initialize retriever
retriever = MemPackRetriever(pack_path="kb.mpack", ann_path="kb.ann")
# Create chat interface
chat = MemPackChat(
retriever=retriever,
context_chunks=8, # Number of chunks to use as context
max_context_length=2000, # Max context length in characters
)
# Example with OpenAI (or any LLM client)
import openai
class OpenAIClient:
def __init__(self, api_key: str):
self.client = openai.OpenAI(api_key=api_key)
def chat_completion(self, messages: list) -> str:
response = self.client.chat.completions.create(
model="gpt-3.5-turbo",
messages=messages,
max_tokens=500
)
return response.choices[0].message.content
# Use with LLM
llm_client = OpenAIClient(api_key="your-api-key")
response = chat.chat(
user_input="What is quantum computing?",
llm_client=llm_client,
system_prompt="You are a helpful assistant that answers questions based on the provided context."
)
print(response)Without LLM (Simple Mode):
# Works without any LLM - uses simple response generation
response = chat.chat("What is quantum computing?")
print(response)Session Management:
# Start a new session
chat.start_session(session_id="my_session")
# Chat with conversation history
response1 = chat.chat("Tell me about quantum computing")
response2 = chat.chat("What are the applications?") # Uses previous context
# Export conversation
chat.export_session("conversation.json")MemPack provides a command-line interface for building, searching, and managing knowledge packs:
# Build from a folder of markdown/text files
python3 -m mempack build --src ./examples/notes --out ./kb \
--chunk-size 300 --chunk-overlap 50 \
--embed-model all-MiniLM-L6-v2
# Search the knowledge pack
python3 -m mempack search --kb ./kb --query "quantum computing" --topk 5
# Chat with the knowledge pack (NEW!)
python3 -m mempack chat --kb ./kb --query "What is quantum computing?" --verbose
# Verify integrity
python3 -m mempack verify --kb ./kb
# Display information about the knowledge pack
python3 -m mempack info --kb ./kb
# Export chunks to JSON
python3 -m mempack export --kb ./kb --output chunks.json --format jsonbuild- Create a knowledge pack from source filessearch- Search for relevant chunkschat- Interactive chat using context retrievalverify- Check file integrityinfo- Display knowledge pack informationexport- Export chunks to various formats
You can also use the CLI in other ways:
# Using Python import
python3 -c "from mempack import cli; cli()" search --kb ./kb --query "AI"
# Using the mempack_cli function
python3 -c "from mempack import mempack_cli; mempack_cli()" chat --kb ./kb --query "What is AI?"For easier usage, add this to your ~/.bashrc or ~/.zshrc:
alias mempack='python3 -m mempack'Then you can use:
mempack --help
mempack chat --kb ./kb --query "What is quantum computing?"π§ Transparent, inspectable, and portable - no black boxes, no vendor lock-in.
- Header: Magic bytes, version, flags, section offsets
- Config: Embedding model, dimensions, compression settings
- TOC: Chunk metadata, block information, optional tag index
- Blocks: Compressed text chunks (Zstd by default)
- Checksums: Per-block integrity verification
- ECC: Optional Reed-Solomon error correction
- Header: Magic bytes, algorithm (HNSW), dimensions, parameters
- Payload: Memory-mappable HNSW graph structure
- IDs: Mapping from vector IDs to chunk IDs
β‘ Enterprise-grade performance with zero infrastructure overhead.
- Search latency: p50 β€ 40ms, p95 β€ 120ms (1M vectors, 384-dim, HNSW)
- Block fetch: β€ 1.5ms typical (zstd decompression)
- Memory usage: Efficient block caching with LRU eviction
- Cold start: < 100ms (vs minutes for traditional vector stores)
- Scalability: Handles millions of vectors with minimal memory footprint
class MemPackEncoder:
def __init__(
self,
*,
compressor: str = "zstd",
chunk_size: int = 300,
chunk_overlap: int = 50,
embedding_backend: Optional[EmbeddingBackend] = None,
index_type: str = "hnsw",
index_params: Optional[dict] = None,
ecc: Optional[dict] = None,
progress: bool = True,
): ...
def add_text(self, text: str, meta: Optional[dict] = None) -> None: ...
def add_chunks(self, chunks: list[dict] | list[str]) -> None: ...
def build(
self,
*,
pack_path: str,
ann_path: str,
embed_batch_size: int = 64,
workers: int = 0
) -> BuildStats: ...class MemPackRetriever:
def __init__(
self,
*,
pack_path: str,
ann_path: str,
embedding_backend: Optional[EmbeddingBackend] = None,
mmap: bool = True,
block_cache_size: int = 1024,
io_batch_size: int = 64,
ef_search: int = 64,
prefetch: bool = True,
): ...
def search(self, query: str, top_k: int = 5, filter_meta: Optional[dict] = None) -> list[SearchHit]: ...
def get_chunk_by_id(self, chunk_id: int) -> dict: ...
def stats(self) -> RetrieverStats: ...M: Number of bi-directional links (default: 32)efConstruction: Size of dynamic candidate list (default: 200)efSearch: Size of dynamic candidate list during search (default: 64)
zstd: Fast compression with good ratio (default)deflate: Standard gzip compressionnone: No compression
chunk_size: Target chunk size in characters (default: 300)chunk_overlap: Overlap between chunks (default: 50)
MemPack includes built-in integrity checking with XXH3 checksums per block. Optional Reed-Solomon error correction can be enabled:
encoder = MemPackEncoder(ecc={"k": 10, "m": 2}) # 10 data + 2 parity blocksgit clone https://github.com/mempack/mempack
cd mempack
pip install -e ".[dev]"make testmake lintmake benchMIT License - see LICENSE file for details.
Stop overpaying for slow vector databases! MemPack delivers:
- β‘ 3x faster queries than alternatives
- πΎ 75% less memory usage
- π¦ Zero infrastructure complexity
- π 100% offline capability
- π° Unlimited queries for one-time cost
Install MemPack now | See use cases | View benchmarks
π‘ Questions? Check out our examples or open an issue on GitHub.
- Multiple Packs: Create separate packs for different content and search across them
- Incremental Updates: Support for adding new content to existing packs without full rebuild
- IVF-PQ backend for ultra-large corpora
- Quantized vectors (int8) support
- Streaming append API
- HTTP server for remote access
- More embedding backends (OpenAI, Vertex AI)
