Modern toolkit for creating Model Context Protocol (MCP) documentation servers with intelligent content detection, advanced search optimization, and beautiful CLI tools.
This repository provides a complete system for building MCP documentation servers that can intelligently crawl, index, and search documentation websites with both keyword and semantic search capabilities.
- @agentdesk/mcp-docs - Core documentation indexing and search functionality
- create-mcp-docs - CLI tool for generating MCP documentation servers
npx create-mcp-docs my-docs-server
This interactive CLI will:
- Guide you through project setup (name and description)
- Collect documentation URLs to crawl
- Let you choose between FlexSearch (keyword) or Vectra (semantic) search
- Generate a complete MCP server project
- Provide ready-to-use TypeScript code
packages/my-docs-server/
├── package.json          # Dependencies and scripts
├── src/
│   ├── server.ts         # MCP server implementation
│   └── build-index.ts    # Documentation indexer
├── .env                  # Environment configuration
├── README.md             # Usage instructions
└── ...
cd packages/my-docs-server
pnpm install
pnpm build:index # Build documentation search index
pnpm start # Start MCP server
Choose the right search provider for your needs:
FlexSearch (keyword search)
Best for: Smaller documentation sets, fast setup, exact term matching
Pros:
- Lightning-fast search performance
- No API keys required
- Smaller index size
- Great for technical documentation with specific terms
Cons:
- Limited semantic understanding
- May miss conceptually related content
Vectra (semantic search)
Best for: Large documentation sets, conceptual queries, content discovery
Pros:
- Understands meaning and context
- Finds conceptually related content
- Better for natural language queries
- Advanced "Late Chunking" for context preservation
Cons:
- Requires OpenAI API key
- Larger index size
- Slightly slower initial indexing
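The trade-offs above can be condensed into a small decision helper. This is only a sketch: `chooseProvider`, its input fields, and the 200-page cutoff for a "large" documentation set are assumptions made for illustration and are not part of `create-mcp-docs`.

```typescript
// Hypothetical helper: not part of create-mcp-docs or @agentdesk/mcp-docs.
type ProviderChoice = "flexsearch" | "vectra";

interface ProviderInputs {
  pageCount: number;               // rough number of pages to index
  naturalLanguageQueries: boolean; // will users ask conceptual questions?
  hasOpenAiKey: boolean;           // Vectra needs an OpenAI API key
}

function chooseProvider(inputs: ProviderInputs): ProviderChoice {
  // Without an API key, semantic search is not an option.
  if (!inputs.hasOpenAiKey) return "flexsearch";
  // Assumed cutoff for a "large" documentation set.
  const isLarge = inputs.pageCount > 200;
  return isLarge || inputs.naturalLanguageQueries ? "vectra" : "flexsearch";
}

console.log(
  chooseProvider({ pageCount: 40, naturalLanguageQueries: false, hasOpenAiKey: true }),
); // flexsearch
```

The interactive CLI still asks you to pick a provider yourself; the helper just makes the reasoning explicit.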
graph TB
subgraph "CLI Layer"
CLI["create-mcp-docs CLI"]
CLI --> Setup["Project Setup"]
CLI --> URLs["URL Collection"]
CLI --> Provider["Provider Selection"]
CLI --> Gen["Project Generation"]
end
subgraph "Generated MCP Server"
Server["MCP Server"]
Index["Index Builder"]
Config[".env Configuration"]
Server --> Tool["search_docs tool"]
end
subgraph "Core Package (@agentdesk/mcp-docs)"
CreateIndex["createIndex()"]
KB["KnowledgeBase"]
Heuristics["Content Detection"]
Pipeline["Document Pipeline"]
Optimizer["Search Optimizer"]
end
subgraph "Search Providers"
FlexSearch["FlexSearch<br/>(Keyword)"]
Vectra["Vectra<br/>(Semantic)"]
end
subgraph "Document Processing"
Crawler["Playwright Crawler"]
Parser["Content Parser"]
Chunker["Chunking Service"]
ReadabilityJS["Mozilla Readability"]
end
subgraph "AI Integration"
AI["AI Model"]
MCP["MCP Protocol"]
OpenAI["OpenAI Embeddings"]
end
%% CLI Flow
Gen --> Server
Gen --> Index
Gen --> Config
%% Core Integration
Index --> CreateIndex
Tool --> KB
CreateIndex --> Heuristics
CreateIndex --> Pipeline
%% Processing Pipeline
Pipeline --> Crawler
Pipeline --> Parser
Pipeline --> Chunker
Parser --> ReadabilityJS
%% Provider Selection
CreateIndex --> FlexSearch
CreateIndex --> Vectra
Vectra --> OpenAI
KB --> FlexSearch
KB --> Vectra
KB --> Optimizer
%% AI Integration
AI --> MCP
MCP --> Server
Tool --> AI
%% Styling
classDef cli fill:#e1f5fe
classDef core fill:#f3e5f5
classDef provider fill:#e8f5e8
classDef processing fill:#fff3e0
classDef ai fill:#fce4ec
class CLI,Setup,URLs,Provider,Gen cli
class CreateIndex,KB,Heuristics,Pipeline,Optimizer core
class FlexSearch,Vectra provider
class Crawler,Parser,Chunker,ReadabilityJS processing
class AI,MCP,OpenAI ai
sequenceDiagram
participant User
participant CLI as create-mcp-docs CLI
participant Generator as Project Generator
participant MCP as Generated MCP Server
participant Indexer as Documentation Indexer
participant Provider as Search Provider
participant AI as AI Model
User->>CLI: npx create-mcp-docs
CLI->>User: Collect project details & URLs
CLI->>Generator: Generate project files
Generator->>MCP: Create MCP server & indexer
User->>Indexer: pnpm build:index
Indexer->>Provider: Extract & index documents
Provider->>Indexer: Search index ready
User->>MCP: pnpm start
AI->>MCP: Search documentation
MCP->>Provider: Execute search query
Provider->>MCP: Optimized results
MCP->>AI: Contextual documentation
- Automatically detects optimal CSS selectors using heuristics
- Integrates Mozilla Readability for content extraction
- Provides confidence scoring and fallback options
- Validates selectors against real page content
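To make the idea of confidence scoring with fallbacks concrete, here is a minimal sketch. The types, the scoring formula (text length weighted against link density), and the 0.5 threshold are all assumptions for illustration; the actual heuristics in `@agentdesk/mcp-docs` may work differently.

```typescript
// Hypothetical types and scoring; the real heuristics may differ.
interface SelectorCandidate {
  selector: string;    // CSS selector tried against a sample page
  textLength: number;  // characters of visible text inside the match
  linkDensity: number; // share of that text that is link text, 0..1
}

interface ScoredSelector {
  selector: string;
  confidence: number; // 0..1
}

function scoreCandidate(c: SelectorCandidate): number {
  // Long text raises confidence (saturating at 2000 characters);
  // link-heavy regions such as navigation bars lower it.
  const lengthScore = Math.min(c.textLength / 2000, 1);
  return lengthScore * (1 - c.linkDensity);
}

function rankSelectors(candidates: SelectorCandidate[], minConfidence = 0.5) {
  const scored: ScoredSelector[] = candidates
    .map((c) => ({ selector: c.selector, confidence: scoreCandidate(c) }))
    .sort((a, b) => b.confidence - a.confidence);
  const [best, ...fallbacks] = scored;
  // Below the threshold, signal that a generic extractor should take over.
  const useFallbackExtractor = !best || best.confidence < minConfidence;
  return { best, fallbacks, useFallbackExtractor };
}

const ranked = rankSelectors([
  { selector: "nav", textLength: 400, linkDensity: 0.9 },
  { selector: "article.prose", textLength: 3000, linkDensity: 0.1 },
]);
console.log(ranked.best?.selector, ranked.useFallbackExtractor); // article.prose false
```

In this sketch the `useFallbackExtractor` flag stands in for handing the page to a generic extractor such as Mozilla Readability when no selector scores well.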
Interactive React-based CLI with:
- Project Setup: Name and description input
- URL Collection: Add multiple documentation sources
- Provider Selection: Choose between FlexSearch and Vectra
- Live Generation: Real-time project creation feedback
- Success Guide: Clear next steps after creation
Advanced search optimization that goes beyond simple keyword matching:
- Full Document Strategy: Returns entire documents when multiple chunks are highly relevant
- Expanded Chunk Strategy: Intelligently expands related content sections
- Token Budget Management: Optimizes results to fit within AI model context limits
- Coherence Preservation: Maintains document structure and context flow
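Token budget management can be pictured as a greedy packing step. The sketch below assumes a hypothetical `RankedResult` shape and a rough estimate of four characters per token; the package's own optimizer is likely more refined.

```typescript
// Hypothetical result shape; token counts use a rough 4-characters-per-token estimate.
interface RankedResult {
  id: string;
  text: string;
  score: number; // higher is more relevant
}

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function fitToBudget(results: RankedResult[], tokenLimit: number): RankedResult[] {
  const kept: RankedResult[] = [];
  let used = 0;
  // Visit the most relevant results first; skip any that would overflow the budget.
  for (const r of [...results].sort((a, b) => b.score - a.score)) {
    const cost = estimateTokens(r.text);
    if (used + cost > tokenLimit) continue;
    kept.push(r);
    used += cost;
  }
  return kept;
}

const picked = fitToBudget(
  [
    { id: "a", text: "x".repeat(400), score: 0.9 }, // ~100 tokens
    { id: "b", text: "x".repeat(800), score: 0.8 }, // ~200 tokens
    { id: "c", text: "x".repeat(200), score: 0.7 }, // ~50 tokens
  ],
  160,
);
console.log(picked.map((r) => r.id)); // [ 'a', 'c' ]
```

Result "b" is skipped because it alone would exceed the remaining budget, while the smaller "c" still fits.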
- Intelligent Crawling: Playwright-powered browser automation
- Content Cleaning: Mozilla Readability integration for clean extraction
- Flexible Chunking: Traditional, semantic, and Late Chunking strategies
- Concurrent Processing: Configurable concurrency with rate limiting
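Configurable concurrency with a simple per-worker delay could look like the sketch below. `mapWithConcurrency` is a hypothetical helper written for this example; it is not the crawler used by the package, and the task here only measures URL length instead of fetching anything.

```typescript
// Hypothetical helper, not the package's crawler: runs at most `limit` tasks at once
// and pauses `delayMs` after each item a worker finishes.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  delayMs: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item; no await between check and claim
      results[index] = await task(items[index]);
      if (delayMs > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Usage: process three URLs two at a time with a 10 ms pause per worker.
const crawlDemo = mapWithConcurrency(
  ["https://docs.example.com/a", "https://docs.example.com/b", "https://docs.example.com/c"],
  2,
  10,
  async (url) => url.length,
);
crawlDemo.then((lengths) => console.log(lengths));
```

Results are written by index, so the output order matches the input order even when tasks finish out of order.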
- Follows established MCP server patterns
- Built with TypeScript for full type safety
- Comprehensive error handling and logging
- Environment-based configuration
- Ready for deployment with zero additional setup
# Create a server for your product docs
npx create-mcp-docs product-docs
# URLs: https://docs.yourproduct.com
# Choose FlexSearch for fast, precise searches
# Create a semantic search server for comprehensive docs
npx create-mcp-docs comprehensive-docs
# URLs: Multiple documentation sources
# Choose Vectra for conceptual understanding
# Create a server for API reference
npx create-mcp-docs api-docs
# URLs: https://api.yourservice.com/docs
# FlexSearch excels at exact API method/parameter searches
For Vectra users, our "Late Chunking" implementation preserves contextual information across chunk boundaries:
- Contextual Embeddings: Documents are processed through full context before chunking
- Semantic Boundaries: Intelligent splitting that respects document structure
- Context Preservation: Related information stays connected across chunks
- Optimized for Documentation: Tuned specifically for technical documentation patterns
Learn more in the @agentdesk/mcp-docs documentation.
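As a toy illustration of the general late chunking idea, the sketch below pools per-chunk vectors from token-level embeddings that were computed over the whole document. It assumes those token embeddings already exist (for example from a long-context embedding model); how `@agentdesk/mcp-docs` obtains and combines them is not shown here, and all names are hypothetical.

```typescript
// Toy illustration of the pooling step in late chunking.
// Assumption: token-level embeddings for the WHOLE document are already available.
type Vector = number[];

interface ChunkSpan {
  start: number; // first token index (inclusive)
  end: number;   // last token index (exclusive); spans must be non-empty
}

function meanPool(vectors: Vector[]): Vector {
  const dim = vectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Each chunk embedding is pooled from token vectors that were computed with
// the full document in view, so every chunk keeps some surrounding context.
function lateChunk(tokenEmbeddings: Vector[], spans: ChunkSpan[]): Vector[] {
  return spans.map(({ start, end }) => meanPool(tokenEmbeddings.slice(start, end)));
}

const tokenEmbeddings: Vector[] = [[1, 0], [3, 0], [0, 2], [0, 4]];
const chunkVectors = lateChunk(tokenEmbeddings, [
  { start: 0, end: 2 },
  { start: 2, end: 4 },
]);
console.log(chunkVectors); // [ [ 2, 0 ], [ 0, 3 ] ]
```

The contrast with traditional chunking is the order of operations: there, the text is split first and each piece is embedded in isolation.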
Our search optimizer analyzes raw search results and intelligently decides the best strategy:
// Example optimization strategies
{
  fullDocumentThreshold: 3,   // 3+ chunks = return full document
  expandedChunkMultiplier: 2, // Expand single chunks by 2x
  targetUtilization: 0.9,     // Use 90% of token budget
}
Detailed algorithm explanations are in the core package documentation.
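One way to read the three options above is as a per-document decision rule. The sketch below is an interpretation based only on the option names and comments; `pickStrategy` and its types are hypothetical, and the real optimizer may weigh other signals.

```typescript
// Hypothetical types and decision rule inferred from the option names above.
interface OptimizerOptions {
  fullDocumentThreshold: number;
  expandedChunkMultiplier: number;
  targetUtilization: number;
}

interface DocumentHits {
  documentTokens: number; // size of the whole document
  chunkTokens: number[];  // sizes of the matching chunks
}

type Strategy =
  | { kind: "full-document"; tokens: number }
  | { kind: "expanded-chunks"; tokens: number };

function pickStrategy(
  hits: DocumentHits,
  tokenLimit: number,
  options: OptimizerOptions,
): Strategy {
  const budget = Math.floor(tokenLimit * options.targetUtilization);
  const manyHits = hits.chunkTokens.length >= options.fullDocumentThreshold;
  // Enough matching chunks and the document fits: return it whole.
  if (manyHits && hits.documentTokens <= budget) {
    return { kind: "full-document", tokens: hits.documentTokens };
  }
  // Otherwise widen each matching chunk, capped by the budget.
  const expanded = hits.chunkTokens.reduce(
    (sum, t) => sum + t * options.expandedChunkMultiplier,
    0,
  );
  return { kind: "expanded-chunks", tokens: Math.min(expanded, budget) };
}

const optimizerOptions: OptimizerOptions = {
  fullDocumentThreshold: 3,
  expandedChunkMultiplier: 2,
  targetUtilization: 0.9,
};
const manyChunks = pickStrategy({ documentTokens: 4000, chunkTokens: [300, 250, 400] }, 10000, optimizerOptions);
const oneChunk = pickStrategy({ documentTokens: 4000, chunkTokens: [300] }, 10000, optimizerOptions);
console.log(manyChunks.kind, oneChunk.kind, oneChunk.tokens); // full-document expanded-chunks 600
```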
import { createIndex } from "@agentdesk/mcp-docs";

await createIndex({
  pages: [
    {
      url: "https://docs.example.com",
      mode: "crawl",
      selectors: {
        links: 'a[href^="/docs"]',
        content: "article.prose",
      },
    },
  ],
  // Choose your provider
  provider: {
    type: "vectra",
    embeddings: {
      provider: "openai",
      model: "text-embedding-ada-002",
      apiKey: process.env.OPENAI_API_KEY,
    },
    chunking: {
      strategy: "late-chunking",
      useCase: "documentation",
    },
  },
  outputFile: "docs-vectra-index",
});
import { KnowledgeBase, getModuleDir } from "@agentdesk/mcp-docs";

const docs = new KnowledgeBase({
  path: getModuleDir(import.meta.url), // Directory containing index
  apiKey: process.env.OPENAI_API_KEY, // For Vectra indices
});

const results = await docs.search({
  query: "How do I authenticate users?",
  tokenLimit: 10000,
});
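Inside an MCP server, search results are typically handed back to the model as a tool result containing text content. The sketch below shows one way to do that conversion. The `SearchHit` shape is an assumption, since the return type of `KnowledgeBase.search()` is not shown here, and `toToolResult` is a hypothetical helper.

```typescript
// Assumed result shape: the real return type of KnowledgeBase.search() may differ.
interface SearchHit {
  url: string;
  content: string;
}

// Shape of an MCP tool result carrying plain text.
interface TextToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

function toToolResult(hits: SearchHit[]): TextToolResult {
  if (hits.length === 0) {
    return { content: [{ type: "text", text: "No matching documentation found." }] };
  }
  // Prefix each hit with its source URL and separate hits with a rule.
  const text = hits
    .map((h) => `Source: ${h.url}\n\n${h.content}`)
    .join("\n\n---\n\n");
  return { content: [{ type: "text", text }] };
}

const toolResult = toToolResult([
  { url: "https://docs.example.com/auth", content: "Use an API token to authenticate." },
]);
console.log(toolResult.content[0].text);
```

Keeping the source URL next to each passage lets the model cite where an answer came from.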
- @agentdesk/mcp-docs - Detailed API reference and algorithms
- create-mcp-docs - CLI tool implementation details
git clone https://github.com/agentdesk/create-mcp-docs
cd create-mcp-docs
pnpm install
pnpm build
# Core package
cd packages/mcp-docs
pnpm dev
# CLI package
cd packages/create-mcp-docs
pnpm build
pnpm link --global
create-mcp-docs test-project
# Run all tests
pnpm test
# Package-specific tests
cd packages/mcp-docs && pnpm test
cd packages/create-mcp-docs && pnpm test
- Node.js >= 16.0.0
- pnpm >= 8.0.0 (recommended)
- OpenAI API Key (for Vectra semantic search only)
We welcome contributions! Please see:
- Issues - Bug reports and feature requests
- Pull Requests - Code contributions
- Documentation - Improvements and examples
- Use TypeScript for all new code
- Follow existing code style and patterns
- Add comprehensive tests for new features
- Update documentation for API changes
MIT - See LICENSE file for details.
- Model Context Protocol - The standard this implements
- AgentKit - AI agent development framework
- AgentDesk - AI agent platform
Built with ❤️ by the AgentDesk team