# Mocktopus

Multi-armed mocks for LLM apps.

Mocktopus is a drop-in replacement for the OpenAI and Anthropic APIs, designed to make your LLM application tests fast, deterministic, and cost-free.
## Why Mocktopus?

Testing LLM applications is challenging:

- Non-deterministic: the same prompt yields different responses
- Expensive: every test run costs API credits
- Slow: API calls add latency to test suites
- Network-dependent: tests can't run offline
- Complex workflows: tool calls and streaming complicate testing

Mocktopus solves these problems with a local mock server that mimics the OpenAI and Anthropic wire formats.
## Features

- Drop-in replacement - Just change your base URL; no code changes required
- Deterministic - Same input always produces the same output, perfect for CI/CD
- Tool/function calling - Full support for complex workflows
- Streaming responses - Server-sent events (SSE) support
- Multiple providers - OpenAI and Anthropic compatible
- Zero cost - No API charges for tests
- Fast execution - No network latency
- Offline testing - Run tests without an internet connection
## Quick Start

Install:

```bash
pip install mocktopus
```

Create a scenario file (`scenario.yaml`):

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      model: "gpt-4*"
      messages_contains: "hello"
    respond:
      content: "Hello! How can I help you today?"
```

Start the mock server:

```bash
mocktopus serve -s scenario.yaml
```

Point your client at it:

```python
from openai import OpenAI

# Instead of the real API:
# client = OpenAI(api_key="sk-...")

# Use Mocktopus:
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="mock-key",  # Any string works
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
# Output: "Hello! How can I help you today?"
```
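Mocktopus is also Anthropic compatible (see Features). Here is a hypothetical sketch of pointing the Anthropic SDK at the mock; the base URL and matching behavior are assumptions, so check your scenario and server output:

```python
from anthropic import Anthropic

# Assumption: the mock serves the Anthropic messages API on the same host/port
client = Anthropic(base_url="http://localhost:8080", api_key="mock-key")

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # Arbitrary; match on it in your scenario
    max_tokens=100,
    messages=[{"role": "user", "content": "hello"}],
)
print(message.content[0].text)
```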
## Operating Modes

Scenario mode - use predefined YAML scenarios for deterministic responses:

```bash
mocktopus serve -s examples/chat-basic.yaml
```

Record mode - proxy and record real API calls for later replay:

```bash
mocktopus serve --mode record --recordings-dir ./recordings
```

Replay mode - replay previously recorded API interactions:

```bash
mocktopus serve --mode replay --recordings-dir ./recordings
```
## Scenario Examples

### Basic response

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      messages_contains: "weather"
    respond:
      content: "It's sunny today!"
```
### Tool calls

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      messages_contains: "weather"
    respond:
      tool_calls:
        - id: "call_123"
          type: "function"
          function:
            name: "get_weather"
            arguments: '{"location": "San Francisco"}'
```
### Streaming

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      model: "*"
    respond:
      content: "This will be streamed..."
      delay_ms: 50   # Delay between chunks
      chunk_size: 5  # Characters per chunk
```
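On the client side, the stream is consumed with the usual OpenAI SDK pattern; a sketch assuming the scenario above is loaded:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="mock-key")

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "stream please"}],
    stream=True,
)

# Per the scenario, chunks arrive 5 characters at a time, ~50 ms apart
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```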
### Embeddings

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      endpoint: "/v1/embeddings"
    respond:
      embeddings:
        - embedding: [0.1, 0.2, -0.3, 0.4]  # Mock embedding vectors
          index: 0
      usage:
        input_tokens: 5
        total_tokens: 5
```
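Client code calls the standard embeddings endpoint; a sketch (the model name is arbitrary, since this rule matches on the endpoint):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="mock-key")

result = client.embeddings.create(
    model="text-embedding-3-small",  # Any model name; rule matches the endpoint
    input="hello world",
)
print(result.data[0].embedding)  # [0.1, 0.2, -0.3, 0.4] from the scenario
```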
### Limited responses

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      messages_contains: "test"
    times: 3  # Only responds 3 times
    respond:
      content: "Limited response"
```
## CLI Reference

```bash
# Initialize a new project with templates
mocktopus init                        # Basic template
mocktopus init --template rag         # RAG/embeddings testing
mocktopus init --template agents      # Multi-step agent workflows
mocktopus init --template multimodal  # Image/audio/vision APIs
mocktopus init --template enterprise  # Advanced error handling

# Basic usage
mocktopus serve -s scenario.yaml

# Custom port
mocktopus serve -s scenario.yaml -p 9000

# Verbose logging
mocktopus serve -s scenario.yaml -v

# Validate scenario files with schema checking
mocktopus validate scenario.yaml

# Explain rule matching for debugging
mocktopus explain -s scenario.yaml --prompt "Hello world"
mocktopus explain -s scenario.yaml --model gpt-4 --prompt "help me" -v

# Diagnose configuration issues
mocktopus doctor                   # General environment check
mocktopus doctor -s scenario.yaml  # Diagnose a specific scenario
mocktopus doctor --fix             # Auto-fix common issues

# Simulate requests without starting a server
mocktopus simulate -s scenario.yaml --prompt "Hello"

# Generate example scenarios
mocktopus example --type basic > my-scenario.yaml
mocktopus example --type tools > tools-scenario.yaml
```
## Pytest Integration

```python
import pytest
from mocktopus import use_mocktopus


def test_my_llm_app(use_mocktopus):
    # Load scenario
    use_mocktopus.load_yaml("tests/scenarios/test.yaml")

    # Get a client
    client = use_mocktopus.openai_client()

    # Test your app
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "test"}],
    )
    assert "expected" in response.choices[0].message.content
```
## CI/CD

```yaml
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
      - run: pip install -e .
      - run: mocktopus serve -s tests/scenarios.yaml &
      - run: pytest  # Your tests hit localhost:8080
```
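One caveat: the server is started in the background, so slow runners may reach `pytest` before it is listening. A sketch of an assumed readiness check you could slot in between those two `run` steps; it simply polls port 8080 until any HTTP response comes back:

```yaml
      # Assumed readiness check: poll until the mock server answers
      - run: |
          for i in $(seq 1 20); do
            curl -s -o /dev/null http://localhost:8080/ && break
            sleep 0.5
          done
```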
## Request Matching

Mocktopus supports multiple matching strategies:

- Substring match: `messages_contains: "exact phrase"`
- Regex: `messages_regex: "\\d+ items?"`
- Glob: `model: "gpt-4*"`

## Response Options

```yaml
respond:
  content: "Response text"
  delay_ms: 100  # Simulate latency
  usage:
    input_tokens: 10
    output_tokens: 20
  # For streaming
  chunk_size: 10  # Characters per chunk
```
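Matchers and response options compose within a single rule. A sketch built only from the keys documented above:

```yaml
version: 1
rules:
  - type: llm.openai
    when:
      model: "gpt-4*"                # Glob on the model name
      messages_regex: "\\d+ items?"  # Regex on message content
    respond:
      content: "Counted your items."
      delay_ms: 100                  # Simulate latency
      usage:
        input_tokens: 10
        output_tokens: 20
```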
## Feature Status

Supported today:

- OpenAI chat completions API
- Streaming support (SSE)
- Function/tool calling
- Anthropic messages API
- Embeddings API
- Comprehensive CLI tools
- JSON schema validation
- Recording & replay

On the roadmap:

- Assistants API
- Image generation
- Semantic similarity matching
- Response templating
- Load testing mode
## Contributing

We welcome contributions! See our Contributing Guide for details.

## License

MIT - see LICENSE for details.
Made with 🐙 by EvalOps