This is a local demo environment for Podmortem that uses Kind to create a test cluster with pod failure scenarios. Podmortem analyzes pod failures using pattern matching and AI to provide detailed explanations and remediation suggestions.
You need:

- Podman
- kubectl
- kind
- make

Installation:

macOS:

```bash
brew install podman kubectl kind make
```

Linux:

```bash
# Install podman, kubectl, kind, make using your package manager
```
Podmortem uses AI to analyze failure patterns and provide remediation suggestions. You need to configure an AI provider before running scenarios.
1. Ollama (recommended for local testing)
   - Runs locally on your machine
   - Supports various open-source models (Mistral, Llama, etc.)
2. OpenAI-compatible APIs
   - Works with any OpenAI-compatible API (OpenAI, Anthropic, local models, etc.)
   - Requires an API key
Install and run Ollama:

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model (this may take a few minutes)
ollama pull mistral:7b

# Start the Ollama server (if it is not already running as a service)
ollama serve
```
The demo is pre-configured to use Ollama at `http://localhost:11434/` with the `mistral:7b` model.
Works with various providers:

- OpenAI: `https://api.openai.com/v1` (models: gpt-3.5-turbo, gpt-4, etc.)
- Anthropic: `https://api.anthropic.com/v1` (models: claude-3-sonnet, etc.)
- Local OpenAI-compatible servers: text-generation-webui, LocalAI, vLLM
- Other cloud providers: many offer OpenAI-compatible endpoints
1. Get an API key from your chosen provider.
2. Create a Kubernetes secret for your API key:

   ```bash
   kubectl create secret generic openai-secret \
     --from-literal=api-key=YOUR_API_KEY \
     -n podmortem-system
   ```
3. Apply the OpenAI-compatible configuration:

   ```bash
   # Update the secret with your API key
   kubectl patch secret openai-secret -n podmortem-system \
     --type='json' -p='[{"op": "replace", "path": "/data/api-key", "value": "'$(echo -n YOUR_API_KEY | base64)'"}]'

   # Switch to OpenAI-compatible provider
   make setup-openai
   ```
4. Update the API URL and model for your provider:

   ```bash
   # Use base URLs only - /chat/completions is added automatically
   kubectl patch aiprovider openai-ai-provider -n podmortem-system --type='json' \
     -p='[{"op": "replace", "path": "/spec/apiUrl", "value": "YOUR_PROVIDER_BASE_URL"}]'
   kubectl patch aiprovider openai-ai-provider -n podmortem-system --type='json' \
     -p='[{"op": "replace", "path": "/spec/modelId", "value": "YOUR_MODEL_NAME"}]'
   ```
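Two details in the commands above are worth making concrete: the secret patch base64-encodes the key inline because Kubernetes Secret `data` values are base64-encoded, and the `apiUrl` must be a base URL because `/chat/completions` is appended for you. Here is a minimal Python sketch of both (an illustration only, not Podmortem's actual code; `"my-key"` and the URLs are placeholders):

```python
import base64

def secret_patch_value(api_key: str) -> str:
    """Base64-encode an API key, as `echo -n YOUR_API_KEY | base64`
    does in the patch command above (Secret `data` values are base64)."""
    return base64.b64encode(api_key.encode("utf-8")).decode("ascii")

def chat_completions_url(api_url: str) -> str:
    """Append /chat/completions to a configured base apiUrl,
    tolerating a trailing slash."""
    return api_url.rstrip("/") + "/chat/completions"

print(secret_patch_value("my-key"))                       # bXkta2V5
print(chat_completions_url("https://api.openai.com/v1"))  # https://api.openai.com/v1/chat/completions
```

If you configure `apiUrl` with `/chat/completions` already included, the request path would be doubled, which is why the comment says to use base URLs only.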
Patterns are the foundation of Podmortem's intelligent log analysis. They define what to look for in pod failure logs and how to interpret those findings.
Patterns are YAML-defined rules that:
- Match specific log entries using regex patterns
- Assign confidence scores to potential root causes
- Provide context extraction around matched lines
- Offer remediation guidance with actionable fix suggestions
- Link to documentation for deeper understanding
When a pod fails, Podmortem:
- Scans logs against all loaded patterns
- Scores matches based on pattern confidence and context
- Ranks findings to identify the most likely root cause
- Extracts context around critical log lines
- Provides AI analysis using pattern insights and remediation guidance
```yaml
- id: "quarkus_database_connection_failure"
  name: "Database Connection Failure"
  primary_pattern:
    regex: "Unable to connect to database|Connection refused.*database"
    confidence: 0.95          # How confident this pattern indicates the root cause (0.0-1.0)
  secondary_patterns:
    - regex: "SQLException|Connection timeout"
      weight: 0.4             # How much this pattern contributes to the overall score (0.0-1.0)
      proximity_window: 20    # Look for this pattern within 20 lines of the primary match
  severity: "CRITICAL"        # CRITICAL, HIGH, MEDIUM, LOW - impacts prioritization
  category: ["database", "connectivity"]
  remediation:
    description: "Application cannot connect to the database"
    common_causes:
      - "Database server is down"
      - "Incorrect connection credentials"
      - "Network connectivity issues"
    suggested_commands:
      - "kubectl get secrets -n your-namespace"
      - "kubectl logs database-pod -n database-namespace"
      - "ping database-hostname"
  documentation_links:
    - "https://quarkus.io/guides/datasource"
```
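To make the roles of `confidence`, `weight`, and `proximity_window` concrete, here is a toy scorer in Python. The regexes and values are copied from the example pattern above, but the combining logic is a simplified illustration, not Podmortem's actual scoring algorithm:

```python
import re

# Fields from the example pattern above
PRIMARY = re.compile(r"Unable to connect to database|Connection refused.*database")
PRIMARY_CONFIDENCE = 0.95
SECONDARY = re.compile(r"SQLException|Connection timeout")
SECONDARY_WEIGHT = 0.4
PROXIMITY_WINDOW = 20  # lines around the primary match

def score_logs(lines):
    """Return (line_index, score) for the first primary match, or None.

    A secondary match within the proximity window adds its weight to
    the primary pattern's confidence (illustrative combination only).
    """
    for i, line in enumerate(lines):
        if not PRIMARY.search(line):
            continue
        score = PRIMARY_CONFIDENCE
        window = lines[max(0, i - PROXIMITY_WINDOW): i + PROXIMITY_WINDOW + 1]
        if any(SECONDARY.search(ctx) for ctx in window):
            score += SECONDARY_WEIGHT
        return i, score
    return None

logs = [
    "INFO  starting application",
    "ERROR Unable to connect to database",
    "WARN  SQLException: connection is closed",
]
idx, score = score_logs(logs)
print(idx, round(score, 2))  # 1 1.35
```

The secondary `SQLException` line falls inside the 20-line window of the primary match, so it strengthens the finding; a `SQLException` hundreds of lines away would not.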
You can create custom pattern libraries for your specific applications:
```yaml
apiVersion: podmortem.redhat.com/v1alpha1
kind: PatternLibrary
metadata:
  name: my-custom-patterns
  namespace: podmortem-system
spec:
  repositories:
    - name: "my-patterns-repo"
      url: "https://github.com/your-org/my-app-patterns.git"
      branch: "main"
  enabledLibraries:
    - "my-app-core-patterns"
    - "my-app-integration-patterns"
```
```yaml
# First create a secret for Git credentials
apiVersion: v1
kind: Secret
metadata:
  name: git-credentials
  namespace: podmortem-system
type: Opaque
stringData:
  username: "your-username"
  password: "your-token"
---
apiVersion: podmortem.redhat.com/v1alpha1
kind: PatternLibrary
metadata:
  name: private-patterns
  namespace: podmortem-system
spec:
  repositories:
    - name: "private-patterns-repo"
      url: "https://github.com/your-org/private-patterns.git"
      branch: "main"
      credentials:
        secretRef: "git-credentials"
```
Podmortem loads all `.yml` and `.yaml` files from your repository, so you can organize them however works best for your use case:
```
my-app-patterns/
├── database-patterns.yml        # Database-related failures
├── network-patterns.yml         # Network and connectivity issues
├── startup-patterns.yml        # Application startup problems
└── my-app-patternlibrary.yaml  # Configuration example
```
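The loading rule above can be sketched as a small discovery function. This is an illustration under the assumption that all YAML files are picked up recursively regardless of layout; the filenames mirror the example tree:

```python
import tempfile
from pathlib import Path

def find_pattern_files(repo_root) -> list[str]:
    """List every .yml/.yaml file under a repo, mirroring the rule that
    Podmortem loads all YAML files however they are organized."""
    root = Path(repo_root)
    return sorted(
        p.relative_to(root).as_posix()
        for ext in ("*.yml", "*.yaml")
        for p in root.rglob(ext)
    )

# Recreate the example layout in a throwaway directory
with tempfile.TemporaryDirectory() as repo:
    for name in ("database-patterns.yml", "network-patterns.yml",
                 "startup-patterns.yml", "my-app-patternlibrary.yaml",
                 "README.md"):
        Path(repo, name).touch()
    print(find_pattern_files(repo))  # README.md is skipped
```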
Note: The demo automatically loads Quarkus patterns from the `patterns-quarkus` repository. You can see the pattern library configuration in `config/pattern-library.yaml`.
Setup:

```bash
cd podmortem/local-demo
make setup
```

Run a test scenario (automatically waits for analysis and resets for the next test):

```bash
make run-quarkus-failure
```
The test will:
- Deploy the failing pod
- Show the pod failure logs
- Wait for Podmortem to analyze the failure
- Show the AI analysis results
- Clean up and reset for the next test
You can also watch manually:

```bash
make watch-analysis
```

View results:

```bash
make get-analysis
make logs
```

Cleanup:

```bash
make clean-pods
# or
make destroy
```
Each scenario automatically waits for analysis and resets when complete:

- `make run-quarkus-failure` - Database connection and CDI issues
- `make run-microservices-failure` - Circuit breaker and messaging failures
- `make run-infrastructure-failure` - OOM kills and image pull issues
- `make run-performance-failure` - Memory exhaustion and GC issues
Note: Each test scenario will take 30-60 seconds to complete as it waits for the pod to fail and analysis to finish.
Watch for analysis:

```bash
make watch-analysis
```

Get analysis details:

```bash
make get-analysis
kubectl get podmortem -n demo -o yaml
```

Check operator logs:

```bash
make logs
```

View pod events:

```bash
make events
# or directly
kubectl get events -A --sort-by=.lastTimestamp
kubectl describe pod <pod> -n <ns>
```

Check cluster status:

```bash
make status
```
Setup:

- `make setup` - Set up everything
- `make cluster` - Create the cluster only
- `make setup-openai` - Switch to the OpenAI-compatible provider
- `make destroy` - Remove everything
Pattern Management:

- `kubectl get patternlibrary -n podmortem-system` - View loaded pattern libraries
- `kubectl describe patternlibrary quarkus-patterns -n podmortem-system` - Check pattern sync status
Run scenarios (automated - waits for analysis and resets):

```bash
make run-quarkus-failure
make run-microservices-failure
make run-infrastructure-failure
make run-performance-failure
```

Manual monitoring (if needed):

```bash
make watch-analysis
make get-analysis
make logs
make events
make status
```
Cleanup:

- `make clean-pods`
- `make reset-monitor` - Clear analysis history by resetting the monitor
- `make clean-analysis` - Show how to manually clear analysis results