
Commit c278f18

Merge branch 'master' of github.com:robusta-dev/holmesgpt into buildcheck
2 parents 8867692 + f8cd1c2 commit c278f18

177 files changed (+11413, -1497 lines)

Some content is hidden: large commits have some content hidden by default.

Dockerfile

Lines changed: 4 additions & 7 deletions

```diff
@@ -23,6 +23,8 @@ ENV VIRTUAL_ENV=/app/venv
 ENV PATH="$VIRTUAL_ENV/bin:$PATH"
 
 # Needed for kubectl
+ENV VERIFY_CHECKSUM=true \
+    VERIFY_SIGNATURES=true
 RUN curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key -o Release.key
 
 # Set the architecture-specific kube lineage URLs
@@ -58,12 +60,7 @@ RUN chmod 777 argocd
 RUN ./argocd --help
 
 # Install Helm
-RUN curl https://baltocdn.com/helm/signing.asc | gpg --dearmor -o /usr/share/keyrings/helm.gpg \
-    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" \
-    | tee /etc/apt/sources.list.d/helm-stable-debian.list \
-    && apt-get update \
-    && apt-get install -y helm \
-    && rm -rf /var/lib/apt/lists/*
+RUN curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
 
 # Set up poetry
 ARG PRIVATE_PACKAGE_REGISTRY="none"
@@ -135,7 +132,7 @@ COPY --from=builder /app/argocd /usr/local/bin/argocd
 RUN argocd --help
 
 # Set up Helm
-COPY --from=builder /usr/bin/helm /usr/local/bin/helm
+COPY --from=builder /usr/local/bin/helm /usr/local/bin/helm
 RUN chmod 555 /usr/local/bin/helm
 RUN helm version
 
```
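The new `VERIFY_CHECKSUM` and `VERIFY_SIGNATURES` variables are read by Helm's `get-helm-3` install script. As we understand it, the script downloads a SHA-256 checksum alongside the release archive and aborts if the two do not match. `VERIFY_SIGNATURES` adds a GPG signature check on top of that. The script also installs to `/usr/local/bin` by default, which matches the updated `COPY --from=builder` path.

The Python sketch below illustrates only the checksum step, under those assumptions. It is not the script's code, and `parse_checksum` and `verify_sha256` are hypothetical names.

```python
import hashlib
import hmac
from typing import Iterable


def parse_checksum(text: str) -> str:
    """Extract the hex digest from a checksum file.

    Accepts a bare digest or the `sha256sum` style "<digest>  <filename>".
    """
    fields = text.split()
    if not fields:
        raise ValueError("checksum file is empty")
    return fields[0].lower()


def verify_sha256(chunks: Iterable[bytes], checksum_text: str) -> bool:
    """Hash the archive chunk by chunk and compare it with the published digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    # compare_digest gives a constant-time comparison of the two hex strings.
    return hmac.compare_digest(digest.hexdigest(), parse_checksum(checksum_text))
```

To check a file on disk, open it in binary mode. Pass `iter(lambda: f.read(65536), b"")` as `chunks`, together with the text of the downloaded checksum file. This keeps memory use flat even for large archives.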

docs/ai-providers/anthropic.md

Lines changed: 21 additions & 0 deletions

````diff
@@ -21,6 +21,27 @@ You can also pass the API key directly as a command-line parameter:
 holmes ask "what pods are failing?" --model="anthropic/<your-claude-model>" --api-key="your-api-key"
 ```
 
+## Prompt Caching
+
+HolmesGPT supports Anthropic's prompt caching feature, which can significantly reduce costs and latency for repeated API calls that share the same prompt prefix.
+
+HolmesGPT automatically adds cache control to the last message in each API call. This caches everything from the beginning of the conversation up to that point, so subsequent calls that start with the same prefix are faster and cheaper.
+
+### How It Works
+
+- Anthropic uses prefix-based caching: it caches the exact sequence of messages up to the cache control point
+- The cache has a 5-minute lifetime by default, refreshed each time the cached prefix is reused
+- Cached content must meet a minimum length (1024 tokens for most Claude models); shorter prefixes are not cached
+- Cache writes on the first call cost more than regular input tokens, but subsequent cache hits are much cheaper
+
+### Benefits in HolmesGPT
+
+Prompt caching is particularly effective for HolmesGPT because:
+
+- System prompts with tool definitions are large and static, which makes them ideal for caching
+- Tool investigation loops reuse the same context multiple times
+- Multi-step investigations benefit from cached conversation history
+
 ## Additional Resources
 
 HolmesGPT uses the LiteLLM API to support the Anthropic provider. Refer to [LiteLLM Anthropic docs](https://litellm.vercel.app/docs/providers/anthropic){:target="_blank"} for more details.
````
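Adding cache control to the last message can be sketched as follows. The sketch makes three assumptions:

- Messages are OpenAI-style dictionaries, the format LiteLLM accepts.
- The breakpoint is Anthropic's `cache_control: {"type": "ephemeral"}` marker on a content block.
- `add_cache_control` is a hypothetical helper, not HolmesGPT's actual implementation.

```python
import copy
from typing import Any, Dict, List

Message = Dict[str, Any]
EPHEMERAL = {"type": "ephemeral"}


def add_cache_control(messages: List[Message]) -> List[Message]:
    """Return a copy of `messages` with a cache breakpoint on the last message.

    Anthropic caches the prefix up to and including the marked content block,
    so marking the final message covers the whole conversation so far.
    """
    result = copy.deepcopy(messages)  # never mutate the caller's history
    if not result:
        return result
    last = result[-1]
    content = last.get("content")
    if isinstance(content, str):
        # Plain-string content cannot carry cache_control, so wrap it in a text block.
        last["content"] = [{"type": "text", "text": content, "cache_control": dict(EPHEMERAL)}]
    elif isinstance(content, list) and content:
        # Mark only the final block; earlier blocks are part of the cached prefix.
        content[-1]["cache_control"] = dict(EPHEMERAL)
    return result
```

Only one breakpoint is placed, at the end. On the next call, the previously cached prefix is still a prefix of the new, longer conversation, so it can be read from the cache while the new tail is written.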

docs/assets/Holmes-azure-mcp.gif

15.4 MB

docs/community.md

Lines changed: 10 additions & 18 deletions

```diff
@@ -1,31 +1,23 @@
 # Community
 
-Join us for our regular community meetings to discuss the HolmesGPT roadmap and collaborate on the future of AI-powered troubleshooting.
+Join our community to collaborate on the future of AI-powered troubleshooting.
 
-## Community Meeting
+## Community Meetup Recording
 
-📅 **HolmesGPT Community Meetup**
+📹 **Watch our first HolmesGPT Community Meetup**
 
-**🗓️ Date:** Thursday, August 21, 2025
+We held our inaugural community meetup on August 21, 2025. Watch the recording to learn about:
 
-**📍 Where:** [Google Meet](https://meet.google.com/jxc-ujyf-xwy)
+- HolmesGPT roadmap and upcoming features
+- Community Q&A and feedback
+- Ways to get involved with the project
 
-| Local Date & Time | Time Zone |
-|------------------|-----------|
-| Thursday, Aug 21 · 8:00 - 9:00 AM | PT (Pacific Time) |
-| Thursday, Aug 21 · 11:00 AM - 12:00 PM | ET (Eastern Time) |
-| Thursday, Aug 21 · 8:30 - 9:30 PM | IST (India Standard Time) |
+**[▶️ Watch Recording on YouTube](https://youtu.be/slQRc6nlFQU)**
 
-### Agenda
-- [📋 HolmesGPT Roadmap](https://github.com/orgs/robusta-dev/projects/2) - Review and discuss upcoming features
-- Community feedback and Q&A
-- Ways to get involved
+### Resources
 
-**Links:**
-
-- [🔗 Google Meet](https://meet.google.com/jxc-ujyf-xwy)
 - [📝 Meeting Notes](https://docs.google.com/document/d/1sIHCcTivyzrF5XNvos7ZT_UcxEOqgwfawsTbb9wMJe4/edit?tab=t.0)
-- [📋 Roadmap](https://github.com/orgs/robusta-dev/projects/2)
+- [📋 HolmesGPT Roadmap](https://github.com/orgs/robusta-dev/projects/2)
 
 ## Get Involved
 
```

Lines changed: 91 additions & 0 deletions

```diff
@@ -0,0 +1,91 @@
+# Bash Toolset
+
+The bash toolset provides secure execution of common command-line tools used for troubleshooting and system analysis. It replaces multiple YAML-based toolsets with a single, comprehensive toolset that includes safety validation and command parsing.
+
+**⚠️ Security Note**: This toolset executes commands on the system where Holmes is running. Only validated, safe commands are allowed, and the toolset is disabled by default for security reasons.
+
+## Supported Commands
+
+The bash toolset supports the following categories of commands:
+
+### Cloud Providers
+
+**AWS CLI (`aws`)**
+
+- Supports various AWS services and operations
+- Commands are validated for safety before execution
+
+**Azure CLI (`az`)**
+
+- Supports Azure operations including AKS management
+- Network and account operations
+
+### Kubernetes Tools
+
+**kubectl**
+
+- Standard Kubernetes operations: get, describe, logs, events
+- Resource management and cluster inspection
+- Live metrics via `kubectl top`
+
+**Helm**
+
+- Helm chart operations
+- Repository management
+- Release inspection
+
+**ArgoCD**
+
+- Application management
+- Deployment status checking
+
+### Container Tools
+
+**Docker**
+
+- Container inspection and management
+- Image operations
+- Basic Docker commands
+
+### Text Processing Utilities
+
+**Data Processing**
+
+- `grep` - Text searching and pattern matching
+- `jq` - JSON processing and querying
+- `sed` - Stream editing and text transformation
+- `awk` - Pattern scanning and text processing
+
+**File Utilities**
+
+- `cut` - Column extraction
+- `sort` - Data sorting
+- `uniq` - Duplicate removal
+- `head` - Show first lines
+- `tail` - Show last lines
+- `wc` - Word, line, and character counting
+
+**Text Transformation**
+
+- `tr` - Character translation and deletion
+- `base64` - Base64 encoding/decoding
+
+### Special Tools
+
+**kubectl_run_image**
+
+Creates temporary debug pods in Kubernetes clusters for diagnostic commands:
+
+- Runs commands in specified container images
+- Automatically cleans up temporary pods
+- Supports custom namespaces and timeouts
+- Useful for network debugging, DNS resolution, and environment inspection
+
+## Command Validation
+
+All commands undergo security validation before execution:
+
+- Only whitelisted commands and options are allowed
+- Dangerous operations are blocked (file writes, system calls, etc.)
+- Commands are parsed and validated for safety
+- Pipe operations between supported commands are allowed
```
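The validation rules listed above can be sketched roughly as follows. This is a minimal illustration, not HolmesGPT's actual validator. The `ALLOWED` table, `parse_pipeline`, and `is_allowed` are hypothetical.

The sketch splits a command into pipeline segments with `shlex`, assuming pipes are written with surrounding spaces so that `|` appears as its own token. It then checks each segment against an allowlist of commands and of the kubectl subcommands named above. Any argument containing shell metacharacters is rejected.

```python
import shlex
from typing import List

# Hypothetical allowlist: command -> permitted subcommands (None = any arguments).
ALLOWED = {
    "kubectl": {"get", "describe", "logs", "events", "top"},
    "grep": None,
    "jq": None,
    "sort": None,
    "uniq": None,
    "head": None,
    "tail": None,
    "wc": None,
}
# Characters that could smuggle extra shell syntax into an argument.
SHELL_METACHARS = set(";&|<>`$()")


def parse_pipeline(command: str) -> List[List[str]]:
    """Split `command` into pipeline segments, raising ValueError if any part is not allowed."""
    segments: List[List[str]] = [[]]
    for token in shlex.split(command):  # raises ValueError on unbalanced quotes
        if token == "|":
            segments.append([])  # start the next command in the pipe
        elif SHELL_METACHARS & set(token):
            raise ValueError(f"shell metacharacter in argument: {token!r}")
        else:
            segments[-1].append(token)
    for segment in segments:
        if not segment:
            raise ValueError("empty pipeline segment")
        name, args = segment[0], segment[1:]
        if name not in ALLOWED:
            raise ValueError(f"command not allowed: {name}")
        subcommands = ALLOWED[name]
        if subcommands is not None and (not args or args[0] not in subcommands):
            raise ValueError(f"{name} subcommand not allowed: {args[:1]}")
    return segments


def is_allowed(command: str) -> bool:
    """True if `command` passes every check in parse_pipeline."""
    try:
        parse_pipeline(command)
    except ValueError:
        return False
    return True
```

The sketch is deliberately conservative. It rejects some legitimate commands, such as a `jq` filter that contains `|`, rather than risk passing shell syntax like `pods;rm` through as an ordinary argument.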

docs/data-sources/builtin-toolsets/coralogix-logs.md

Lines changed: 78 additions & 0 deletions

````diff
@@ -29,6 +29,84 @@ toolsets:
     enabled: false # Disable default Kubernetes logging
 ```
 
+## Custom Labels Configuration (Optional)
+
+By default, the Coralogix toolset expects logs to use standard Kubernetes field names. If your Coralogix deployment uses different field names for Kubernetes metadata, you can customize the label mappings.
+
+This is useful when:
+
+- Your log ingestion pipeline uses custom field names
+- You have a non-standard Coralogix setup with different metadata fields
+- Your Kubernetes logs are structured differently in Coralogix
+
+To find the correct field names, examine your logs in the Coralogix UI and identify how pod names, namespaces, log messages, and timestamps are labeled.
+
+### Example with Custom Labels
+
+```yaml-toolset-config
+toolsets:
+  coralogix/logs:
+    enabled: true
+    config:
+      api_key: "<your Coralogix API key>"
+      domain: "eu2.coralogix.com"
+      team_hostname: "your-company-name"
+      labels:
+        namespace: "resource.attributes.k8s.namespace.name" # Default
+        pod: "resource.attributes.k8s.pod.name" # Default
+        log_message: "logRecord.body" # Default
+        timestamp: "logRecord.attributes.time" # Default
+
+  kubernetes/logs:
+    enabled: false # Disable default Kubernetes logging
+```
+
+**Label Configuration Fields:**
+
+- `namespace`: Field path for Kubernetes namespace name
+- `pod`: Field path for Kubernetes pod name
+- `log_message`: Field path for the actual log message content
+- `timestamp`: Field path for log timestamp
+
+All label fields are optional and will use the defaults shown above if not specified.
+
+## Logs Retrieval Strategy (Optional)
+
+Coralogix stores logs in two tiers with different performance characteristics:
+
+- **Frequent Search**: Fast queries with limited retention
+- **Archive**: Slower queries but longer retention period
+
+You can configure how HolmesGPT retrieves logs using the `logs_retrieval_methodology` setting:
+
+### Available Strategies
+
+- `ARCHIVE_FALLBACK` (default): Try Frequent Search first, fall back to Archive if there are no results
+- `FREQUENT_SEARCH_ONLY`: Only search Frequent Search tier
+- `ARCHIVE_ONLY`: Only search Archive tier
+- `BOTH_FREQUENT_SEARCH_AND_ARCHIVE`: Search both tiers and merge results
+- `FREQUENT_SEARCH_FALLBACK`: Try Archive first, fall back to Frequent Search if there are no results
+
+### Example Configuration
+
+```yaml-toolset-config
+toolsets:
+  coralogix/logs:
+    enabled: true
+    config:
+      api_key: "<your Coralogix API key>"
+      domain: "eu2.coralogix.com"
+      team_hostname: "your-company-name"
+      logs_retrieval_methodology: "ARCHIVE_FALLBACK" # Default
+```
+
+**Recommendations:**
+
+- Use `ARCHIVE_FALLBACK` for most cases (balances speed and coverage)
+- Use `FREQUENT_SEARCH_ONLY` when you know Holmes does not need to access the log archive
+- Use `ARCHIVE_ONLY` if Frequent Search never returns logs for your setup
+- Use `BOTH_FREQUENT_SEARCH_AND_ARCHIVE` for comprehensive log coverage (slower)
+
 ## Capabilities
 
 | Tool Name | Description |
````
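The retrieval strategies above can be summarized as a small dispatch function. This sketch makes three assumptions:

- Each tier is exposed as a zero-argument callable that returns a list of log lines.
- `Methodology`, `fetch_logs`, and the tier callables are hypothetical, not the toolset's real internals.
- "Merge" is simplified to concatenation; the real toolset may order or de-duplicate results differently.

```python
from enum import Enum
from typing import Callable, List

# Hypothetical: a zero-argument callable that queries one storage tier.
TierQuery = Callable[[], List[str]]


class Methodology(str, Enum):
    ARCHIVE_FALLBACK = "ARCHIVE_FALLBACK"
    FREQUENT_SEARCH_ONLY = "FREQUENT_SEARCH_ONLY"
    ARCHIVE_ONLY = "ARCHIVE_ONLY"
    BOTH_FREQUENT_SEARCH_AND_ARCHIVE = "BOTH_FREQUENT_SEARCH_AND_ARCHIVE"
    FREQUENT_SEARCH_FALLBACK = "FREQUENT_SEARCH_FALLBACK"


def fetch_logs(methodology: Methodology, frequent: TierQuery, archive: TierQuery) -> List[str]:
    """Query the Frequent Search and Archive tiers according to `methodology`."""
    if methodology is Methodology.FREQUENT_SEARCH_ONLY:
        return frequent()
    if methodology is Methodology.ARCHIVE_ONLY:
        return archive()
    if methodology is Methodology.ARCHIVE_FALLBACK:
        # Fast tier first; the slower archive is only queried when nothing was found.
        return frequent() or archive()
    if methodology is Methodology.FREQUENT_SEARCH_FALLBACK:
        # Archive first; fall back to Frequent Search when the archive is empty.
        return archive() or frequent()
    if methodology is Methodology.BOTH_FREQUENT_SEARCH_AND_ARCHIVE:
        # Always query both tiers; "merge" is simplified to concatenation here.
        return frequent() + archive()
    raise ValueError(f"unknown methodology: {methodology}")
```

Because `Methodology` subclasses `str`, a configured value such as `"ARCHIVE_FALLBACK"` converts directly with `Methodology("ARCHIVE_FALLBACK")`. The fallback strategies use `or` so the second tier is only queried when the first returns nothing.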

docs/installation/python-installation.md

Lines changed: 0 additions & 3 deletions

```diff
@@ -48,7 +48,6 @@ messages = build_initial_ask_messages(
     initial_user_prompt=question,
     file_paths=None,
     tool_executor=ai.tool_executor,
-    investigation_id=ai.investigation_id,
     runbooks=config.get_runbook_catalog(),
     system_prompt_additions=None
 )
@@ -130,7 +129,6 @@ def main():
         initial_user_prompt=question,
         file_paths=None,
         tool_executor=ai.tool_executor,
-        investigation_id=ai.investigation_id,
         runbooks=config.get_runbook_catalog(),
         system_prompt_additions=None
     )
@@ -224,7 +222,6 @@ def main():
         initial_user_prompt=first_question,
         file_paths=None,
         tool_executor=ai.tool_executor,
-        investigation_id=ai.investigation_id,
         runbooks=config.get_runbook_catalog(),
         system_prompt_additions=None
     )
```

docs/overrides/main.html

Lines changed: 2 additions & 2 deletions

```diff
@@ -3,8 +3,8 @@
 {% block announce %}
 <div class="md-banner">
   <div class="md-banner__inner">
-    🎉 Join us on our first HolmesGPT community meeting - August 21, 8AM PT
-    <a href="/community/">Learn more</a>
+    📹 Watch the recording of our first HolmesGPT community meetup
+    <a href="https://youtu.be/slQRc6nlFQU" target="_blank">Watch on YouTube</a>
   </div>
 </div>
 {% endblock %}
```

Lines changed: 71 additions & 0 deletions

```diff
@@ -0,0 +1,71 @@
+# Investigating using AKS MCP Server
+
+You can investigate Azure Kubernetes Service issues using HolmesGPT with the AKS MCP (Model Context Protocol) server.
+
+![AKS MCP Integration](../assets/Holmes-azure-mcp.gif)
+
+## Prerequisites
+
+- HolmesGPT CLI installed ([installation guide](../installation/cli-installation.md))
+- An AI provider API key configured ([setup guide](../ai-providers/index.md))
+- Azure CLI installed and authenticated
+- Access to Azure Kubernetes Service clusters
+- [Azure Kubernetes Service](https://marketplace.visualstudio.com/items?itemName=ms-kubernetes-tools.vscode-aks-tools) VS Code extension installed
+
+## Setting Up AKS MCP Server
+
+### Step 1: Set Up the MCP Server
+
+- Open VS Code Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`)
+- Run: **"AKS: Setup AKS MCP Server"**
+- Follow the setup wizard to configure your Azure credentials and cluster access
+
+### Step 2: Update Configuration for SSE
+After installation, update your VS Code MCP configuration (`.vscode/mcp.json`) to use the SSE transport, then start the server:
+```json
+{
+  "servers": {
+    "AKS MCP": {
+      "command": "/Users/yourname/.vs-kubernetes/tools/aks-mcp/v0.0.3/aks-mcp",
+      "args": [
+        "--transport",
+        "sse"
+      ]
+    }
+  }
+}
+```
+**Note:** Change `"stdio"` to `"sse"` in the `--transport` argument.
+
+### Step 3: Configure HolmesGPT
+
+Add this configuration to your HolmesGPT config file (`~/.holmes/config.yaml`):
+
+```yaml
+mcp_servers:
+  aks-mcp:
+    description: "Azure Kubernetes Service (AKS) Model Context Protocol (MCP) server"
+    url: "http://localhost:8000/sse"
+    llm_instructions: "MCP server to get AKS cluster information, retrieve cluster resources and workloads, analyze network policies and VNet configurations, query control plane logs, and fetch cluster metrics and health status. Use it to investigate networking issues with NSGs and load balancers, perform kubectl operations, and monitor DNS and services in real time across Azure Kubernetes environments."
+
+```
+
+## Investigation Examples
+
+Once configured, you can investigate AKS issues using natural language queries:
+
+### Cluster Health Issues
+```bash
+holmes ask "What issues do I have in my AKS cluster?"
+```
+
+### Network Connectivity Problems
+```bash
+holmes ask "My payment deployment can't reach external services. Investigate why."
+```
+
+## What's Next?
+
+- **[Add more data sources](../data-sources/index.md)** - Combine AKS MCP with other observability tools
+- **[Set up additional MCP servers](../data-sources/remote-mcp-servers.md)** - Integrate multiple specialized MCP servers
+- **[Configure custom toolsets](../data-sources/custom-toolsets.md)** - Create specialized investigation workflows
```
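HolmesGPT connects to the server's SSE endpoint (`http://localhost:8000/sse` in the configuration above). It can therefore help to confirm that `.vscode/mcp.json` actually passes `sse` to `--transport` before starting the server.

A minimal check, assuming the file layout shown in Step 2: a `servers` map keyed by server name, each entry with an `args` list. It also assumes the file is plain JSON without comments. `transport_of` is a hypothetical helper.

```python
import json
from typing import Optional


def transport_of(mcp_json: str, server: str) -> Optional[str]:
    """Return the value passed to --transport for `server` in a VS Code mcp.json."""
    config = json.loads(mcp_json)  # assumes plain JSON (no comments)
    args = config.get("servers", {}).get(server, {}).get("args", [])
    # Walk adjacent pairs so "--transport" is matched with the value after it.
    for flag, value in zip(args, args[1:]):
        if flag == "--transport":
            return value
    return None
```

After Step 2, `transport_of(Path(".vscode/mcp.json").read_text(), "AKS MCP")` should return `"sse"`. A result of `"stdio"` means the server will not expose the SSE endpoint HolmesGPT expects.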

helm/holmes/templates/holmes.yaml

Lines changed: 3 additions & 1 deletion

```diff
@@ -6,7 +6,9 @@ metadata:
   labels:
     app: holmes
 spec:
-  replicas: 1
+  {{- if (not .Values.autoscaling.enabled) }}
+  replicas: {{ .Values.replicas }}
+  {{- end }}
   revisionHistoryLimit: {{ .Values.revisionHistoryLimit }}
   selector:
     matchLabels:
```
