Commit c0af32c

Add chDB Support to MCP ClickHouse Server (#51)
1 parent edc9fe5 commit c0af32c

File tree

11 files changed (+1050, −315 lines)

.github/workflows/ci.yaml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ jobs:
           CLICKHOUSE_PASSWORD: ""
           CLICKHOUSE_SECURE: "false"
           CLICKHOUSE_VERIFY: "false"
+          CHDB_ENABLED: "true"
         run: |
           uv run pytest tests
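The new `CHDB_ENABLED` flag can be mirrored locally before invoking the test suite (a minimal sketch; only `CHDB_ENABLED` is new here, the `uv run pytest tests` invocation is the workflow's existing one and requires a repo checkout):

```python
import os
import subprocess

# Mirror the CI step above: run the tests with chDB tools enabled
env = {**os.environ, "CHDB_ENABLED": "true"}

# Same command as the workflow step; commented out since it needs
# uv and a checkout of this repository:
# subprocess.run(["uv", "run", "pytest", "tests"], env=env, check=True)
```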

Dockerfile

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
 # Build stage - Use a Python image with uv pre-installed
-FROM ghcr.io/astral-sh/uv:python3.13-alpine AS builder
+FROM ghcr.io/astral-sh/uv:python3.13-bookworm AS builder
 
 # Install the project into `/app`
 WORKDIR /app
@@ -11,8 +11,8 @@ ENV UV_COMPILE_BYTECODE=1
 ENV UV_LINK_MODE=copy
 
 # Install git and build dependencies for ClickHouse client
-RUN --mount=type=cache,target=/var/cache/apk \
-    apk add git build-base
+RUN --mount=type=cache,target=/var/cache/apt \
+    apt-get update && apt-get install -y git build-essential
 
 # Install the project's dependencies using the lockfile and settings
 RUN --mount=type=cache,target=/root/.cache/uv \
@@ -28,7 +28,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
     uv sync --locked --no-dev --no-editable
 
 # Production stage - Use minimal Python image
-FROM python:3.13-alpine
+FROM python:3.13-bookworm
 
 # Set the working directory
 WORKDIR /app

README.md

Lines changed: 105 additions & 4 deletions
@@ -8,7 +8,7 @@ An MCP server for ClickHouse.
 
 ## Features
 
-### Tools
+### ClickHouse Tools
 
 * `run_select_query`
   * Execute SQL queries on your ClickHouse cluster.
@@ -22,8 +22,17 @@ An MCP server for ClickHouse.
   * List all tables in a database.
   * Input: `database` (string): The name of the database.
 
+### chDB Tools
+
+* `run_chdb_select_query`
+  * Execute SQL queries using chDB's embedded OLAP engine.
+  * Input: `sql` (string): The SQL query to execute.
+  * Query data directly from various sources (files, URLs, databases) without ETL processes.
+
 ## Configuration
 
+This MCP server supports both ClickHouse and chDB. You can enable either or both depending on your needs.
+
 1. Open the Claude Desktop configuration file located at:
    * On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
    * On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
@@ -90,6 +99,63 @@ Or, if you'd like to try it out with the [ClickHouse SQL Playground](https://sql
   }
 }
 ```
 
+For chDB (embedded OLAP engine), add the following configuration:
+
+```json
+{
+  "mcpServers": {
+    "mcp-clickhouse": {
+      "command": "uv",
+      "args": [
+        "run",
+        "--with",
+        "mcp-clickhouse",
+        "--python",
+        "3.13",
+        "mcp-clickhouse"
+      ],
+      "env": {
+        "CHDB_ENABLED": "true",
+        "CLICKHOUSE_ENABLED": "false",
+        "CHDB_DATA_PATH": "/path/to/chdb/data"
+      }
+    }
+  }
+}
+```
+
+You can also enable both ClickHouse and chDB simultaneously:
+
+```json
+{
+  "mcpServers": {
+    "mcp-clickhouse": {
+      "command": "uv",
+      "args": [
+        "run",
+        "--with",
+        "mcp-clickhouse",
+        "--python",
+        "3.13",
+        "mcp-clickhouse"
+      ],
+      "env": {
+        "CLICKHOUSE_HOST": "<clickhouse-host>",
+        "CLICKHOUSE_PORT": "<clickhouse-port>",
+        "CLICKHOUSE_USER": "<clickhouse-user>",
+        "CLICKHOUSE_PASSWORD": "<clickhouse-password>",
+        "CLICKHOUSE_SECURE": "true",
+        "CLICKHOUSE_VERIFY": "true",
+        "CLICKHOUSE_CONNECT_TIMEOUT": "30",
+        "CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30",
+        "CHDB_ENABLED": "true",
+        "CHDB_DATA_PATH": "/path/to/chdb/data"
+      }
+    }
+  }
+}
+```
+
 3. Locate the command entry for `uv` and replace it with the absolute path to the `uv` executable. This ensures that the correct version of `uv` is used when starting the server. On a mac, you can find this path using `which uv`.
 
 4. Restart Claude Desktop to apply the changes.
@@ -115,9 +181,11 @@ CLICKHOUSE_PASSWORD=clickhouse
 
 ### Environment Variables
 
-The following environment variables are used to configure the ClickHouse connection:
+The following environment variables are used to configure the ClickHouse and chDB connections:
 
-#### Required Variables
+#### ClickHouse Variables
+
+##### Required Variables
 
 * `CLICKHOUSE_HOST`: The hostname of your ClickHouse server
 * `CLICKHOUSE_USER`: The username for authentication
@@ -126,7 +194,7 @@ The following environment variables are used to configure the ClickHouse connection:
 > [!CAUTION]
 > It is important to treat your MCP database user as you would any external client connecting to your database, granting only the minimum necessary privileges required for its operation. The use of default or administrative users should be strictly avoided at all times.
 
-#### Optional Variables
+##### Optional Variables
 
 * `CLICKHOUSE_PORT`: The port number of your ClickHouse server
   * Default: `8443` if HTTPS is enabled, `8123` if disabled
@@ -149,6 +217,19 @@ The following environment variables are used to configure the ClickHouse connection:
 * `CLICKHOUSE_MCP_SERVER_TRANSPORT`: Sets the transport method for the MCP server.
   * Default: `"stdio"`
   * Valid options: `"stdio"`, `"http"`, `"streamable-http"`, `"sse"`. This is useful for local development with tools like MCP Inspector.
+* `CLICKHOUSE_ENABLED`: Enable/disable ClickHouse functionality
+  * Default: `"true"`
+  * Set to `"false"` to disable ClickHouse tools when using chDB only
+
+#### chDB Variables
+
+* `CHDB_ENABLED`: Enable/disable chDB functionality
+  * Default: `"false"`
+  * Set to `"true"` to enable chDB tools
+* `CHDB_DATA_PATH`: The path to the chDB data directory
+  * Default: `":memory:"` (in-memory database)
+  * Use `:memory:` for in-memory database
+  * Use a file path for persistent storage (e.g., `/path/to/chdb/data`)
 
 #### Example Configurations
 
@@ -187,6 +268,24 @@ CLICKHOUSE_PASSWORD=
 # Uses secure defaults (HTTPS on port 8443)
 ```
 
+For chDB only (in-memory):
+
+```env
+# chDB configuration
+CHDB_ENABLED=true
+CLICKHOUSE_ENABLED=false
+# CHDB_DATA_PATH defaults to :memory:
+```
+
+For chDB with persistent storage:
+
+```env
+# chDB configuration
+CHDB_ENABLED=true
+CLICKHOUSE_ENABLED=false
+CHDB_DATA_PATH=/path/to/chdb/data
+```
+
 You can set these variables in your environment, in a `.env` file, or in the Claude Desktop configuration:
 
 ```json
@@ -221,6 +320,8 @@ uv run ruff check . # run linting
 
 docker compose up -d test_services # start ClickHouse
 uv run pytest -v tests
+uv run pytest -v tests/test_tool.py # ClickHouse only
+uv run pytest -v tests/test_chdb_tool.py # chDB only
 ```
 
 ## YouTube Overview
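The `CHDB_ENABLED` / `CHDB_DATA_PATH` semantics documented in the README changes above can be resolved with a small helper (an illustrative sketch, not the server's actual code; the defaults follow the README: chDB disabled, `:memory:` data path):

```python
import os

def chdb_config() -> tuple[bool, str]:
    """Resolve the chDB settings from the environment variables described above.

    Defaults mirror the README: chDB disabled, in-memory data path.
    """
    enabled = os.environ.get("CHDB_ENABLED", "false").lower() == "true"
    data_path = os.environ.get("CHDB_DATA_PATH", ":memory:")
    return enabled, data_path
```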

mcp_clickhouse/__init__.py

Lines changed: 6 additions & 0 deletions
@@ -3,11 +3,17 @@
     list_databases,
     list_tables,
     run_select_query,
+    create_chdb_client,
+    run_chdb_select_query,
+    chdb_initial_prompt,
 )
 
 __all__ = [
     "list_databases",
     "list_tables",
     "run_select_query",
     "create_clickhouse_client",
+    "create_chdb_client",
+    "run_chdb_select_query",
+    "chdb_initial_prompt",
 ]

mcp_clickhouse/chdb_prompt.py

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+"""chDB prompts for MCP server."""
+
+CHDB_PROMPT = """
+# chDB Assistant Guide
+
+You are an expert chDB assistant designed to help users leverage chDB for querying diverse data sources. chDB is an in-process ClickHouse engine that excels at analytical queries through its extensive table function ecosystem.
+
+## Available Tools
+- **run_chdb_select_query**: Execute SELECT queries using chDB's table functions
+
+## Table Functions: The Core of chDB
+
+chDB's strength lies in its **table functions** - special functions that act as virtual tables, allowing you to query data from various sources without traditional ETL processes. Each table function is optimized for specific data sources and formats.
+
+### File-Based Table Functions
+
+#### **file() Function**
+Query local files directly with automatic format detection:
+```sql
+-- Auto-detect format
+SELECT * FROM file('/path/to/data.parquet');
+SELECT * FROM file('sales.csv');
+
+-- Explicit format specification
+SELECT * FROM file('data.csv', 'CSV');
+SELECT * FROM file('logs.json', 'JSONEachRow');
+SELECT * FROM file('export.tsv', 'TSV');
+```
+
+### Remote Data Table Functions
+
+#### **url() Function**
+Access remote data over HTTP/HTTPS:
+```sql
+-- Query CSV from URL
+SELECT * FROM url('https://example.com/data.csv', 'CSV');
+
+-- Query parquet from URL
+SELECT * FROM url('https://data.example.com/logs/data.parquet');
+```
+
+#### **s3() Function**
+Direct S3 data access:
+```sql
+-- Single S3 file
+SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', 'CSVWithNames');
+
+-- S3 with credentials and wildcard patterns
+SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv', '<KEY>', '<SECRET>', 'TSVWithNames');
+```
+
+#### **hdfs() Function**
+Hadoop Distributed File System access:
+```sql
+-- HDFS file access
+SELECT * FROM hdfs('hdfs://namenode:9000/data/events.parquet');
+
+-- HDFS directory scan
+SELECT * FROM hdfs('hdfs://cluster/warehouse/table/*', 'TSV');
+```
+
+### Database Table Functions
+
+#### **sqlite() Function**
+Query SQLite databases:
+```sql
+-- Access SQLite table
+SELECT * FROM sqlite('/path/to/database.db', 'users');
+
+-- Join with other data
+SELECT u.name, s.amount
+FROM sqlite('app.db', 'users') u
+JOIN file('sales.csv') s ON u.id = s.user_id;
+```
+
+#### **postgresql() Function**
+Connect to PostgreSQL:
+```sql
+-- PostgreSQL table access
+SELECT * FROM postgresql('localhost:5432', 'mydb', 'orders', 'user', 'password');
+```
+
+#### **mysql() Function**
+MySQL database integration:
+```sql
+-- MySQL table query
+SELECT * FROM mysql('localhost:3306', 'shop', 'products', 'user', 'password');
+```
+
+## Table Function Best Practices
+
+### **Performance Optimization**
+- **Predicate Pushdown**: Apply filters early to reduce data transfer
+- **Column Pruning**: Select only needed columns
+
+### **Error Handling**
+- Test table function connectivity with `LIMIT 1`
+- Verify data formats match function expectations
+- Use `DESCRIBE` to understand schema before complex queries
+
+## Workflow with Table Functions
+
+1. **Identify Data Source**: Choose appropriate table function
+2. **Test Connection**: Use simple `SELECT * LIMIT 1` queries
+3. **Explore Schema**: Use `DESCRIBE table_function(...)`
+4. **Build Query**: Combine table functions as needed
+5. **Optimize**: Apply filters and column selection
+
+## Getting Started
+
+When helping users:
+1. **Identify their data source type** and recommend the appropriate table function
+2. **Show table function syntax** with their specific parameters
+3. **Demonstrate data exploration** using the table function
+4. **Build analytical queries** combining multiple table functions if needed
+5. **Optimize performance** through proper filtering and column selection
+
+Remember: chDB's table functions eliminate the need for data loading - you can query data directly from its source, making analytics faster and more flexible.
+"""
