
Commit 26fe64e (parent 19b4e7d)

feat: Add comprehensive chDB support to MCP ClickHouse server

- Add chDB embedded OLAP engine integration
- Implement run_chdb_select_query tool
- Add new configurations (CHDB_ENABLED, CHDB_DATA_PATH)
- Create comprehensive chDB prompt
- Enable hybrid deployments with independent ClickHouse/chDB operation

File tree

9 files changed: +770 −178 lines changed

.github/workflows/ci.yaml

Lines changed: 10 additions & 2 deletions

```diff
@@ -30,7 +30,15 @@ jobs:
       - name: Install Project
         run: uv sync --all-extras --dev
 
-      - name: Run tests
+      - name: Run chDB tests
+        env:
+          CHDB_ENABLED: "true"
+          CLICKHOUSE_ENABLED: "false"
+          CHDB_DATA_PATH: ":memory:"
+        run: |
+          uv run pytest tests/test_chdb_tool.py
+
+      - name: Run ClickHouse tests
         env:
           CLICKHOUSE_HOST: "localhost"
           CLICKHOUSE_PORT: "8123"
@@ -39,7 +47,7 @@ jobs:
           CLICKHOUSE_SECURE: "false"
           CLICKHOUSE_VERIFY: "false"
         run: |
-          uv run pytest tests
+          uv run pytest tests/test_tool.py
 
       - name: Lint with Ruff
         run: uv run ruff check .
```

README.md

Lines changed: 101 additions & 2 deletions

````diff
@@ -8,7 +8,7 @@ An MCP server for ClickHouse.
 
 ## Features
 
-### Tools
+### ClickHouse Tools
 
 * `run_select_query`
   * Execute SQL queries on your ClickHouse cluster.
@@ -22,8 +22,17 @@ An MCP server for ClickHouse.
   * List all tables in a database.
   * Input: `database` (string): The name of the database.
 
+### chDB Tools
+
+* `run_chdb_select_query`
+  * Execute SQL queries using chDB's embedded OLAP engine.
+  * Input: `sql` (string): The SQL query to execute.
+  * Query data directly from various sources (files, URLs, databases) without ETL processes.
+
 ## Configuration
 
+This MCP server supports both ClickHouse and chDB. You can enable either or both depending on your needs.
+
 1. Open the Claude Desktop configuration file located at:
    * On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
    * On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
@@ -90,6 +99,63 @@ Or, if you'd like to try it out with the [ClickHouse SQL Playground](https://sql
 }
 ```
 
+For chDB (embedded OLAP engine), add the following configuration:
+
+```json
+{
+  "mcpServers": {
+    "mcp-clickhouse": {
+      "command": "uv",
+      "args": [
+        "run",
+        "--with",
+        "mcp-clickhouse",
+        "--python",
+        "3.13",
+        "mcp-clickhouse"
+      ],
+      "env": {
+        "CHDB_ENABLED": "true",
+        "CLICKHOUSE_ENABLED": "false",
+        "CHDB_DATA_PATH": "/path/to/chdb/data"
+      }
+    }
+  }
+}
+```
+
+You can also enable both ClickHouse and chDB simultaneously:
+
+```json
+{
+  "mcpServers": {
+    "mcp-clickhouse": {
+      "command": "uv",
+      "args": [
+        "run",
+        "--with",
+        "mcp-clickhouse",
+        "--python",
+        "3.13",
+        "mcp-clickhouse"
+      ],
+      "env": {
+        "CLICKHOUSE_HOST": "<clickhouse-host>",
+        "CLICKHOUSE_PORT": "<clickhouse-port>",
+        "CLICKHOUSE_USER": "<clickhouse-user>",
+        "CLICKHOUSE_PASSWORD": "<clickhouse-password>",
+        "CLICKHOUSE_SECURE": "true",
+        "CLICKHOUSE_VERIFY": "true",
+        "CLICKHOUSE_CONNECT_TIMEOUT": "30",
+        "CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30",
+        "CHDB_ENABLED": "true",
+        "CHDB_DATA_PATH": "/path/to/chdb/data"
+      }
+    }
+  }
+}
+```
+
 3. Locate the command entry for `uv` and replace it with the absolute path to the `uv` executable. This ensures that the correct version of `uv` is used when starting the server. On a mac, you can find this path using `which uv`.
 
 4. Restart Claude Desktop to apply the changes.
@@ -115,7 +181,22 @@ CLICKHOUSE_PASSWORD=clickhouse
 
 ### Environment Variables
 
-The following environment variables are used to configure the ClickHouse connection:
+The following environment variables are used to configure the ClickHouse and chDB connections:
+
+#### chDB Variables
+
+* `CHDB_ENABLED`: Enable/disable chDB functionality
+  * Default: `"false"`
+  * Set to `"true"` to enable chDB tools
+* `CHDB_DATA_PATH`: The path to the chDB data directory
+  * Required when `CHDB_ENABLED=true`
+  * Use `:memory:` for an in-memory database (recommended for testing)
+  * Use a file path for persistent storage (e.g., `/path/to/chdb/data`)
+* `CLICKHOUSE_ENABLED`: Enable/disable ClickHouse functionality
+  * Default: `"true"`
+  * Set to `"false"` to disable ClickHouse tools when using chDB only
+
+#### ClickHouse Variables
 
 #### Required Variables
 
@@ -184,6 +265,24 @@ CLICKHOUSE_PASSWORD=
 # Uses secure defaults (HTTPS on port 8443)
 ```
 
+For chDB only (in-memory):
+
+```env
+# chDB configuration
+CHDB_ENABLED=true
+CLICKHOUSE_ENABLED=false
+CHDB_DATA_PATH=:memory:
+```
+
+For chDB with persistent storage:
+
+```env
+# chDB configuration
+CHDB_ENABLED=true
+CLICKHOUSE_ENABLED=false
+CHDB_DATA_PATH=/path/to/chdb/data
+```
+
 You can set these variables in your environment, in a `.env` file, or in the Claude Desktop configuration:
 
 ```json
````
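The enable/disable switches above reduce to string comparisons on environment variables. A minimal sketch of how a server might gate its tool sets on them (illustrative only; `chdb_enabled` and `clickhouse_enabled` are hypothetical helper names, not this project's actual code):

```python
import os

def chdb_enabled() -> bool:
    # CHDB_ENABLED defaults to "false"; only the string "true" enables chDB.
    return os.environ.get("CHDB_ENABLED", "false").lower() == "true"

def clickhouse_enabled() -> bool:
    # CLICKHOUSE_ENABLED defaults to "true", so ClickHouse stays on
    # unless explicitly switched off for chDB-only deployments.
    return os.environ.get("CLICKHOUSE_ENABLED", "true").lower() == "true"

# chDB-only configuration, matching the in-memory example above
os.environ["CHDB_ENABLED"] = "true"
os.environ["CLICKHOUSE_ENABLED"] = "false"
os.environ["CHDB_DATA_PATH"] = ":memory:"

print(chdb_enabled())        # True
print(clickhouse_enabled())  # False
```

Note the asymmetric defaults: leaving both variables unset yields a ClickHouse-only server, preserving pre-chDB behavior.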

mcp_clickhouse/__init__.py

Lines changed: 6 additions & 0 deletions

```diff
@@ -3,11 +3,17 @@
     list_databases,
     list_tables,
     run_select_query,
+    create_chdb_client,
+    run_chdb_select_query,
+    chdb_initial_prompt,
 )
 
 __all__ = [
     "list_databases",
     "list_tables",
     "run_select_query",
     "create_clickhouse_client",
+    "create_chdb_client",
+    "run_chdb_select_query",
+    "chdb_initial_prompt",
 ]
```
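A pattern worth keeping as the export list grows: check that every imported name is re-exported through `__all__`. Sketched here against the literal names visible in this diff rather than the installed package (the original import list may contain further names outside the hunk's context):

```python
# Names newly imported in mcp_clickhouse/__init__.py by this commit
new_imports = [
    "create_chdb_client",
    "run_chdb_select_query",
    "chdb_initial_prompt",
]
# Contents of __all__ after this commit, per the diff
dunder_all = [
    "list_databases",
    "list_tables",
    "run_select_query",
    "create_clickhouse_client",
    "create_chdb_client",
    "run_chdb_select_query",
    "chdb_initial_prompt",
]
# Every new chDB name is re-exported alongside the existing API.
missing = [n for n in new_imports if n not in dunder_all]
print(missing)  # []
```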

mcp_clickhouse/chdb_prompt.py

Lines changed: 119 additions & 0 deletions (new file)

````python
"""chDB prompts for MCP server."""

CHDB_PROMPT = """
# chDB Assistant Guide

You are an expert chDB assistant designed to help users leverage chDB for querying diverse data sources. chDB is an embedded SQL OLAP engine that excels at analytical queries through its extensive table function ecosystem.

## Available Tools

- **run_chdb_select_query**: Execute SELECT queries using chDB's table functions

## Table Functions: The Core of chDB

chDB's strength lies in its **table functions** - special functions that act as virtual tables, allowing you to query data from various sources without traditional ETL processes. Each table function is optimized for specific data sources and formats.

### File-Based Table Functions

#### **file() Function**
Query local files directly with automatic format detection:
```sql
-- Auto-detect format
SELECT * FROM file('/path/to/data.parquet');
SELECT * FROM file('sales.csv');

-- Explicit format specification
SELECT * FROM file('data.csv', 'CSV');
SELECT * FROM file('logs.json', 'JSONEachRow');
SELECT * FROM file('export.tsv', 'TSV');
```

### Remote Data Table Functions

#### **url() Function**
Access remote data over HTTP/HTTPS:
```sql
-- Query CSV from URL
SELECT * FROM url('https://example.com/data.csv', 'CSV');

-- Query Parquet from URL
SELECT * FROM url('https://data.example.com/logs/data.parquet');
```

#### **s3() Function**
Direct S3 data access:
```sql
-- Single S3 file
SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', 'CSVWithNames');

-- S3 with credentials and wildcard patterns
SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv', '<KEY>', '<SECRET>', 'TSVWithNames');
```

#### **hdfs() Function**
Hadoop Distributed File System access:
```sql
-- HDFS file access
SELECT * FROM hdfs('hdfs://namenode:9000/data/events.parquet');

-- HDFS directory scan
SELECT * FROM hdfs('hdfs://cluster/warehouse/table/*', 'TSV');
```

### Database Table Functions

#### **sqlite() Function**
Query SQLite databases:
```sql
-- Access SQLite table
SELECT * FROM sqlite('/path/to/database.db', 'users');

-- Join with other data
SELECT u.name, s.amount
FROM sqlite('app.db', 'users') u
JOIN file('sales.csv') s ON u.id = s.user_id;
```

#### **postgresql() Function**
Connect to PostgreSQL:
```sql
-- PostgreSQL table access
SELECT * FROM postgresql('localhost:5432', 'mydb', 'orders', 'user', 'password');
```

#### **mysql() Function**
MySQL database integration:
```sql
-- MySQL table query
SELECT * FROM mysql('localhost:3306', 'shop', 'products', 'user', 'password');
```

## Table Function Best Practices

### **Performance Optimization**
- **Predicate Pushdown**: Apply filters early to reduce data transfer
- **Column Pruning**: Select only needed columns

### **Error Handling**
- Test table function connectivity with `LIMIT 1`
- Verify data formats match function expectations
- Use `DESCRIBE` to understand the schema before writing complex queries

## Workflow with Table Functions

1. **Identify Data Source**: Choose the appropriate table function
2. **Test Connection**: Use simple `SELECT * LIMIT 1` queries
3. **Explore Schema**: Use `DESCRIBE table_function(...)`
4. **Build Query**: Combine table functions as needed
5. **Optimize**: Apply filters and column selection

## Getting Started

When helping users:
1. **Identify their data source type** and recommend the appropriate table function
2. **Show table function syntax** with their specific parameters
3. **Demonstrate data exploration** using the table function
4. **Build analytical queries** combining multiple table functions if needed
5. **Optimize performance** through proper filtering and column selection

Remember: chDB's table functions eliminate the need for data loading - you can query data directly from its source, making analytics faster and more flexible.
"""
````
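The exploration workflow the prompt recommends (probe with `LIMIT 1`, then `DESCRIBE`, then query) can be mechanized. A small sketch, assuming nothing beyond the prompt text; `probe_queries` is a hypothetical helper, not part of this commit:

```python
def probe_queries(table_fn: str) -> list[str]:
    """Return the exploration queries the workflow recommends for a
    chDB table-function expression: a connectivity check, a schema
    check, then a cheap aggregate."""
    return [
        f"SELECT * FROM {table_fn} LIMIT 1",
        f"DESCRIBE {table_fn}",
        f"SELECT count() FROM {table_fn}",
    ]

# Each string could be passed to run_chdb_select_query in turn.
for q in probe_queries("file('sales.csv', 'CSV')"):
    print(q)
```

Running the probes in this order surfaces connectivity and format errors on a one-row query before any expensive full scan.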
