1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -38,6 +38,7 @@ jobs:
CLICKHOUSE_PASSWORD: ""
CLICKHOUSE_SECURE: "false"
CLICKHOUSE_VERIFY: "false"
CHDB_ENABLED: "true"
run: |
uv run pytest tests

8 changes: 4 additions & 4 deletions Dockerfile
@@ -1,5 +1,5 @@
# Build stage - Use a Python image with uv pre-installed
-FROM ghcr.io/astral-sh/uv:python3.13-alpine AS builder
+FROM ghcr.io/astral-sh/uv:python3.13-bookworm AS builder
> **Member:** why is this necessary?

> **Contributor Author:** @serprex This is because chDB does not currently support Alpine Linux as a runtime environment.

# Install the project into `/app`
WORKDIR /app
@@ -11,8 +11,8 @@ ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

# Install git and build dependencies for ClickHouse client
-RUN --mount=type=cache,target=/var/cache/apk \
-    apk add git build-base
+RUN --mount=type=cache,target=/var/cache/apt \
+    apt-get update && apt-get install -y git build-essential

# Install the project's dependencies using the lockfile and settings
RUN --mount=type=cache,target=/root/.cache/uv \
@@ -28,7 +28,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --locked --no-dev --no-editable

# Production stage - Use minimal Python image
-FROM python:3.13-alpine
+FROM python:3.13-bookworm

# Set the working directory
WORKDIR /app
109 changes: 105 additions & 4 deletions README.md
@@ -8,7 +8,7 @@ An MCP server for ClickHouse.

## Features

-### Tools
+### ClickHouse Tools

* `run_select_query`
* Execute SQL queries on your ClickHouse cluster.
@@ -22,8 +22,17 @@ An MCP server for ClickHouse.
* List all tables in a database.
* Input: `database` (string): The name of the database.

### chDB Tools

* `run_chdb_select_query`
* Execute SQL queries using chDB's embedded OLAP engine.
* Input: `sql` (string): The SQL query to execute.
* Query data directly from various sources (files, URLs, databases) without ETL processes; see the example below.

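For instance, a single query passed to `run_chdb_select_query` can read a remote file in place (a minimal sketch; the URL and column are illustrative):

```sql
SELECT count(*)
FROM url('https://example.com/data.csv', 'CSV');
```
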
## Configuration

This MCP server supports both ClickHouse and chDB. You can enable either or both depending on your needs.

1. Open the Claude Desktop configuration file located at:
* On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
* On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
@@ -90,6 +99,63 @@ Or, if you'd like to try it out with the [ClickHouse SQL Playground](https://sql
}
```

For chDB (embedded OLAP engine), add the following configuration:

```json
{
"mcpServers": {
"mcp-clickhouse": {
"command": "uv",
"args": [
"run",
"--with",
"mcp-clickhouse",
"--python",
"3.13",
"mcp-clickhouse"
],
"env": {
"CHDB_ENABLED": "true",
"CLICKHOUSE_ENABLED": "false",
"CHDB_DATA_PATH": "/path/to/chdb/data"
}
}
}
}
```

You can also enable both ClickHouse and chDB simultaneously:

```json
{
"mcpServers": {
"mcp-clickhouse": {
"command": "uv",
"args": [
"run",
"--with",
"mcp-clickhouse",
"--python",
"3.13",
"mcp-clickhouse"
],
"env": {
"CLICKHOUSE_HOST": "<clickhouse-host>",
"CLICKHOUSE_PORT": "<clickhouse-port>",
"CLICKHOUSE_USER": "<clickhouse-user>",
"CLICKHOUSE_PASSWORD": "<clickhouse-password>",
"CLICKHOUSE_SECURE": "true",
"CLICKHOUSE_VERIFY": "true",
"CLICKHOUSE_CONNECT_TIMEOUT": "30",
"CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30",
"CHDB_ENABLED": "true",
"CHDB_DATA_PATH": "/path/to/chdb/data"
}
}
}
}
```

3. Locate the command entry for `uv` and replace it with the absolute path to the `uv` executable. This ensures that the correct version of `uv` is used when starting the server. On a Mac, you can find this path using `which uv`.

4. Restart Claude Desktop to apply the changes.
@@ -115,9 +181,11 @@ CLICKHOUSE_PASSWORD=clickhouse

### Environment Variables

-The following environment variables are used to configure the ClickHouse connection:
+The following environment variables are used to configure the ClickHouse and chDB connections:

-#### Required Variables
+#### ClickHouse Variables
+
+##### Required Variables

* `CLICKHOUSE_HOST`: The hostname of your ClickHouse server
* `CLICKHOUSE_USER`: The username for authentication
@@ -126,7 +194,7 @@ The following environment variables are used to configure the ClickHouse connect
> [!CAUTION]
> Treat your MCP database user as you would any external client connecting to your database: grant only the minimum privileges required for its operation. Never use default or administrative users.

-#### Optional Variables
+##### Optional Variables

* `CLICKHOUSE_PORT`: The port number of your ClickHouse server
* Default: `8443` if HTTPS is enabled, `8123` if disabled
@@ -149,6 +217,19 @@ The following environment variables are used to configure the ClickHouse connect
* `CLICKHOUSE_MCP_SERVER_TRANSPORT`: Sets the transport method for the MCP server.
* Default: `"stdio"`
* Valid options: `"stdio"`, `"http"`, `"streamable-http"`, `"sse"`. This is useful for local development with tools like MCP Inspector.
* `CLICKHOUSE_ENABLED`: Enable/disable ClickHouse functionality
* Default: `"true"`
* Set to `"false"` to disable ClickHouse tools when using chDB only

#### chDB Variables

* `CHDB_ENABLED`: Enable/disable chDB functionality
* Default: `"false"`
* Set to `"true"` to enable chDB tools
* `CHDB_DATA_PATH`: The path to the chDB data directory
* Default: `":memory:"` (in-memory database)
* Use `:memory:` for an in-memory database
* Use a file path for persistent storage (e.g., `/path/to/chdb/data`)

#### Example Configurations

@@ -187,6 +268,24 @@ CLICKHOUSE_PASSWORD=
# Uses secure defaults (HTTPS on port 8443)
```

For chDB only (in-memory):

```env
# chDB configuration
CHDB_ENABLED=true
CLICKHOUSE_ENABLED=false
# CHDB_DATA_PATH defaults to :memory:
```

For chDB with persistent storage:

```env
# chDB configuration
CHDB_ENABLED=true
CLICKHOUSE_ENABLED=false
CHDB_DATA_PATH=/path/to/chdb/data
```

You can set these variables in your environment, in a `.env` file, or in the Claude Desktop configuration:

```json
@@ -221,6 +320,8 @@ uv run ruff check . # run linting

docker compose up -d test_services # start ClickHouse
uv run pytest -v tests
uv run pytest -v tests/test_tool.py # ClickHouse only
uv run pytest -v tests/test_chdb_tool.py # chDB only
```

## YouTube Overview
6 changes: 6 additions & 0 deletions mcp_clickhouse/__init__.py
@@ -3,11 +3,17 @@
list_databases,
list_tables,
run_select_query,
create_chdb_client,
run_chdb_select_query,
chdb_initial_prompt,
)

__all__ = [
"list_databases",
"list_tables",
"run_select_query",
"create_clickhouse_client",
"create_chdb_client",
"run_chdb_select_query",
"chdb_initial_prompt",
]
119 changes: 119 additions & 0 deletions mcp_clickhouse/chdb_prompt.py
@@ -0,0 +1,119 @@
"""chDB prompts for MCP server."""

CHDB_PROMPT = """
# chDB Assistant Guide

You are an expert chDB assistant designed to help users leverage chDB for querying diverse data sources. chDB is an in-process ClickHouse engine that excels at analytical queries through its extensive table function ecosystem.

## Available Tools
- **run_chdb_select_query**: Execute SELECT queries using chDB's table functions

## Table Functions: The Core of chDB

chDB's strength lies in its **table functions** - special functions that act as virtual tables, allowing you to query data from various sources without traditional ETL processes. Each table function is optimized for specific data sources and formats.

### File-Based Table Functions

#### **file() Function**
Query local files directly with automatic format detection:
```sql
-- Auto-detect format
SELECT * FROM file('/path/to/data.parquet');
SELECT * FROM file('sales.csv');

-- Explicit format specification
SELECT * FROM file('data.csv', 'CSV');
SELECT * FROM file('logs.json', 'JSONEachRow');
SELECT * FROM file('export.tsv', 'TSV');
```

### Remote Data Table Functions

#### **url() Function**
Access remote data over HTTP/HTTPS:
```sql
-- Query CSV from URL
SELECT * FROM url('https://example.com/data.csv', 'CSV');

-- Query parquet from URL
SELECT * FROM url('https://data.example.com/logs/data.parquet');
```

#### **s3() Function**
Direct S3 data access:
```sql
-- Single S3 file
SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', 'CSVWithNames');

-- S3 with credentials and wildcard patterns
SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv', '<KEY>', '<SECRET>', 'TSVWithNames');
```

#### **hdfs() Function**
Hadoop Distributed File System access:
```sql
-- HDFS file access
SELECT * FROM hdfs('hdfs://namenode:9000/data/events.parquet');

-- HDFS directory scan
SELECT * FROM hdfs('hdfs://cluster/warehouse/table/*', 'TSV');
```

### Database Table Functions

#### **sqlite() Function**
Query SQLite databases:
```sql
-- Access SQLite table
SELECT * FROM sqlite('/path/to/database.db', 'users');

-- Join with other data
SELECT u.name, s.amount
FROM sqlite('app.db', 'users') u
JOIN file('sales.csv') s ON u.id = s.user_id;
```

#### **postgresql() Function**
Connect to PostgreSQL:
```sql
-- PostgreSQL table access
SELECT * FROM postgresql('localhost:5432', 'mydb', 'orders', 'user', 'password');
```

#### **mysql() Function**
MySQL database integration:
```sql
-- MySQL table query
SELECT * FROM mysql('localhost:3306', 'shop', 'products', 'user', 'password');
```

## Table Function Best Practices

### **Performance Optimization**
- **Predicate Pushdown**: Apply filters early to reduce data transfer
- **Column Pruning**: Select only needed columns (see the sketch below)

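Both practices can be applied directly inside the table-function query, so chDB never materializes the unneeded data. A minimal sketch, assuming a hypothetical `events.parquet` file:

```sql
-- Read only two columns and filter at the source
-- (file path and column names are illustrative)
SELECT user_id, event_time
FROM file('events.parquet')
WHERE event_time >= '2024-01-01';
```
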
### **Error Handling**
- Test table function connectivity with `LIMIT 1`
- Verify data formats match function expectations
- Use `DESCRIBE` to understand schema before complex queries (as sketched below)

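A pre-flight check along these lines (the URL is illustrative) catches connectivity and format problems before a heavy query runs:

```sql
-- Confirm the source is reachable and the format parses
SELECT * FROM url('https://example.com/data.csv', 'CSV') LIMIT 1;

-- Inspect the inferred schema before writing complex queries
DESCRIBE url('https://example.com/data.csv', 'CSV');
```
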
## Workflow with Table Functions

1. **Identify Data Source**: Choose appropriate table function
2. **Test Connection**: Use simple `SELECT * LIMIT 1` queries
3. **Explore Schema**: Use `DESCRIBE table_function(...)`
4. **Build Query**: Combine table functions as needed
5. **Optimize**: Apply filters and column selection (a combined sketch follows)

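Putting the steps together, a session might look like this (the file name and columns are illustrative):

```sql
-- Steps 2-3: test the connection and explore the schema
SELECT * FROM file('sales.csv') LIMIT 1;
DESCRIBE file('sales.csv');

-- Steps 4-5: build the query, filtering early and selecting only needed columns
SELECT region, sum(amount) AS total
FROM file('sales.csv')
WHERE order_date >= '2024-01-01'
GROUP BY region
ORDER BY total DESC;
```
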
## Getting Started

When helping users:
1. **Identify their data source type** and recommend the appropriate table function
2. **Show table function syntax** with their specific parameters
3. **Demonstrate data exploration** using the table function
4. **Build analytical queries** combining multiple table functions if needed
5. **Optimize performance** through proper filtering and column selection

Remember: chDB's table functions eliminate the need for data loading - you can query data directly from its source, making analytics faster and more flexible.
"""