1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -38,6 +38,7 @@ jobs:
CLICKHOUSE_PASSWORD: ""
CLICKHOUSE_SECURE: "false"
CLICKHOUSE_VERIFY: "false"
CHDB_ENABLED: "true"
run: |
uv run pytest tests

8 changes: 4 additions & 4 deletions Dockerfile
@@ -1,5 +1,5 @@
# Build stage - Use a Python image with uv pre-installed
-FROM ghcr.io/astral-sh/uv:python3.13-alpine AS builder
+FROM ghcr.io/astral-sh/uv:python3.13-bookworm AS builder
> **Member:** why is this necessary?

> **Contributor Author:** @serprex This is because chDB does not currently support Alpine Linux as a runtime environment.

# Install the project into `/app`
WORKDIR /app
@@ -11,8 +11,8 @@ ENV UV_COMPILE_BYTECODE=1
ENV UV_LINK_MODE=copy

# Install git and build dependencies for ClickHouse client
-RUN --mount=type=cache,target=/var/cache/apk \
-    apk add git build-base
+RUN --mount=type=cache,target=/var/cache/apt \
+    apt-get update && apt-get install -y git build-essential

# Install the project's dependencies using the lockfile and settings
RUN --mount=type=cache,target=/root/.cache/uv \
@@ -28,7 +28,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
uv sync --locked --no-dev --no-editable

# Production stage - Use minimal Python image
-FROM python:3.13-alpine
+FROM python:3.13-bookworm

# Set the working directory
WORKDIR /app
109 changes: 105 additions & 4 deletions README.md
@@ -8,7 +8,7 @@ An MCP server for ClickHouse.

## Features

-### Tools
+### ClickHouse Tools

* `run_select_query`
* Execute SQL queries on your ClickHouse cluster.
@@ -22,8 +22,17 @@ An MCP server for ClickHouse.
* List all tables in a database.
* Input: `database` (string): The name of the database.

### chDB Tools

* `run_chdb_select_query`
* Execute SQL queries using chDB's embedded OLAP engine.
* Input: `sql` (string): The SQL query to execute.
* Query data directly from various sources (files, URLs, databases) without ETL processes; see the example below.

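For instance, a single query passed to `run_chdb_select_query` can read a remote file in place (a minimal sketch; the URL and column are illustrative):

```sql
SELECT count(*)
FROM url('https://example.com/data.csv', 'CSV');
```
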
## Configuration

This MCP server supports both ClickHouse and chDB. You can enable either or both depending on your needs.

1. Open the Claude Desktop configuration file located at:
* On macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
* On Windows: `%APPDATA%/Claude/claude_desktop_config.json`
@@ -90,6 +99,63 @@ Or, if you'd like to try it out with the [ClickHouse SQL Playground](https://sql
}
```

For chDB (embedded OLAP engine), add the following configuration:

```json
{
"mcpServers": {
"mcp-clickhouse": {
"command": "uv",
"args": [
"run",
"--with",
"mcp-clickhouse",
"--python",
"3.13",
"mcp-clickhouse"
],
"env": {
"CHDB_ENABLED": "true",
"CLICKHOUSE_ENABLED": "false",
"CHDB_DATA_PATH": "/path/to/chdb/data"
}
}
}
}
```

You can also enable both ClickHouse and chDB simultaneously:

```json
{
"mcpServers": {
"mcp-clickhouse": {
"command": "uv",
"args": [
"run",
"--with",
"mcp-clickhouse",
"--python",
"3.13",
"mcp-clickhouse"
],
"env": {
"CLICKHOUSE_HOST": "<clickhouse-host>",
"CLICKHOUSE_PORT": "<clickhouse-port>",
"CLICKHOUSE_USER": "<clickhouse-user>",
"CLICKHOUSE_PASSWORD": "<clickhouse-password>",
"CLICKHOUSE_SECURE": "true",
"CLICKHOUSE_VERIFY": "true",
"CLICKHOUSE_CONNECT_TIMEOUT": "30",
"CLICKHOUSE_SEND_RECEIVE_TIMEOUT": "30",
"CHDB_ENABLED": "true",
"CHDB_DATA_PATH": "/path/to/chdb/data"
}
}
}
}
```

3. Locate the command entry for `uv` and replace it with the absolute path to the `uv` executable. This ensures that the correct version of `uv` is used when starting the server. On a Mac, you can find this path using `which uv`.

4. Restart Claude Desktop to apply the changes.
@@ -115,9 +181,11 @@ CLICKHOUSE_PASSWORD=clickhouse

### Environment Variables

-The following environment variables are used to configure the ClickHouse connection:
+The following environment variables are used to configure the ClickHouse and chDB connections:

-#### Required Variables
+#### ClickHouse Variables
+
+##### Required Variables

* `CLICKHOUSE_HOST`: The hostname of your ClickHouse server
* `CLICKHOUSE_USER`: The username for authentication
@@ -126,7 +194,7 @@ The following environment variables are used to configure the ClickHouse connect
> [!CAUTION]
> Treat your MCP database user as you would any external client connecting to your database: grant only the minimum privileges required for its operation. Never use default or administrative users.

-#### Optional Variables
+##### Optional Variables

* `CLICKHOUSE_PORT`: The port number of your ClickHouse server
* Default: `8443` if HTTPS is enabled, `8123` if disabled
@@ -149,6 +217,19 @@ The following environment variables are used to configure the ClickHouse connect
* `CLICKHOUSE_MCP_SERVER_TRANSPORT`: Sets the transport method for the MCP server.
* Default: `"stdio"`
* Valid options: `"stdio"`, `"http"`, `"streamable-http"`, `"sse"`. This is useful for local development with tools like MCP Inspector.
* `CLICKHOUSE_ENABLED`: Enable/disable ClickHouse functionality
* Default: `"true"`
* Set to `"false"` to disable ClickHouse tools when using chDB only

#### chDB Variables

* `CHDB_ENABLED`: Enable/disable chDB functionality
* Default: `"false"`
* Set to `"true"` to enable chDB tools
* `CHDB_DATA_PATH`: The path to the chDB data directory
* Default: `":memory:"` (in-memory database)
* Use `:memory:` for an in-memory database
* Use a file path for persistent storage (e.g., `/path/to/chdb/data`)

#### Example Configurations

@@ -187,6 +268,24 @@ CLICKHOUSE_PASSWORD=
# Uses secure defaults (HTTPS on port 8443)
```

For chDB only (in-memory):

```env
# chDB configuration
CHDB_ENABLED=true
CLICKHOUSE_ENABLED=false
# CHDB_DATA_PATH defaults to :memory:
```

For chDB with persistent storage:

```env
# chDB configuration
CHDB_ENABLED=true
CLICKHOUSE_ENABLED=false
CHDB_DATA_PATH=/path/to/chdb/data
```

You can set these variables in your environment, in a `.env` file, or in the Claude Desktop configuration:

```json
@@ -221,6 +320,8 @@ uv run ruff check . # run linting

docker compose up -d test_services # start ClickHouse
uv run pytest -v tests
uv run pytest -v tests/test_tool.py # ClickHouse only
uv run pytest -v tests/test_chdb_tool.py # chDB only
```

## YouTube Overview
6 changes: 6 additions & 0 deletions mcp_clickhouse/__init__.py
@@ -3,11 +3,17 @@
list_databases,
list_tables,
run_select_query,
create_chdb_client,
run_chdb_select_query,
chdb_initial_prompt,
)

__all__ = [
"list_databases",
"list_tables",
"run_select_query",
"create_clickhouse_client",
"create_chdb_client",
"run_chdb_select_query",
"chdb_initial_prompt",
]
119 changes: 119 additions & 0 deletions mcp_clickhouse/chdb_prompt.py
@@ -0,0 +1,119 @@
"""chDB prompts for MCP server."""

CHDB_PROMPT = """
# chDB Assistant Guide

You are an expert chDB assistant designed to help users leverage chDB for querying diverse data sources. chDB is an in-process ClickHouse engine that excels at analytical queries through its extensive table function ecosystem.

## Available Tools
- **run_chdb_select_query**: Execute SELECT queries using chDB's table functions

## Table Functions: The Core of chDB

chDB's strength lies in its **table functions** - special functions that act as virtual tables, allowing you to query data from various sources without traditional ETL processes. Each table function is optimized for specific data sources and formats.

### File-Based Table Functions

#### **file() Function**
Query local files directly with automatic format detection:
```sql
-- Auto-detect format
SELECT * FROM file('/path/to/data.parquet');
SELECT * FROM file('sales.csv');

-- Explicit format specification
SELECT * FROM file('data.csv', 'CSV');
SELECT * FROM file('logs.json', 'JSONEachRow');
SELECT * FROM file('export.tsv', 'TSV');
```

### Remote Data Table Functions

#### **url() Function**
Access remote data over HTTP/HTTPS:
```sql
-- Query CSV from URL
SELECT * FROM url('https://example.com/data.csv', 'CSV');

-- Query parquet from URL
SELECT * FROM url('https://data.example.com/logs/data.parquet');
```

#### **s3() Function**
Direct S3 data access:
```sql
-- Single S3 file
SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/aapl_stock.csv', 'CSVWithNames');

-- S3 with credentials and wildcard patterns
SELECT count() FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/mta/*.tsv', '<KEY>', '<SECRET>', 'TSVWithNames');
```

#### **hdfs() Function**
Hadoop Distributed File System access:
```sql
-- HDFS file access
SELECT * FROM hdfs('hdfs://namenode:9000/data/events.parquet');

-- HDFS directory scan
SELECT * FROM hdfs('hdfs://cluster/warehouse/table/*', 'TSV');
```

### Database Table Functions

#### **sqlite() Function**
Query SQLite databases:
```sql
-- Access SQLite table
SELECT * FROM sqlite('/path/to/database.db', 'users');

-- Join with other data
SELECT u.name, s.amount
FROM sqlite('app.db', 'users') u
JOIN file('sales.csv') s ON u.id = s.user_id;
```

#### **postgresql() Function**
Connect to PostgreSQL:
```sql
-- PostgreSQL table access
SELECT * FROM postgresql('localhost:5432', 'mydb', 'orders', 'user', 'password');
```

#### **mysql() Function**
MySQL database integration:
```sql
-- MySQL table query
SELECT * FROM mysql('localhost:3306', 'shop', 'products', 'user', 'password');
```

## Table Function Best Practices

### **Performance Optimization**
- **Predicate Pushdown**: Apply filters early to reduce data transfer
- **Column Pruning**: Select only needed columns (see the sketch below)

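Both practices can be applied directly inside the table-function query, so chDB never materializes the unneeded data. A minimal sketch, assuming a hypothetical `events.parquet` file:

```sql
-- Read only two columns and filter at the source
-- (file path and column names are illustrative)
SELECT user_id, event_time
FROM file('events.parquet')
WHERE event_time >= '2024-01-01';
```
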
### **Error Handling**
- Test table function connectivity with `LIMIT 1`
- Verify data formats match function expectations
- Use `DESCRIBE` to understand schema before complex queries (as sketched below)

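A pre-flight check along these lines (the URL is illustrative) catches connectivity and format problems before a heavy query runs:

```sql
-- Confirm the source is reachable and the format parses
SELECT * FROM url('https://example.com/data.csv', 'CSV') LIMIT 1;

-- Inspect the inferred schema before writing complex queries
DESCRIBE url('https://example.com/data.csv', 'CSV');
```
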
## Workflow with Table Functions

1. **Identify Data Source**: Choose appropriate table function
2. **Test Connection**: Use simple `SELECT * LIMIT 1` queries
3. **Explore Schema**: Use `DESCRIBE table_function(...)`
4. **Build Query**: Combine table functions as needed
5. **Optimize**: Apply filters and column selection (a combined sketch follows)

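Putting the steps together, a session might look like this (the file name and columns are illustrative):

```sql
-- Steps 2-3: test the connection and explore the schema
SELECT * FROM file('sales.csv') LIMIT 1;
DESCRIBE file('sales.csv');

-- Steps 4-5: build the query, filtering early and selecting only needed columns
SELECT region, sum(amount) AS total
FROM file('sales.csv')
WHERE order_date >= '2024-01-01'
GROUP BY region
ORDER BY total DESC;
```
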
## Getting Started

When helping users:
1. **Identify their data source type** and recommend the appropriate table function
2. **Show table function syntax** with their specific parameters
3. **Demonstrate data exploration** using the table function
4. **Build analytical queries** combining multiple table functions if needed
5. **Optimize performance** through proper filtering and column selection

Remember: chDB's table functions eliminate the need for data loading - you can query data directly from its source, making analytics faster and more flexible.
"""