DEVAI-346: Add hive query tool #105
base: main
Conversation
Welcome to Codecov 🎉 Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️
Pull Request Overview
This PR adds Hive query execution support to the MCP server, enabling LLMs to rewrite and validate Trino queries against Hive. The implementation provides four new Hive-specific tools that complement the existing Trino functionality.
- Implements a complete Hive client using Treasure Data's v3 REST API
- Adds four Hive tools: query execution, job management, and result retrieval
- Integrates existing security validation and audit logging systems
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
File | Description
---|---
src/client/hive.ts | New TDHiveClient implementing REST API communication with job management and query execution
src/client/tdapi/endpoints.ts | Centralized TD API endpoint configuration for different sites
src/tools/hive/*.ts | Four new Hive tools providing query, execute, job-status, and job-result functionality
src/server.ts | Integration of Hive tools into the MCP server's tool registry and handlers
src/security/audit-logger.ts | Fixed console logging to use stderr instead of stdout for MCP compatibility
tests/**/*.test.ts | Comprehensive test coverage for all new Hive functionality
Pull Request Overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.
PR Review Summary
Thank you for adding Hive query functionality! The implementation is comprehensive with good security practices and test coverage. Below are my findings:
✅ Strengths:
- Comprehensive implementation with query, execute, job status, and result operations
- Good test coverage with proper mocking
- Secure by design with SQL validation and audit logging
- Proper API key masking in errors
- Well-structured code organization under `src/tools/hive/`
🔴 Critical Issues to Fix:
- Unrelated change in audit-logger.ts - Revert the console.log → console.error change
- Missing documentation - Add Hive integration details to CLAUDE.md
🟡 Recommendations:
- Consider refactoring to class-based pattern for consistency with existing Presto tools
- Add error logging in `getJobErrorDetails` for debugging
- Consider adding integration tests (can be a follow-up PR)
I'll add specific inline comments for the issues found.
const rowCount = entry.rowCount !== undefined ? ` -> ${entry.rowCount} rows` : '';

-    console.log(
+    console.error(
🔴 Critical Issue: This change from `console.log` to `console.error` is unrelated to the Hive feature and breaks consistency with the existing audit logging pattern. Please revert this change.
properties: {},
},
},
// Hive Tools
📝 Documentation Needed: This PR should also update README/CLAUDE.md to document the new Hive integration, similar to the existing CDP and Workflow sections. The documentation should include:
- Overview of Hive integration and when to use it vs Presto
- Available tools and their usage
- Authentication details
- Example usage patterns
private async getJobErrorDetails(jobId: string): Promise<string | undefined> {
  try {
    const show = await this.request<JobShowResponse>(
      'GET',
🟡 Suggestion: Consider adding debug logging here when errors occur while fetching job details. This would help with troubleshooting issues in production:
} catch (e) {
  if (process.env.TD_MCP_LOG_TO_CONSOLE === 'true') {
    console.error('[Hive] Failed to get job error details:', e);
  }
  return undefined;
}
inputSchema: {
  type: 'object',
  properties: {
    sql: { type: 'string', description: 'Hive SQL query (SELECT, SHOW, DESCRIBE only).' },
Let's note the use of the Hive SQL dialect and Treasure Data UDFs in the tool description, so that the LLM can generate TD-Hive-specific queries.
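For example, the property description could be extended along these lines (just a sketch; which TD UDFs to call out, e.g. TD_TIME_RANGE or TD_INTERVAL, is up to the author):

sql: {
  type: 'string',
  description:
    'Hive SQL query (SELECT, SHOW, DESCRIBE only). Uses the Hive SQL dialect and supports ' +
    'Treasure Data UDFs such as TD_TIME_RANGE and TD_INTERVAL for time-based filtering.',
},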
  limit: z.number().int().min(1).max(10000).optional().describe('Max rows to return (default 40)')
});

export const hiveQuery = {
🟡 Architecture Consideration: The Hive tools export simple objects with `handler` functions, while the existing Presto `QueryTool` uses a class-based approach. Consider refactoring to match the class pattern for better consistency with the existing codebase architecture.
However, I notice the CDP and Workflow tools also use the simple object pattern, so this may be acceptable. Just worth noting the inconsistency between different tool types.
^ This is a review from Claude Code.
I think using top-level functions is better for TypeScript, so there is no need to use a class here. We should instead revise the existing code to use top-level functions.
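For reference, a minimal sketch of the plain-object, top-level-function style being discussed (the schema fields and handler signature mirror snippets from this diff; the tool name field and handler body are illustrative assumptions):

import { z } from 'zod';

// Input schema mirroring the one shown in this diff
const inputSchema = z.object({
  sql: z.string().describe('Hive SQL query (SELECT, SHOW, DESCRIBE only).'),
  database: z.string().optional(),
  limit: z.number().int().min(1).max(10000).optional().describe('Max rows to return (default 40)'),
});

// Tool defined as a plain object whose handler is an ordinary async function,
// rather than a class wrapping the same logic
export const hiveQuery = {
  name: 'hive_query',
  handler: async (args: unknown) => {
    const { sql, database, limit } = inputSchema.parse(args);
    // ... validate the SQL, submit the Hive job, poll, and return up to `limit` rows
  },
};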
const config = loadConfig();
const client = new TDHiveClient({ ...config, database: database || config.database });
const validator = new QueryValidator(false); // disallow writes regardless of env flag for this tool
✅ Good Security Practice: Excellent implementation here - `hive_query` hardcodes read-only validation regardless of the config flag, ensuring this tool can never perform write operations. This is a solid security pattern.
expect(consoleLogSpy).toHaveBeenCalled();
const logMessage = consoleLogSpy.mock.calls[0][0] as string;
expect(consoleErrorSpy).toHaveBeenCalled();
The change from `consoleLogSpy` to `consoleErrorSpy` should be reverted along with the audit-logger.ts change to maintain consistency.
handler: async (args: unknown) => {
  const { sql, database, limit } = inputSchema.parse(args);

  const config = loadConfig();
🔴 Database Context Pattern Issue: The database parameter handling here doesn't follow the established pattern used by Trino tools.
Current implementation creates a new client with the database baked into the config:
const client = new TDHiveClient({ ...config, database: database || config.database });
This should follow the Trino pattern of using a `useDatabase` context method instead:
const client = new TDHiveClient(config);
if (database) {
  client.useDatabase(database);
}
This inconsistency will confuse LLMs about how to properly set database context across different tools.
},
handler: async (args: unknown): Promise<HiveExecuteWriteResult | HiveQueryResult> => {
  const { sql, database, priority, retry_limit, pool_name, limit } = inputSchema.parse(args);
🔴 Same Database Pattern Issue: This also creates a new client with the database baked into the config rather than using a `useDatabase` pattern. It should be consistent with the Trino tools.
/**
 * Minimal REST client for Treasure Data Hive jobs (v3 API)
 */
🔴 Missing useDatabase Method: The `TDHiveClient` should implement a `useDatabase` method to match the pattern used by `TDTrinoClient`. This would ensure consistent database context handling across all query tools.
Suggested implementation:
export class TDHiveClient {
  private readonly apiKey: string;
  private readonly baseUrl: string;
  private currentDatabase: string; // Add this
  readonly database: string;

  constructor(config: Config) {
    this.apiKey = config.td_api_key;
    this.baseUrl = getTdApiEndpointForSite(config.site);
    this.database = config.database || 'sample_datasets';
    this.currentDatabase = this.database; // Initialize
  }

  // Add this method
  useDatabase(database: string): this {
    this.currentDatabase = database;
    return this;
  }

  async issueHive(query: string, options?: HiveIssueJobOptions): Promise<{ job_id: string }> {
    // Use this.currentDatabase instead of a database parameter
    const db = this.currentDatabase;
    // ... rest of implementation
  }
}
This would allow tools to call `client.useDatabase(database)` instead of passing the database through config, matching the Trino pattern.
@@ -0,0 +1,5 @@
export { hiveQuery } from './query.js';
📝 LLM Usage Complexity: The async job model of Hive (submit → poll → fetch results) is significantly more complex than Trino's synchronous model. This PR should add clear documentation in CLAUDE.md/README.md explaining:
- When to use each tool:
  - `hive_query`: For simple SELECT queries (auto-manages the job lifecycle)
  - `hive_execute`: For write operations or when you need the job_id
  - `hive_job_status` + `hive_job_result`: For long-running jobs or recovering results
IMPORTANT: These differences should be noted in the tool descriptions so that the LLM can understand how the methods differ.
- Key differences from Trino:
  - Trino: Synchronous, immediate results
  - Hive: Asynchronous jobs, may take minutes/hours
- Usage patterns:
  // Simple query
  hive_query(sql, database)

  // Long-running job
  job_id = hive_execute(sql, database)
  status = hive_job_status(job_id)   // Poll until complete
  results = hive_job_result(job_id)  // Fetch when ready
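As a concrete illustration of that flow, here is a rough TypeScript sketch of the submit → poll → fetch pattern (the tool names come from this PR; the callTool helper, argument shapes, and status strings are assumptions for illustration only):

// Hypothetical helper that drives a long-running Hive job end to end.
// callTool() stands in for whatever MCP client invocation is available.
async function runLongHiveQuery(
  callTool: (name: string, args: object) => Promise<any>,
  sql: string,
  database?: string
): Promise<unknown> {
  // 1. Submit the job and get a job_id back
  const { job_id } = await callTool('hive_execute', { sql, database });

  // 2. Poll the job status until it reaches a terminal state
  //    ('success' / 'error' / 'killed' are assumed terminal statuses)
  for (;;) {
    const { status } = await callTool('hive_job_status', { job_id });
    if (status === 'success') break;
    if (status === 'error' || status === 'killed') {
      throw new Error(`Hive job ${job_id} finished with status ${status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, 5000)); // wait before re-polling
  }

  // 3. Fetch the results once the job has completed
  return callTool('hive_job_result', { job_id });
}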
Without proper documentation, LLMs will struggle to choose the right tool and handle the async nature correctly.
Currently only Trino is supported.
I'd like the LLM to be able to use Hive for cases like rewriting a Trino query into a Hive query.
With this feature, we can rely on the LLM to do the rewriting, since it can execute the query and check whether it works.