Proposal: Add Optional Response Handler System for Custom Processing #27

@JKapostins

Description

Context

I understand that what I'm proposing may be beyond the original scope of the Polygon MCP server, which excels at providing financial data access to LLMs. However, I have a vision for creating sophisticated "data science agents" using Polygon as the backbone - agents that can autonomously collect, process, and analyze vast amounts of market data without human intervention.

The current MCP pattern works beautifully for simple queries, but hits fundamental limitations when trying to build agents that need to:

  • Build comprehensive market databases over time
  • Run backtests on years of historical data
  • Perform complex statistical analysis across thousands of symbols
  • Create and maintain feature stores for ML models
  • Generate trading signals from multi-timeframe analysis

These aren't just API access patterns - they're data pipeline patterns that require computation to happen before the data reaches the LLM.

Summary

I'd like to propose adding support for optional callback handlers that LLMs can pass directly in API requests. This would allow LLMs to process large responses (store to database, summarize, etc.) without the data passing through their context window.

The key insight: Let LLMs pass handler code as a parameter - no configuration, no rebuilds, just works. This transforms the Polygon MCP from a data access tool into a data science platform.

Motivation

Currently, when using the Polygon MCP server with LLMs, all response data passes through the LLM's context window. This creates several challenges:

  1. Context Window Explosion: Requesting large datasets quickly exceeds limits:
    • Validated: 1 day of minute data = 57,044 tokens (exceeds MCP's 25k response limit)
    • 1 month of minute data = ~6x Claude's 200k context limit
    • 1 year of minute data = ~72x Claude's context limit
    • 5 years of minute data = ~359x Claude's context limit
  2. Token Waste: When storing data for later analysis, the full JSON response burns tokens unnecessarily
  3. No Custom Processing: Users who want to store data directly to databases (DuckDB, PostgreSQL, etc.) must receive the full response first, then process it separately
  4. Safety Concerns: No built-in protection against accidentally requesting too much data (confirmed via testing)

Proposed Solution

Add an optional handler system that allows users to:

  • Process responses before they reach the LLM
  • Store large datasets directly to databases
  • Return summaries instead of full data
  • Add safety checks and warnings

Key Design Principles

  • Fully Backward Compatible: Works exactly as today if no handlers are specified
  • Non-Invasive: No changes to existing tool implementations
  • Pluggable: Users can add handlers without modifying the server
  • Composable: Multiple handlers can work together

Implementation Overview

The core idea is simple: let LLMs pass handler code as a parameter that processes data on the server before it reaches the LLM's context window.

Data Flow

Current (Problem):

LLM Request → Polygon API → 3M tokens → LLM Context (💥 explodes)

With Handlers (Solution):

┌─────────┐     ┌──────────────┐     ┌─────────┐     ┌───────────────┐     ┌─────────┐
│   LLM   │────▶│ Pre-Handler  │────▶│ Polygon │────▶│ Post-Handler  │────▶│   LLM   │
│ Request │     │  (Optional)  │     │   API   │     │   (Process)   │     │ Summary │
└─────────┘     └──────────────┘     └─────────┘     └───────────────┘     └─────────┘
                       │                                      │
                       ▼                                      ▼
                 - Check cache                          - Store to DuckDB
                 - Validate params                      - Calculate stats
                 - Skip if cached                       - Return summary (not 3M tokens!)

Server Implementation

Minimal change to support handlers - just add optional parameters and execute them with uv:

from typing import Optional
import json

@poly_mcp.tool()
async def list_aggs(
    ticker, timespan, from_, to,  # Original params
    _pre_handler: Optional[str] = None,      # New: pre-request handler
    _response_handler: Optional[str] = None  # New: post-response handler
):
    # Pre-handler can check cache or modify params
    if _pre_handler:
        pre_result = execute_with_uv(_pre_handler, {'params': locals()})
        if pre_result.get('skip_api'):
            return pre_result.get('response')  # Cache hit - skip API entirely!
    
    # Get API response as normal
    response = polygon_client.list_aggs(ticker, timespan, from_, to)
    response_data = json.loads(response.data.decode("utf-8"))
    
    # Post-handler processes response
    if _response_handler:
        return execute_with_uv(_response_handler, {
            'response': response_data,
            'params': locals()
        })
    
    return response_data  # No handler = original behavior

Running handlers with uv provides automatic dependency management (via PEP 723 inline metadata) and per-request process isolation. Note that uv itself does not sandbox code, so resource limits and filesystem restrictions would need to be layered on top.

Real-World Testing Results

I validated these issues with actual tests using the Polygon MCP server:

Test 1: One Day of Minute Data

# Request
await mcp__polygon__get_aggs(
    ticker="NVDA",
    multiplier=1,
    timespan="minute",
    from_="2025-09-09",
    to="2025-09-09"
)

# Result: ERROR
# "MCP tool response (57044 tokens) exceeds maximum allowed tokens (25000)"

Test 2: Limited Request (30 bars)

# Request with limit
await mcp__polygon__get_aggs(
    ticker="NVDA",
    multiplier=1,
    timespan="minute",
    from_="2025-09-09",
    to="2025-09-09",
    limit=30
)

# Result: SUCCESS - Returns 30 bars
# But this doesn't scale for historical data analysis

Token Usage Analysis (Validated)

Time Period   Minute Bars   Tokens       vs Claude Limit   vs MCP Limit
1 Day         390           57,044       29%               228% ❌
1 Week        1,950         285,220      143% ❌           1,141% ❌
1 Month       8,190         1,198,000    599% ❌           4,792% ❌
1 Year        98,280        14,377,000   7,189% ❌         57,508% ❌
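All rows extrapolate linearly from the single validated measurement (57,044 tokens for one trading day of 390 minute bars), assuming 5, 21, and 252 trading days per week, month, and year. A few lines of Python reproduce the table to within rounding:

```python
# Extrapolate the token-usage table from the one validated data point:
# 57,044 tokens for a single day of minute bars (390 bars).
TOKENS_PER_DAY = 57_044
CLAUDE_CONTEXT = 200_000   # Claude's context window
MCP_LIMIT = 25_000         # MCP's per-response token limit

periods = {"1 Day": 1, "1 Week": 5, "1 Month": 21, "1 Year": 252}
for name, days in periods.items():
    tokens = TOKENS_PER_DAY * days
    bars = 390 * days
    print(f"{name}: {bars:,} bars, {tokens:,} tokens, "
          f"{tokens / CLAUDE_CONTEXT:.0%} of Claude's context, "
          f"{tokens / MCP_LIMIT:.0%} of the MCP response limit")
```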

Example Use Cases

Cache Example - Skip Expensive API Calls

# LLM can check cache before hitting the API
result = await list_aggs(
    ticker="AAPL",
    timespan="day",
    from_="2024-01-01",
    to="2024-12-31",
    _pre_handler="""
# /// script
# dependencies = ["redis"]
# ///
import redis
import json

# Check if we already have this data cached
r = redis.Redis(host='localhost', port=6379)
cache_key = f"{params['ticker']}:{params['from_']}:{params['to']}"
cached = r.get(cache_key)

if cached:
    # Skip API call entirely!
    result = {"skip_api": True, "response": json.loads(cached)}
else:
    result = {"skip_api": False}
    """,
    _response_handler="""
# /// script
# dependencies = ["redis"]
# ///
# Store the response in the cache for next time
import redis
import json

r = redis.Redis(host='localhost', port=6379)
cache_key = f"{params['ticker']}:{params['from_']}:{params['to']}"
r.setex(cache_key, 3600, json.dumps(response))  # Cache for 1 hour
result = response  # Pass through
    """
)

LLM-Generated Storage Handler

# User asks: "Store NVDA 2024 minute data in my database"
# LLM generates this complete request with custom handler:

result = await list_aggs(
    ticker="NVDA",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _response_handler="""
# /// script
# dependencies = ["duckdb>=1.0.0", "pandas>=2.0.0"]
# ///
import duckdb
import pandas as pd

# LLM wrote this handler to fulfill user's request
conn = duckdb.connect('/data/market.db')
df = pd.DataFrame(response['results'])

# Store under the table name the user asked for
conn.execute("CREATE TABLE nvda_2024_minute AS SELECT * FROM df")

# Return summary so LLM knows what happened
result = {
    "success": True,
    "records_stored": len(response['results']),
    "database": "/data/market.db",
    "table": "nvda_2024_minute",
    "sample": response['results'][:5],  # Small sample for verification
    "stats": {
        "high": df['h'].max(),
        "low": df['l'].min(),
        "avg_volume": df['v'].mean()
    }
}
    """
)

# LLM receives just the summary (not 3M tokens!)
# and can tell user: "I've stored 98,280 minute bars for NVDA in your database"

How Handlers Work

Taking inspiration from Claude Code hooks and uv's inline script metadata (PEP 723), handlers are self-contained scripts with their own dependencies:

  1. LLM generates handler with inline dependencies (PEP 723 format)
  2. Server executes handler in isolated uv environment
  3. Handler processes data and returns summary
  4. LLM receives summary instead of millions of tokens
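Concretely, the contract a handler script follows in the examples above: the server injects `params` (and, for post-handlers, `response`) into the script's globals, and reads back whatever the script assigns to `result`. A minimal skeleton (the injected values below are stand-ins supplied here only so the sketch runs standalone):

```python
# /// script
# dependencies = []
# ///
# Stand-ins for what the server would inject into the script's globals
# before execution (an assumption of this sketch):
params = {"ticker": "NVDA", "timespan": "minute"}
response = {"results": [{"c": 100.5, "v": 1200}, {"c": 101.0, "v": 950}]}

# The handler does its processing...
bars = response.get("results", [])

# ...and assigns whatever the LLM should see to `result`.
result = {
    "ticker": params["ticker"],
    "bar_count": len(bars),
    "last_close": bars[-1]["c"] if bars else None,
}
```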

Using Handler Files

For reusable handlers stored in files, the LLM would:

# Step 1: Read the handler file (~ must be expanded explicitly; open() won't)
import os

with open(os.path.expanduser('~/.polygon_handlers/store_to_duckdb.py')) as f:
    handler_code = f.read()

# Step 2: Pass the contents to the API
result = await list_aggs(
    ticker="AAPL",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _response_handler=handler_code  # Pass file contents as string
)

Note: The MCP server only accepts handler code as strings, not file paths (for security reasons).

Alternative: Pre-Built Templates

# For organizations that don't allow custom code execution,
# the server could provide pre-built, validated templates:

result = await list_aggs(
    ticker="AAPL",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _handler="duckdb",  # Name of pre-built template (not custom code)
    _handler_config={    # Configuration for the template
        "path": "/data/market.db",
        "table": "aapl_2024"
    }
)

# This doesn't execute arbitrary code - just uses a pre-validated
# server-side function with the provided configuration
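On the server side, a template registry could be as simple as a dict of vetted functions keyed by name. The registry layout, decorator, and the `summary` template below are illustrative assumptions, not proposed API:

```python
from typing import Callable, Dict

# Registry of pre-validated handler templates; no user-supplied code runs.
TEMPLATES: Dict[str, Callable[[dict, dict], dict]] = {}

def template(name: str):
    """Register a vetted server-side function under a template name."""
    def register(fn):
        TEMPLATES[name] = fn
        return fn
    return register

@template("summary")
def summarize(response: dict, config: dict) -> dict:
    """Return a record count and a small sample instead of the full payload."""
    rows = response.get("results", [])
    n = config.get("sample_size", 5)
    return {"records": len(rows), "sample": rows[:n]}

def run_template(name: str, response: dict, config: dict) -> dict:
    """Dispatch _handler=<name> with _handler_config=<config>."""
    if name not in TEMPLATES:
        raise ValueError(f"Unknown handler template: {name}")
    return TEMPLATES[name](response, config)
```

A `duckdb` template would slot in the same way, validating its `path` and `table` config before touching the filesystem.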

Security Considerations

I recommend supporting both approaches with progressive trust levels:

Level 1: Pre-Built Templates (Default)

  • No code execution - Only parameterized templates
  • Use case: Production, enterprise environments
  • Example: _handler="duckdb" with _handler_config={"path": "...", "table": "..."}

Level 2: Custom Scripts (Opt-in)

  • Requires: POLYGON_ALLOW_CUSTOM_HANDLERS=true
  • Security: Process isolation via uv, resource limits, filesystem restrictions
  • Use case: Development, personal use, trusted LLMs
  • Example: _response_handler="...handler code..."
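The opt-in gate itself is trivial. A sketch (the environment variable name comes from the proposal; the accepted values and error wording are assumptions):

```python
import os
from typing import Optional

def custom_handlers_enabled() -> bool:
    """Custom scripts (Level 2) stay off unless explicitly opted in."""
    value = os.environ.get("POLYGON_ALLOW_CUSTOM_HANDLERS", "")
    return value.strip().lower() in ("1", "true", "yes")

def check_handler_allowed(handler_code: Optional[str]) -> None:
    """Reject custom handler code when the opt-in flag is not set."""
    if handler_code is not None and not custom_handlers_enabled():
        raise PermissionError(
            "Custom handlers are disabled. Set POLYGON_ALLOW_CUSTOM_HANDLERS=true "
            "to opt in, or use a pre-built template via _handler=..."
        )
```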

Why Both?

Aspect         Templates                           uv Scripts
Security       ✅ Highest - no code execution      ⚠️ Medium - sandboxed execution
Flexibility    Limited to predefined operations    Unlimited Python capabilities
Performance    Fast - direct function calls        Slower - process spawning
Dependencies   None - uses server's packages       Any - inline dependencies
User Skill     No coding required                  Python knowledge needed
LLM Usage      Safe for any LLM                    Requires trusted LLM

Templates handle 80% of use cases (store to DB, summarize, cache) while scripts enable advanced workflows (custom analytics, complex transformations, multi-step pipelines).

Questions for Discussion

  1. Would you be interested in supporting this handler pattern?
  2. Should we start with just templates, or include custom scripts from day one?
  3. Any concerns about the security model for custom handlers?

Next Steps

If there's interest, I'm happy to create a proof-of-concept PR demonstrating the approach.


Looking forward to your thoughts! Happy to discuss implementation details or adjust the approach based on your feedback.
