Proposal: Add Optional Response Handler System for Custom Processing #27

@JKapostins

Description

Context

I understand that what I'm proposing may be beyond the original scope of the Polygon MCP server, which excels at providing financial data access to LLMs. However, I have a vision for creating sophisticated "data science agents" using Polygon as the backbone - agents that can autonomously collect, process, and analyze vast amounts of market data without human intervention.

The current MCP pattern works beautifully for simple queries, but hits fundamental limitations when trying to build agents that need to:

  • Build comprehensive market databases over time
  • Run backtests on years of historical data
  • Perform complex statistical analysis across thousands of symbols
  • Create and maintain feature stores for ML models
  • Generate trading signals from multi-timeframe analysis

These aren't just API access patterns - they're data pipeline patterns that require computation to happen before the data reaches the LLM.

Summary

I'd like to propose adding support for optional callback handlers that LLMs can pass directly in API requests. This would allow LLMs to process large responses (store to database, summarize, etc.) without the data passing through their context window.

The key insight: Let LLMs pass handler code as a parameter - no configuration, no rebuilds, just works. This transforms the Polygon MCP from a data access tool into a data science platform.

Motivation

Currently, when using the Polygon MCP server with LLMs, all response data passes through the LLM's context window. This creates several challenges:

  1. Context Window Explosion: Requesting large datasets quickly exceeds limits:
    • Validated: 1 day of minute data = 57,044 tokens (exceeds MCP's 25k response limit)
    • 1 month of minute data = ~6x Claude's 200k context limit
    • 1 year of minute data = ~72x Claude's context limit
    • 5 years of minute data = ~359x Claude's context limit
  2. Token Waste: When storing data for later analysis, the full JSON response burns tokens unnecessarily
  3. No Custom Processing: Users who want to store data directly to databases (DuckDB, PostgreSQL, etc.) must receive the full response first, then process it separately
  4. Safety Concerns: No built-in protection against accidentally requesting too much data (confirmed via testing)

Proposed Solution

Add an optional handler system that allows users to:

  • Process responses before they reach the LLM
  • Store large datasets directly to databases
  • Return summaries instead of full data
  • Add safety checks and warnings

Key Design Principles

  • Fully Backward Compatible: Works exactly as today if no handlers are specified
  • Non-Invasive: No changes to existing tool implementations
  • Pluggable: Users can add handlers without modifying the server
  • Composable: Multiple handlers can work together

Implementation Overview

The core idea is simple: let LLMs pass handler code as a parameter that processes data on the server before it reaches the LLM's context window.

Data Flow

Current (Problem):

LLM Request → Polygon API → 3M tokens → LLM Context (💥 explodes)

With Handlers (Solution):

┌─────────┐     ┌──────────────┐     ┌─────────┐     ┌───────────────┐     ┌─────────┐
│   LLM   │────▶│ Pre-Handler  │────▶│ Polygon │────▶│ Post-Handler  │────▶│   LLM   │
│ Request │     │  (Optional)  │     │   API   │     │   (Process)   │     │ Summary │
└─────────┘     └──────────────┘     └─────────┘     └───────────────┘     └─────────┘
                       │                                      │
                       ▼                                      ▼
                 - Check cache                          - Store to DuckDB
                 - Validate params                      - Calculate stats
                 - Skip if cached                       - Return summary (not 3M tokens!)

Server Implementation

Minimal change to support handlers - just add optional parameters and execute them with uv:

from typing import Optional
import json

@poly_mcp.tool()
async def list_aggs(
    ticker, timespan, from_, to,  # Original params
    _pre_handler: Optional[str] = None,      # New: pre-request handler
    _response_handler: Optional[str] = None  # New: post-response handler
):
    # Pre-handler can check cache or modify params
    if _pre_handler:
        pre_result = execute_with_uv(_pre_handler, {'params': locals()})
        if pre_result.get('skip_api'):
            return pre_result.get('response')  # Cache hit - skip API entirely!
    
    # Get API response as normal
    response = polygon_client.list_aggs(ticker, timespan, from_, to)
    response_data = json.loads(response.data.decode("utf-8"))
    
    # Post-handler processes response
    if _response_handler:
        return execute_with_uv(_response_handler, {
            'response': response_data,
            'params': locals()
        })
    
    return response_data  # No handler = original behavior

Running handlers with uv provides automatic dependency management (via PEP 723 inline metadata) and per-request process isolation. Note that uv itself does not sandbox code, so resource limits and filesystem restrictions would need to be layered on top.

Real-World Testing Results

I validated these issues with actual tests using the Polygon MCP server:

Test 1: One Day of Minute Data

# Request
await mcp__polygon__get_aggs(
    ticker="NVDA",
    multiplier=1,
    timespan="minute",
    from_="2025-09-09",
    to="2025-09-09"
)

# Result: ERROR
# "MCP tool response (57044 tokens) exceeds maximum allowed tokens (25000)"

Test 2: Limited Request (30 bars)

# Request with limit
await mcp__polygon__get_aggs(
    ticker="NVDA",
    multiplier=1,
    timespan="minute",
    from_="2025-09-09",
    to="2025-09-09",
    limit=30
)

# Result: SUCCESS - Returns 30 bars
# But this doesn't scale for historical data analysis

Token Usage Analysis (Validated)

Time Period   Minute Bars   Tokens       vs Claude Limit   vs MCP Limit
1 Day         390           57,044       29%               228% ❌
1 Week        1,950         285,220      143% ❌           1,141% ❌
1 Month       8,190         1,198,000    599% ❌           4,792% ❌
1 Year        98,280        14,377,000   7,189% ❌         57,508% ❌
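All rows extrapolate linearly from the single validated measurement (57,044 tokens for one trading day of 390 minute bars), assuming 5, 21, and 252 trading days per week, month, and year. A few lines of Python reproduce the table to within rounding:

```python
# Extrapolate the token-usage table from the one validated data point:
# 57,044 tokens for a single day of minute bars (390 bars).
TOKENS_PER_DAY = 57_044
CLAUDE_CONTEXT = 200_000   # Claude's context window
MCP_LIMIT = 25_000         # MCP's per-response token limit

periods = {"1 Day": 1, "1 Week": 5, "1 Month": 21, "1 Year": 252}
for name, days in periods.items():
    tokens = TOKENS_PER_DAY * days
    bars = 390 * days
    print(f"{name}: {bars:,} bars, {tokens:,} tokens, "
          f"{tokens / CLAUDE_CONTEXT:.0%} of Claude's context, "
          f"{tokens / MCP_LIMIT:.0%} of the MCP response limit")
```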

Example Use Cases

Cache Example - Skip Expensive API Calls

# LLM can check cache before hitting the API
result = await list_aggs(
    ticker="AAPL",
    timespan="day",
    from_="2024-01-01",
    to="2024-12-31",
    _pre_handler="""
# /// script
# dependencies = ["redis"]
# ///
import redis
import json

# Check if we already have this data cached
r = redis.Redis(host='localhost', port=6379)
cache_key = f"{params['ticker']}:{params['from_']}:{params['to']}"
cached = r.get(cache_key)

if cached:
    # Skip API call entirely!
    result = {"skip_api": True, "response": json.loads(cached)}
else:
    result = {"skip_api": False}
    """,
    _response_handler="""
# /// script
# dependencies = ["redis"]
# ///
# Store the response in the cache for next time
import redis
import json

r = redis.Redis(host='localhost', port=6379)
cache_key = f"{params['ticker']}:{params['from_']}:{params['to']}"
r.setex(cache_key, 3600, json.dumps(response))  # Cache for 1 hour
result = response  # Pass through
    """
)

LLM-Generated Storage Handler

# User asks: "Store NVDA 2024 minute data in my database"
# LLM generates this complete request with custom handler:

result = await list_aggs(
    ticker="NVDA",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _response_handler="""
# /// script
# dependencies = ["duckdb>=1.0.0", "pandas>=2.0.0"]
# ///
import duckdb
import pandas as pd

# LLM wrote this handler to fulfill user's request
conn = duckdb.connect('/data/market.db')
df = pd.DataFrame(response['results'])

# Store under the table name the user asked for
conn.execute("CREATE TABLE nvda_2024_minute AS SELECT * FROM df")

# Return summary so LLM knows what happened
result = {
    "success": True,
    "records_stored": len(response['results']),
    "database": "/data/market.db",
    "table": "nvda_2024_minute",
    "sample": response['results'][:5],  # Small sample for verification
    "stats": {
        "high": df['h'].max(),
        "low": df['l'].min(),
        "avg_volume": df['v'].mean()
    }
}
    """
)

# LLM receives just the summary (not 3M tokens!)
# and can tell user: "I've stored 98,280 minute bars for NVDA in your database"

How Handlers Work

Taking inspiration from Claude Code hooks and uv's inline script metadata (PEP 723), handlers are self-contained scripts with their own dependencies:

  1. LLM generates handler with inline dependencies (PEP 723 format)
  2. Server executes handler in isolated uv environment
  3. Handler processes data and returns summary
  4. LLM receives summary instead of millions of tokens
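Concretely, the contract a handler script follows in the examples above: the server injects `params` (and, for post-handlers, `response`) into the script's globals, and reads back whatever the script assigns to `result`. A minimal skeleton (the injected values below are stand-ins supplied here only so the sketch runs standalone):

```python
# /// script
# dependencies = []
# ///
# Stand-ins for what the server would inject into the script's globals
# before execution (an assumption of this sketch):
params = {"ticker": "NVDA", "timespan": "minute"}
response = {"results": [{"c": 100.5, "v": 1200}, {"c": 101.0, "v": 950}]}

# The handler does its processing...
bars = response.get("results", [])

# ...and assigns whatever the LLM should see to `result`.
result = {
    "ticker": params["ticker"],
    "bar_count": len(bars),
    "last_close": bars[-1]["c"] if bars else None,
}
```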

Using Handler Files

For reusable handlers stored in files, the LLM would:

# Step 1: Read the handler file (~ must be expanded explicitly; open() won't)
import os

with open(os.path.expanduser('~/.polygon_handlers/store_to_duckdb.py')) as f:
    handler_code = f.read()

# Step 2: Pass the contents to the API
result = await list_aggs(
    ticker="AAPL",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _response_handler=handler_code  # Pass file contents as string
)

Note: The MCP server only accepts handler code as strings, not file paths (for security reasons).

Alternative: Pre-Built Templates

# For organizations that don't allow custom code execution,
# the server could provide pre-built, validated templates:

result = await list_aggs(
    ticker="AAPL",
    timespan="minute",
    from_="2024-01-01",
    to="2024-12-31",
    _handler="duckdb",  # Name of pre-built template (not custom code)
    _handler_config={    # Configuration for the template
        "path": "/data/market.db",
        "table": "aapl_2024"
    }
)

# This doesn't execute arbitrary code - just uses a pre-validated
# server-side function with the provided configuration
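On the server side, a template registry could be as simple as a dict of vetted functions keyed by name. The registry layout, decorator, and the `summary` template below are illustrative assumptions, not proposed API:

```python
from typing import Callable, Dict

# Registry of pre-validated handler templates; no user-supplied code runs.
TEMPLATES: Dict[str, Callable[[dict, dict], dict]] = {}

def template(name: str):
    """Register a vetted server-side function under a template name."""
    def register(fn):
        TEMPLATES[name] = fn
        return fn
    return register

@template("summary")
def summarize(response: dict, config: dict) -> dict:
    """Return a record count and a small sample instead of the full payload."""
    rows = response.get("results", [])
    n = config.get("sample_size", 5)
    return {"records": len(rows), "sample": rows[:n]}

def run_template(name: str, response: dict, config: dict) -> dict:
    """Dispatch _handler=<name> with _handler_config=<config>."""
    if name not in TEMPLATES:
        raise ValueError(f"Unknown handler template: {name}")
    return TEMPLATES[name](response, config)
```

A `duckdb` template would slot in the same way, validating its `path` and `table` config before touching the filesystem.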

Security Considerations

I recommend supporting both approaches with progressive trust levels:

Level 1: Pre-Built Templates (Default)

  • No code execution - Only parameterized templates
  • Use case: Production, enterprise environments
  • Example: _handler="duckdb" with _handler_config={"path": "...", "table": "..."}

Level 2: Custom Scripts (Opt-in)

  • Requires: POLYGON_ALLOW_CUSTOM_HANDLERS=true
  • Security: Process isolation via uv, resource limits, filesystem restrictions
  • Use case: Development, personal use, trusted LLMs
  • Example: _response_handler="...handler code..."
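The opt-in gate itself is trivial. A sketch (the environment variable name comes from the proposal; the accepted values and error wording are assumptions):

```python
import os
from typing import Optional

def custom_handlers_enabled() -> bool:
    """Custom scripts (Level 2) stay off unless explicitly opted in."""
    value = os.environ.get("POLYGON_ALLOW_CUSTOM_HANDLERS", "")
    return value.strip().lower() in ("1", "true", "yes")

def check_handler_allowed(handler_code: Optional[str]) -> None:
    """Reject custom handler code when the opt-in flag is not set."""
    if handler_code is not None and not custom_handlers_enabled():
        raise PermissionError(
            "Custom handlers are disabled. Set POLYGON_ALLOW_CUSTOM_HANDLERS=true "
            "to opt in, or use a pre-built template via _handler=..."
        )
```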

Why Both?

Aspect         Templates                           uv Scripts
Security       ✅ Highest - no code execution      ⚠️ Medium - sandboxed execution
Flexibility    Limited to predefined operations    Unlimited Python capabilities
Performance    Fast - direct function calls        Slower - process spawning
Dependencies   None - uses server's packages       Any - inline dependencies
User Skill     No coding required                  Python knowledge needed
LLM Usage      Safe for any LLM                    Requires trusted LLM

Templates handle 80% of use cases (store to DB, summarize, cache) while scripts enable advanced workflows (custom analytics, complex transformations, multi-step pipelines).

Questions for Discussion

  1. Would you be interested in supporting this handler pattern?
  2. Should we start with just templates, or include custom scripts from day one?
  3. Any concerns about the security model for custom handlers?

Next Steps

If there's interest, I'm happy to create a proof-of-concept PR demonstrating the approach.


Looking forward to your thoughts! Happy to discuss implementation details or adjust the approach based on your feedback.
