Skip to content

Conversation

@ForeverAngry
Copy link
Contributor

Closes #2147

  • Implement ResidualEvaluatorCache with LRU eviction and thread safety
  • Cache evaluators by partition spec, expression, case sensitivity, and schema

Rationale for this change

For queries, the same combinations of (partition spec, expression, case sensitivity, schema) are evaluated repeatedly, causing unnecessary computational overhead and increased query latency.

Are these changes tested?

Yes, using the existing test cases as they seemed sufficient for the changes.

Are there any user-facing changes?

No.

Fixes apache#2147

- Implement ResidualEvaluatorCache with LRU eviction and thread safety
- Cache evaluators by partition spec, expression, case sensitivity, and schema
- Fix mypy type annotations and add type ignore for cachetools decorator
@ForeverAngry ForeverAngry force-pushed the feat-cache-residual-evaluator branch from 55e53d0 to 0ec381e Compare November 7, 2025 14:41
Comment on lines +2084 to +2086
cached = _residual_evaluator_cache.get(spec, expr, case_sensitive, schema)
if cached is not None:
return cached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@ForeverAngry ForeverAngry Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Fokko 👋 ! We certainly could try to simplify this — my main motivation for the current approach was to build something that felt more extensible, more readable (so contributors don’t have to dig too deep to understand what’s going on), and that gives users a bit more control.

That said, when I considered switching to functools.lru_cache for residual evaluator creation, it looked like it would actually require more work to avoid a custom cache. Specifically, making the cache keys hashable while still passing the spec, expr, and schema objects — which aren’t inherently hashable — seemed tricky. I would have needed helper functions to produce stable key representations, and potentially to check the hashability of PartitionSpec, just to ensure caching didn’t break evaluator construction.

I’m definitely open to simplifying this if you think that’s the right direction. Let me know your thoughts!

@ForeverAngry ForeverAngry requested a review from Fokko November 23, 2025 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cache ResidualEvaluator

2 participants