Conversation

tdcmeehan
Contributor

No description provided.

@prestodb-ci prestodb-ci added the from:IBM PRs from IBM label Aug 14, 2025
@tdcmeehan tdcmeehan force-pushed the mv branch 3 times, most recently from b3e4930 to 26b27c5 on August 14, 2025 03:35
@tdcmeehan tdcmeehan marked this pull request as ready for review August 14, 2025 15:30
@prestodb-ci prestodb-ci requested review from a team, Mariamalmesfer and pramodsatya and removed request for a team August 14, 2025 15:30
@jainxrohit jainxrohit left a comment

@tdcmeehan Thanks for this RFC. Meta is also planning to resume work on materialized views. Back in the day, I initially tried to build this in the planning phase; however, that led to really complex and messy structures, which were very difficult to work with. We probably need an even higher-level abstraction. Let's align with @arhimondr, @kaikalur and @zation99 on this.

@kaikalur

> @tdcmeehan Thanks for this RFC. Meta is also planning to resume work on materialized views. Back in the day, I initially tried to build this in the planning phase; however, that led to really complex and messy structures, which were very difficult to work with. We probably need an even higher-level abstraction. Let's align with @arhimondr, @kaikalur and @zation99 on this.

Two issues:

  • First is to come up with a good definition of the view
  • Second is the overhead of view maintenance

So we need to do some cost-benefit analysis. Best might be to start with something narrowly scoped, like a "pre-aggregate a single table" view for aggregation-only queries. Once that stabilizes, we can extend it to other shapes.

@tdcmeehan
Contributor Author

Thanks for the comments @jainxrohit and @kaikalur.

@kaikalur to be clear, this design allows for your recommendation of progressive refinement. This design's default behavior is to be rather simple, exactly as you recommended: single-table materialized views with simple stale/fresh checks. Progressive enhancements, such as fine-grained refresh or unioning up-to-date partitions with more up-to-date data, are optional and can be implemented by connectors as needed.

@jainxrohit thanks for the background. I believe this design abstracts complexity and allows for progressive enhancement. I'm keen to maintain momentum on the implementation. Please let me know if you need any additional information or clarification to speed up the review.

@jainxrohit

> Thanks for the comments @jainxrohit and @kaikalur.
>
> @kaikalur to be clear, this design allows for your recommendation of progressive refinement. This design's default behavior is to be rather simple, exactly as you recommended: single-table materialized views with simple stale/fresh checks. Progressive enhancements, such as fine-grained refresh or unioning up-to-date partitions with more up-to-date data, are optional and can be implemented by connectors as needed.
>
> @jainxrohit thanks for the background. I believe this design abstracts complexity and allows for progressive enhancement. I'm keen to maintain momentum on the implementation. Please let me know if you need any additional information or clarification to speed up the review.

I think the main detail I need to understand better is the proposal to move materialized view optimization from the analysis phase to the planning phase. I believe doing this in the planning phase, especially for the use cases Meta wants to support, is really complex. Should we set up some time to discuss the details further?

@tdcmeehan
Contributor Author

@jainxrohit I'll be happy to brainstorm how this might work for your internal infrastructure.


* Backward compatibility with existing Hive MV implementation (it's undocumented and unused)
* Supporting materialized views across different connectors (cross-connector MVs)
* Automatic view selection/routing for arbitrary queries
Contributor

While query rewriting is mentioned as a non-goal, using planning-phase plans for MV matching / query rewriting would be far better than the AST-based rewriter we have now.

This is how Calcite (and therefore Hive) does MV rewrites as well: https://calcite.apache.org/docs/materialized_views.html

Could you please clarify what makes plan-phase MV matching/rewriting far better? Although Calcite implements it in planning, it is still rule-based matching, and it seems nothing specific to the planning phase is utilized.

Member

In Calcite they do have mechanisms to compare the cost of plans to select the best MV

Contributor Author

Any cost-based decision would need to be done in the planner, and there are numerous options: a cost-based refresh strategy (incremental vs. full), cost-based stitching determination, and cost-based view replacement. We don't need to implement any of this now, but keeping this rewriting in the analysis phase makes starting those projects harder, because they become not only about implementing the cost-based decisions, but also about rewriting the stitching and view replacement code. We want excellent and robust materialized view support in Presto, which means intelligent decisions on when to materialize and which views to select, so moving to the planner is a worthwhile endeavor.


The general idea of Calcite and the paper it builds upon is to build candidates with rule-based matching, which can be done either in analysis or planning, and then use cost-based analysis in planning to finally choose one or apply further optimizations. In Presto, though, the rule-based matching to generate candidates can be done more easily in analysis.


> In Calcite they do have mechanisms to compare the cost of plans to select the best MV

That's right. The comparison can be done with cost-based estimation in planning. But the candidates to compare are easier to generate in analysis.

Contributor

@zation99

> In Presto, the rule-based matching to generate candidates can be done more easily in analysis though.

In Presto, all the rule-based plan matching happens in the planner (i.e. in the PlanOptimizer rules), not in the analysis phase.

Can you clarify, with an example, what you mean by candidate generation and the rules to generate these candidates?

@aditi-pandit aditi-pandit left a comment

@tdcmeehan: Did a read-through of this RFC, and it's a heavy read. I have some early comments, but will likely have more as I absorb this material.

Are you planning to do a talk-through of this RFC in one of the working groups? Might be good to get consensus.

Member

@jaystarshot jaystarshot left a comment

LGTM, thanks

### 4. Query Rewriting
- Simple pattern matching to detect if a query can use an MV
- Limited to exact matches or simple projections
- No cost-based selection between multiple MVs
Member

I think this is also a limitation of the current design since the analyzer is still doing the replacement

Member

@jaystarshot jaystarshot Sep 19, 2025

I guess in the future we can do something like an MviewReference (CTEReference --> LogicalCte) or-drop-it approach, i.e. the analyzer creates candidates and the planner chooses among them somehow. That could also help with fallback.

Contributor Author

You are right that this is a limitation of this design as well, but the current solution makes it impossible to add cost-based selection in the future, whereas this design allows for that future enhancement, perhaps along the lines of what you mention.

Member

But the problem remains that when we do "view replacement", we would need to create some canonicalized AST to replace.

Contributor Author

I believe the new design will handle this better, or at least I can see a high-level path to getting it to work. Connectors return a list of possible replacement candidates for a given table during analysis via the existing ConnectorMetadata#getReferencedMaterializedViews. We put these in the MaterializedViewNode AST and plan each of the candidates. At that point I believe we have all of the tools necessary to leverage some of the techniques that have been written about, for example the technique Calcite uses.
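
As a rough illustration of the flow described in that comment (all names here are hypothetical stand-ins, not the actual Presto SPI), candidate selection reduces to planning every candidate alongside the original query and comparing costs:

```python
# Hypothetical sketch of cost-based candidate selection: connectors surface
# candidate MVs during analysis, every candidate (plus the original query)
# is planned, and a cost model picks the cheapest plan.

def choose_cheapest_plan(original_query, candidate_views, plan, cost):
    candidates = [original_query] + list(candidate_views)
    plans = [plan(c) for c in candidates]
    return min(plans, key=cost)

# Toy usage: "planning" is the identity function, cost is a precomputed table.
costs = {"orig": 100.0, "mv_daily_agg": 10.0, "mv_hourly_agg": 25.0}
best = choose_cheapest_plan(
    "orig", ["mv_daily_agg", "mv_hourly_agg"],
    plan=lambda q: q, cost=lambda p: costs[p])
```

The key design point is that the original query is always in the candidate set, so a poor MV match can never make the plan worse than not using an MV at all.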

@prestodb prestodb deleted a comment from zation99 Sep 25, 2025
@tdcmeehan tdcmeehan force-pushed the mv branch 2 times, most recently from bcdc3b6 to b5f163e on September 30, 2025 19:03
@tdcmeehan
Contributor Author

tdcmeehan commented Sep 30, 2025

@jainxrohit @zation99 @arhimondr @aditi-pandit @jaystarshot @hantangwangd @aaneja

Thanks for all your previous reviews and discussions. One of the core assumptions in my earlier design was that the existing materialized view framework was unused, so I had creative liberty to redesign it without significant impact to existing users. With the revelation that Meta will soon be entering production with the existing framework, that assumption no longer holds, and I've updated the design to emphasize continuity and provide a reasonable migration path. Sharing stitching and refresh logic between the open source Iceberg connector and Meta's vast internal infrastructure will be a huge advantage to the open source project.

The RFC now primarily proposes:

1. Refactoring the stitching to rely on predicates provided by the connector outside of the table handle (table-handle-provided staleness still works fine, but open source connectors need to make expensive metadata calls to determine staleness, and this works better as a separate call outside of analysis).
2. Adding a new type of predicate which can support time-travel-based stitching to the exclusion of partition-based stitching (optional, only if the connector supports that feature).
3. Supporting automatic REFRESH (using the same predicates used for stitching).

Please take a look and let me know if you have any feedback.
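
A rough sketch of the distinction drawn in proposal 2 above (the class and field names are illustrative, not the proposed SPI): a partition-based predicate enumerates stale regions to recompute, while a time-travel-based predicate identifies a snapshot boundary, so the stale region is "everything after snapshot X".

```python
# Hypothetical model of the two stitching predicate kinds. Real connectors
# would express these as engine predicates; here they are plain dataclasses
# so the distinction is easy to see.
from dataclasses import dataclass

@dataclass
class PartitionStitchPredicate:
    stale_partitions: list  # e.g. [{"ds": "2025-10-01"}]

@dataclass
class TimeTravelStitchPredicate:
    last_refreshed_snapshot: int  # recompute changes after this snapshot

def describe(pred):
    # How a planner might summarize the work implied by each predicate kind.
    if isinstance(pred, TimeTravelStitchPredicate):
        return f"recompute changes since snapshot {pred.last_refreshed_snapshot}"
    return f"recompute {len(pred.stale_partitions)} stale partition(s)"
```

Notably, the same predicate can drive both stitching at query time and automatic REFRESH (proposal 3), since both need to know which region of the view is out of date.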

@tdcmeehan tdcmeehan force-pushed the mv branch 9 times, most recently from 19f370a to 15d316b on October 1, 2025 18:30
The existing materialized views implementation is located primarily in the presto-hive module and follows this flow:

### 1. Metadata Storage
- MVs are stored as Hive tables with special properties

Nit: This metadata part is unchanged in the RFC. Maybe just say that explicitly.

#### Query-Time Analysis (SELECT from MV)
When querying a materialized view, the system:
- Recognizes the table as an MV during name resolution
- Calculates staleness by checking partition modification times

How is this achieved? The analysis doesn't really know about the partition keys unless they are added as part of the metadata. Also, how do we obtain the partition modification time? Can we call out to the connector from analysis itself?

Contributor Author

It's piped through the predicateDisjuncts, which are just TupleDomains on the MaterializedViewStatus object, which is retrieved during analysis. This check is probably reasonably lightweight for Hive, but for Iceberg it will be more sophisticated and more complicated, and also more resource intensive, since it involves reading the Iceberg metadata from the filesystem. In my proposal the call to retrieve the MaterializedViewStatus happens later, after queuing, but it involves similar predicates.
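
To make that mechanism concrete, here is a rough Python model of the staleness check (the dict-based predicates are a simplified stand-in for Presto's TupleDomain, and the method names are illustrative; the real MaterializedViewStatus is a Java SPI class):

```python
# Hypothetical model of MaterializedViewStatus: the connector reports stale
# regions as a list of disjunct predicates, here simple {column: value}
# dicts. An empty list means the MV is fully fresh.

class MaterializedViewStatus:
    def __init__(self, predicate_disjuncts):
        self.predicate_disjuncts = predicate_disjuncts

    def is_fully_materialized(self):
        return not self.predicate_disjuncts

    def is_stale_row(self, row):
        # A row falls in the stale region if it matches any disjunct.
        return any(all(row.get(col) == val for col, val in disjunct.items())
                   for disjunct in self.predicate_disjuncts)

# Two partitions of the view are stale; everything else is fresh.
status = MaterializedViewStatus([{"ds": "2025-10-01"}, {"ds": "2025-10-02"}])
```

The union-of-disjuncts shape is what makes stitching possible: the complement of the disjuncts can be served from the storage table, while rows matching any disjunct are recomputed.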

![Stitching Flow Comparison](./RFC-0013/current-vs-rfc-stitching.png)

New AST Node: MaterializedViewScan
- Left side: Data table scan (storage table)

Why is this a left side and a right side? It makes it sound like a join.

Contributor Author

I have this as left and right because it's a tree in the AST. The left is just a leaf node representing the base table scan, the right is a subtree representing the definition.

Contributor Author

I made the language more explicit.

New AST Node: MaterializedViewScan
- Left side: Data table scan (storage table)
- Right side: View query structure
- MV metadata for planning decisions

Do you know some of the fields for this already?

Contributor Author

It will be the MaterializedViewDefinition, updated.

- MV metadata and symbol mappings for optimization
- Standard `PlanNode` that integrates with existing infrastructure

Optimization Phase: MaterializedViewOptimizer Rule (Default Implementation)

Where is this rule positioned in the planner's rule application? Closer to the end of the rules list? If there are predicates, etc., we want them to be pushed down as much as possible before choosing an MV.

Contributor Author

I picture this being early during optimization, so that all of our usual predicate pushdown rules apply to the stitched query.

- Fresh MV: returns data table scan directly
- Partially stale: builds UNION of filtered fresh data + recomputed stale data

The planner constructs stitching plans by adding standard `UnionNode`, `TableScanNode`, and `FilterNode` nodes, then relies on existing planner rules for optimization. This design uses only the existing `MaterializedViewStatus` SPI method. Connectors only need to detect staleness and provide constraint information; they don't need to understand or implement stitching logic themselves. Filter nodes are pushed down by later planner rules through existing constraint pushdown mechanisms.
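
A minimal sketch of the stitching decision described above, with plans modeled as plain dicts instead of Presto's `UnionNode`/`TableScanNode`/`FilterNode` (the function and field names are illustrative, not the actual optimizer rule):

```python
# Hypothetical stitching rule: a fresh MV returns the storage-table scan
# directly; a partially stale MV becomes a union of filtered fresh data
# and a recomputation of the stale region from the view definition.

def stitch(is_fresh, stale_predicate, storage_scan, view_query):
    if is_fresh:
        return storage_scan
    return {
        "op": "union",
        "inputs": [
            # Fresh rows are served from the storage table...
            {"op": "filter", "predicate": ("not", stale_predicate),
             "input": storage_scan},
            # ...while stale rows are recomputed via the view definition.
            # Later optimizer rules push both filters toward the scans.
            {"op": "filter", "predicate": stale_predicate,
             "input": view_query},
        ],
    }

scan = {"op": "scan", "table": "mv_storage"}
view = {"op": "scan", "table": "base_table"}  # stands in for the view subtree
plan = stitch(False, ("ds", "=", "2025-10-01"), scan, view)
```

Note that the rule itself does nothing clever with the predicates; it only places them, consistent with the comment below that optimizing them is left to the general planner rules.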

This is non-intuitive to me... why are filter nodes pushed down later? Materialized views are only table scans (by nature it's like a physical optimization that chooses an index of a table if available).

Contributor Author

The point I'm making is that the job of the MaterializedViewOptimizer is to be as simple as possible, as it's already a little complicated: just create the union with appropriate predicates. It will not also be its job to ensure those predicates are optimized correctly; that will be handled by our general optimizer rules.

@tdcmeehan tdcmeehan force-pushed the mv branch 2 times, most recently from dbc156b to 9cb3aea on October 3, 2025 02:05
Member

@hantangwangd hantangwangd left a comment

@tdcmeehan thanks for the adjustments. I've just left some questions for discussion, mainly about the Iceberg implementation.

Member

@hantangwangd hantangwangd left a comment

Thanks for the fix, LGTM!

Asking for some clarifications on the refresh in the RFC:

  1. In Analysis, what does "store MV definition"/"store WHERE clause" mean? Where are we going to store them?
  2. Why can't the first two steps of planning be done in analysis, where constructing the insert is easier?

Contributor Author

@tdcmeehan tdcmeehan Oct 7, 2025

  1. We will store them in the Analysis object, as we do currently. What I mean by "store WHERE clause" is that we generate the predicates for the WHERE clause but don't apply them to the query, because depending on the connector you might not be able to do that (we might want to throw if one is supplied). We would store the WHERE clause as a standalone Expression, which we would then turn into a filter in the planner, which is an existing pattern.
  2. Because we need to apply the predicates depending on the shape of the query (whether or not aggregations are present, among other criteria), it seems more straightforward to apply them during planning. I'm also worried about the metadata overhead for Iceberg, where we'll potentially have to scan many files, and about the unbounded nature of the work we do in the analysis phase. In all cases, analysis just accesses the catalog, which is usually just an RPC call, while more expensive metadata calls happen later. By moving it to planning we get a natural cap on concurrency, since that happens post-queueing (effectively lazy rather than eager).

Regarding the complexity of it, I think the most complicated part is determining when to apply the predicates based on the shape of the query, which I don't see being much easier in analysis. Do you picture applying the predicates as being very complicated? I picture it being about as complicated as applying row filters and column masks, which we do in the planner now.
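
A rough illustration of the shape-dependent placement described above (the function and plan representation are hypothetical, loosely analogous to how row filters are applied in the planner): with no aggregation the stored WHERE predicate simply wraps the plan, while an aggregation requires the predicate to be applied below it, on the aggregation's input.

```python
# Hypothetical sketch: apply a stored WHERE expression during planning,
# placing the filter according to the shape of the query.

def apply_refresh_predicate(plan, predicate):
    if plan["op"] == "aggregate":
        # The predicate describes rows of the base data, so it must be
        # applied below the aggregation, not on the aggregated output.
        rewritten = dict(plan)
        rewritten["input"] = apply_refresh_predicate(plan["input"], predicate)
        return rewritten
    return {"op": "filter", "predicate": predicate, "input": plan}

scan = {"op": "scan", "table": "base_table"}
agg = {"op": "aggregate", "keys": ["ds"], "input": scan}
```

This is why the comment above argues the placement logic is no harder than row filters and column masks: it is the same "push the predicate to the right depth" recursion.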



Shall we include the connectors' involvement in the graph as well?

Contributor Author

That would be `getMaterializedViewStatus()`.

@aaneja aaneja self-requested a review October 7, 2025 07:06