Add materialized views RFC #44
Conversation
Force-pushed from b3e4930 to 26b27c5
@tdcmeehan Thanks for this RFC. Meta is also planning to resume the work on materialized views. Back in the day, I initially tried to build this in the planning phase; however, that led to really complex and messy structures, which were very difficult to work with. We probably need an even higher-level abstraction. Let's align with @arhimondr, @kaikalur and @zation99 on this.
Two issues
so we need to do some cost-benefit analysis. Best might be to keep it narrowly scoped, like a "preagg a single table" view for aggregation-only queries to start with. Once that stabilizes, we can extend it to other shapes.
Thanks for the comments @jainxrohit and @kaikalur. @kaikalur, to be clear, this design allows for your recommendation of progressive refinement. The design's default behavior is deliberately simple, exactly as you recommended: single-table materialized views with simple stale/fresh checks. Progressive enhancements, such as fine-grained refresh or unioning up-to-date partitions with more up-to-date data, are optional and can be implemented by connectors as needed. @jainxrohit, thanks for the background. I believe this design abstracts the complexity and allows for progressive enhancement. I'm keen to maintain momentum on the implementation. Please let me know if you need any additional information or clarification to speed up the review.
I think the main detail I need to understand better is the proposal to move materialized view optimization from the analysis phase to the planning phase. I believe doing this in the planning phase, especially for the use cases Meta wants to support, is really complex. Should we set up some time to discuss the details further?
@jainxrohit I'll be happy to brainstorm how this might work for your internal infrastructure.
* Backward compatibility with existing Hive MV implementation (it's undocumented and unused)
* Supporting materialized views across different connectors (cross-connector MVs)
* Automatic view selection/routing for arbitrary queries
While query rewriting is mentioned as a non-goal, using planning-phase plans for MV matching / query rewriting would be far better than the AST-based rewriter we have now.
This is how Calcite (and therefore Hive) does MV rewrites as well; see https://calcite.apache.org/docs/materialized_views.html.
Could you please clarify what makes plan-phase MV matching/rewriting far better? Although Calcite implements it in planning, it still uses rule-based matching, and it seems nothing specific to the planning phase is utilized.
In Calcite they do have mechanisms to compare the cost of plans to select the best mview.
Any cost-based decision would need to be done in the planner, and there are numerous options: cost-based refresh strategy (incremental vs. full), cost-based stitching determination, and cost-based view replacement. We don't need to implement any of this now, but keeping this rewriting in the analysis phase makes starting those projects harder, because they become not only about implementing the cost-based decisions, but also about rewriting the stitching and view replacement code. We want excellent and robust materialized view support in Presto, which means intelligent decisions on when to materialize and which views to select, which means moving to the planner is a worthwhile endeavor.
The general idea of Calcite, and the paper it builds upon, is to have the candidates built with rule-based matching, which can be done either in analysis or planning, and then have a cost-based analysis in planning to finally choose one or apply further optimizations. In Presto, though, the rule-based matching to generate candidates is easier to do in analysis.
> In Calcite they do have mechanisms to compare the cost of plans to select the best mview.

That's right. Comparing can be done with cost-based estimation in planning. But the candidates to compare are easier to generate in analysis.
> In Presto, though, the rule-based matching to generate candidates is easier to do in analysis.

In Presto, all the rule-based plan matching happens in the planner (i.e., in the PlanOptimizer rules), not in the analysis phase.
Can you clarify, with an example, what you mean by candidate generation and the rules to generate these candidates?
@tdcmeehan: I did a read-through of this RFC and it's a heavy read. I have some early comments, but will likely have more as I absorb this material.
Are you planning to do a talk-through of this RFC in one of the working groups? It might be good to get consensus.
LGTM, thanks
### 4. Query Rewriting
- Simple pattern matching to detect if a query can use an MV
- Limited to exact matches or simple projections
- No cost-based selection between multiple MVs
I think this is also a limitation of the current design, since the analyzer is still doing the replacement.
I guess we can in the future do something like an MviewReference (analogous to CTEReference --> LogicalCte), or a drop-it approach, i.e. the analyzer creates candidates and the planner chooses among them somehow. That could also help with fallback.
You are right that this is also a limitation of this design, but the current solution makes it impossible to add cost-based selection in the future, whereas this design allows for that future enhancement, perhaps along the lines of what you mention.
But the problem remains that when we do "view replacement", we would need to create some canonicalized AST to replace.
I believe the new design will handle this better, or at least I can see a high-level path for how we can get that to work. Connectors return a list of possible replacement candidates via the existing `ConnectorMetadata#getReferencedMaterializedViews` for a given table during analysis. We put this in the `MaterializedViewNode` AST and plan each of the candidates. At that point I believe we have all of the tools necessary to leverage some of the techniques that have been written about, for example the technique Calcite uses.
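As a rough illustration of that candidate flow, here is a minimal sketch. Everything below (`PlanNode`, `CandidatePlan`, `MaterializedViewSelector`, the cost values) is a hypothetical simplification for illustration, not Presto's actual planner classes; only `ConnectorMetadata#getReferencedMaterializedViews` is real.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified types for illustration; not Presto's actual classes.
interface PlanNode {}

// One planned alternative per MV candidate returned by
// ConnectorMetadata#getReferencedMaterializedViews during analysis.
record CandidatePlan(String materializedViewName, PlanNode root, double estimatedCost) {}

final class MaterializedViewSelector
{
    // Cost-based selection: plan every candidate, keep the cheapest plan,
    // and fall back to the original plan when no rewrite beats it.
    static PlanNode selectBest(PlanNode originalPlan, double originalCost, List<CandidatePlan> candidates)
    {
        Optional<CandidatePlan> cheapest = candidates.stream()
                .min(Comparator.comparingDouble(CandidatePlan::estimatedCost));
        return cheapest
                .filter(candidate -> candidate.estimatedCost() < originalCost)
                .map(CandidatePlan::root)
                .orElse(originalPlan);
    }
}
```

Because all candidates are fully planned before the comparison, a cost model can score each alternative the same way it scores any other plan, which is the property the Calcite-style approach relies on.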
Force-pushed from bcdc3b6 to b5f163e
@jainxrohit @zation99 @arhimondr @aditi-pandit @jaystarshot @hantangwangd @aaneja Thanks for all your previous reviews and discussions. One of the core assumptions in my earlier design was that the existing materialized view framework was unused, so I had creative liberty to redesign it without significant impact to existing users. With the revelation that Meta will soon enter production with the existing framework, that assumption no longer holds, and I've updated the design to emphasize continuity and provide a reasonable migration path. Sharing stitching and refresh logic between the open source Iceberg connector and Meta's vast internal infrastructure will be a huge advantage to the open source project. The RFC now primarily proposes:
1. Refactoring the stitching to rely on predicates provided by the connector outside of the table handle. (Table-handle-provided staleness still works fine, but open source connectors need to make expensive metadata calls to determine staleness, and this works better as a separate call outside of analysis.)
2. A new type of predicate that supports time-travel-based stitching to the exclusion of partition-based stitching (optional, only if the connector supports that feature).
3. Support for automatic REFRESH (using the same predicates used for stitching).
Please take a look and let me know if you have any feedback.
Force-pushed from 19f370a to 15d316b
The existing materialized views implementation is located primarily in the presto-hive module and follows this flow:

### 1. Metadata Storage
- MVs are stored as Hive tables with special properties
Nit: this metadata part is unchanged in the RFC. Maybe just say that explicitly.
#### Query-Time Analysis (SELECT from MV)
When querying a materialized view, the system:
- Recognizes the table as an MV during name resolution
- Calculates staleness by checking partition modification times
How is this achieved? The analysis doesn't really know about the partition key unless it is added as part of the metadata. Also, how do we obtain the partition modification time? Can we call out to the connector from analysis itself?
It's piped through the `predicateDisjuncts`, which are just `TupleDomain`s on the `MaterializedViewStatus` object, which is retrieved during analysis. This check is probably reasonably lightweight for Hive, but for Iceberg it will be more sophisticated and more complicated, and also more resource-intensive, since it involves reading the Iceberg metadata from the filesystem. In my proposal, the call to retrieve the `MaterializedViewStatus` happens later, after queuing, but it involves similar predicates.
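For concreteness, a small hypothetical example of what one such disjunct could look like. The `Domain`/`TupleDomain` API is Presto's (from the presto-common module); the `ds` partition column and the date values are invented for illustration.

```java
import com.facebook.presto.common.predicate.Domain;
import com.facebook.presto.common.predicate.TupleDomain;
import io.airlift.slice.Slices;

import java.util.List;
import java.util.Map;

import static com.facebook.presto.common.type.VarcharType.VARCHAR;

final class FreshnessPredicateExample
{
    // Hypothetical: the MV's storage table is partitioned on "ds", and the
    // connector determined that only these two partitions are up to date.
    // A disjunct like this lets the planner filter the storage-table scan
    // down to fresh partitions and bound the recomputation to the rest.
    static TupleDomain<String> freshPartitions()
    {
        Domain freshDs = Domain.multipleValues(
                VARCHAR,
                List.of(Slices.utf8Slice("2024-01-01"), Slices.utf8Slice("2024-01-02")));
        return TupleDomain.withColumnDomains(Map.of("ds", freshDs));
    }
}
```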
RFC-0013-materialized-views.md (Outdated)
New AST Node: MaterializedViewScan
- Left side: Data table scan (storage table)
Why is this left side and right side? It makes it sound like a join.
I have this as left and right because it's a tree in the AST. The left is just a leaf node representing the base table scan; the right is a subtree representing the definition.
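For illustration, here is a hypothetical shape for that node. Everything below is a simplified sketch; the real node would extend Presto's `com.facebook.presto.sql.tree.Node` hierarchy and carry the MV metadata.

```java
import java.util.List;

// Hypothetical, simplified AST types for illustration only.
interface AstNode
{
    List<AstNode> children();
}

// Left child: a leaf scanning the MV's storage (data) table.
record StorageTableScan(String storageTableName) implements AstNode
{
    @Override
    public List<AstNode> children()
    {
        return List.of();
    }
}

// Right child: the subtree representing the analyzed view definition.
record ViewQuery(AstNode queryRoot) implements AstNode
{
    @Override
    public List<AstNode> children()
    {
        return List.of(queryRoot);
    }
}

// The proposed node: a tree whose left side is the storage-table scan and
// whose right side is the view definition, plus MV metadata for planning.
record MaterializedViewScan(StorageTableScan left, ViewQuery right, String mvName) implements AstNode
{
    @Override
    public List<AstNode> children()
    {
        return List.of(left, right);
    }
}
```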
I made the language more explicit.
RFC-0013-materialized-views.md (Outdated)
New AST Node: MaterializedViewScan
- Left side: Data table scan (storage table)
- Right side: View query structure
- MV metadata for planning decisions
Do you know some of the fields for this already?
It will be the `MaterializedViewDefinition`, updated.
- MV metadata and symbol mappings for optimization
- Standard `PlanNode` that integrates with existing infrastructure

Optimization Phase: MaterializedViewOptimizer Rule (Default Implementation)
Where is this rule positioned in the planner's rule application? Closer to the end of the rules list? If there are predicates, etc., we want them to be pushed down as much as possible before choosing an MV.
I picture this being early during optimization, so that all of our usual predicate pushdown rules apply to the stitched query.
- Fresh MV: returns data table scan directly
- Partially stale: builds UNION of filtered fresh data + recomputed stale data

The planner constructs stitching plans by adding standard `UnionNode`, `TableScanNode`, and `FilterNode` nodes, then relies on existing planner rules for optimization. This design uses only the existing `MaterializedViewStatus` SPI method. Connectors only need to detect staleness and provide constraint information; they don't need to understand or implement stitching logic themselves. Filter nodes are pushed down by later planner rules through existing constraint pushdown mechanisms.
This is non-intuitive to me... Why are filter nodes pushed down later? Materialized views are only table scans (by nature it's like a physical optimization that chooses an index of a table, if available).
The point I'm making is that the job of the `MaterializedViewOptimizer` is to be as simple as possible, as it's already a little complicated: just create the union with the appropriate predicates. It will not also be its job to ensure those predicates are optimized correctly; that will be handled by our general optimizer rules.
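A sketch of that "just create the union with the appropriate predicates" step. The plan-node types below are hypothetical stand-ins (the real rule would construct Presto's `UnionNode`, `TableScanNode`, and `FilterNode`), and the string-based predicate handling is deliberately naive.

```java
// Hypothetical, simplified plan nodes; the real rule builds Presto's
// UnionNode / TableScanNode / FilterNode.
interface Plan {}
record TableScan(String table) implements Plan {}
record Filter(Plan source, String predicate) implements Plan {}
record Union(Plan left, Plan right) implements Plan {}

final class StitchingSketch
{
    // freshPredicate comes from the connector's MaterializedViewStatus;
    // viewDefinitionPlan is the planned view query over the base tables.
    static Plan stitch(boolean fullyFresh, String storageTable, String freshPredicate, Plan viewDefinitionPlan)
    {
        TableScan storageScan = new TableScan(storageTable);
        if (fullyFresh) {
            // Fresh MV: return the storage-table scan directly.
            return storageScan;
        }
        // Partially stale: fresh rows come from the storage table, stale rows
        // are recomputed from the base tables. Later optimizer rules push
        // both filters down through existing constraint pushdown mechanisms.
        Plan freshSide = new Filter(storageScan, freshPredicate);
        Plan staleSide = new Filter(viewDefinitionPlan, "NOT (" + freshPredicate + ")");
        return new Union(freshSide, staleSide);
    }
}
```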
Force-pushed from dbc156b to 9cb3aea
@tdcmeehan thanks for the adjustments. I've just left some questions for discussion, mainly about the Iceberg implementation.
Thanks for the fix, LGTM!
Asking for some clarifications on the RFC refresh:
- In Analysis, what does "store MV definition"/"store WHERE clause" mean? Where are we going to store them?
- Why can't the first two steps of planning be done in analysis, where constructing the insert is easier?
- We will store them in the Analysis object, as we do currently. What I mean by "store WHERE clause" is: generate the predicates for the WHERE clause but don't apply them to the query, because depending on the connector you might not be able to do that (we might want to throw if one is supplied). We would store the WHERE clause as a standalone Expression, which we would then turn into a filter in the planner, which is an existing pattern.
- Because we need to apply the predicates depending on the shape of the query (whether or not aggregations are present, among other criteria), it seems more straightforward to apply this during planning. I'm also worried about the metadata overhead for Iceberg, where we'll potentially have to scan many files, and about the unbounded nature of the work we do in the analysis phase. In all cases, analysis just accesses the catalog, which is usually a single RPC call; more expensive metadata calls happen later. By moving it to planning we get a natural cap on concurrency, since that is post-queueing (effectively lazy rather than eager).

Regarding the complexity of it, I think the most complicated part is determining when to apply the predicates based on the shape of the query, which I don't see being much easier in analysis. Do you picture applying the predicates to be very complicated? I picture it being about as complicated as applying row filters and column masks, which we do in the planner now.
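A minimal sketch of that analysis/planning split, with hypothetical types throughout (`Expression` here is just a string stand-in for Presto's AST expression class, and `RefreshAnalysis` stands in for the Analysis object).

```java
// Hypothetical sketch of the analysis/planning split described above.
record Expression(String sql) {} // stand-in for Presto's AST expression type

// Analysis: derive the refresh WHERE predicate and record it on the
// Analysis object, without rewriting the query yet.
final class RefreshAnalysis
{
    private Expression refreshPredicate; // null means full refresh

    void storeRefreshPredicate(Expression predicate)
    {
        this.refreshPredicate = predicate;
    }

    Expression refreshPredicate()
    {
        return refreshPredicate;
    }
}

// Planning: turn the stored predicate into a filter, but only where the
// shape of the query allows it, mirroring how row filters and column masks
// are applied in the planner today.
final class RefreshPlanner
{
    static String planRefreshScan(String baseTableScanSql, RefreshAnalysis analysis)
    {
        Expression predicate = analysis.refreshPredicate();
        if (predicate == null) {
            return baseTableScanSql; // no staleness info: do a full refresh
        }
        return baseTableScanSql + " WHERE " + predicate.sql();
    }
}
```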
Shall we include the connectors in the graph as well?
That would be `getMaterializedViewStatus()`.