-
Notifications
You must be signed in to change notification settings - Fork 31
INTPYTHON-833 Remove $facet in top level group stages #449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
INTPYTHON-833 Remove $facet in top level group stages #449
Conversation
|
Awesome, thank you and thanks @asya999 |
aclark4life
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test, particular one that would test something that breaks in the Django admin with QE (even though QE is not merged yet)?
django_mongodb_backend/compiler.py
Outdated
| } | ||
| ) | ||
| pipeline.append({"$group": group}) | ||
| # It may be a bug, $$NOW has to be called to be reachable in the rest of the pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add link to the SERVER ticket and add an explanatory comment about the purpose of the $unionWith clause.
I think we can also mention it in the release notes as a performance improvement since the team said $facet prevents the use of optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, this cannot be implemented as it is.
To handle the global aggregation and the default values (for example:
SELECT coalesce(avg(age), 35) FROM people WHERE age > 100;we need to wrap the aggregation so MongoDB always returns a single document.
This is done by wrapping the pipeline inside a lookup. Example:
db.z.aggregate([
{ $collStats: {} },
{ $lookup: { from: "z", as: "wrap", pipeline: [...pipeline stages] } },
{ $replaceWith: {"$cond": [{"$eq": ["$wrap", []]}, {}, {$first: "$wrap"}]}}
])
It’s not very intuitive use of $collStats, but it works 😄. It replaces the need for $unionWith.
$collStats is used as a pivot so we can apply the required operator after the group result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well played (I think?) 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 The idea comes from the team
|
Can you provide, in this PR, an example of what the removal of $facet makes the new query signatures look like? |
3e2c32d to
273957f
Compare
|
I updated the test on the Django fork so it won't fail. |
273957f to
c082f0d
Compare
timgraham
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need a test with assertAggregateQuery, perhaps in tests/aggregation_.
| # If distinct=True or resolve_inner_expression=False, sum the size of the | ||
| # set. | ||
| lhs_mql = process_lhs(self, compiler, connection, as_expr=True) | ||
| lhs_mql = {"$ifNull": [lhs_mql, []]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment about what fails without this?
9830b06 to
4dea72d
Compare
django_mongodb_backend/query.py
Outdated
| if self.wrap_for_global_aggregation: | ||
| # Use $collStats as a pivot to guarantee a single input document | ||
| pipeline = [ | ||
| {"$collStats": {}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Zoom, it sounds like $collStats causes extra filesystem access that could be avoided by using $document. If it's straightforward, use that for now and I'll add a condition to use $collStats only for encrypted connections in that PR.
django_mongodb_backend/query.py
Outdated
| if self.wrap_for_global_aggregation: | ||
| # Add an empty extra document to handle default values on empty results | ||
| pipeline.append({"$unionWith": {"pipeline": [{"$documents": [{}]}]}}) | ||
| if self.having: | ||
| pipeline.append({"$match": self.having}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this make the test pass, but I think this should be tested in deep, need to add a test that having filter all the groups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the current django tests cover the case and it is right.
f65c7f1 to
b8900af
Compare
Generated query:
[{'$collStats': {}}, {'$lookup': {'as': 'wrapped', 'from': 'aggregation_author', 'pipeline': [{'$group': {'_id': None, 'count': {'$sum': {'$cond': {'else': 1, 'if': {'$in': [{'$type': '$_id'}, ['missing', 'null']]}, 'then': None}}}}}]}}, {'$replaceWith': {'$cond': [{'$eq': ['$wrapped', []]}, {}, {'$first': '$wrapped'}]}}, {'$project': {'count': {'$ifNull': ['$count', {'$literal': 0}]}}}]