-
Notifications
You must be signed in to change notification settings - Fork 170
feat: Support over expressions more freely, make expressions printable, rewrite internals (travelling pr 🌴 )
#3152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4feae2f to
b4141af
Compare
|
Unless there's objections, I'll do a careful read-though of everything and then will plan to ship this by the end of the week |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I've been on holiday I had a bit of time while travelling recently, so I tried rewriting the internals (yes, again)
The general idea is that
narwhals.Exprstores a list of expressions in._nodes, rather than passing around a bunch of opaque lambdasWhat this enables
Expressions are pretty-printable
For example:
This is still quite basic, but we could make it more complex by introducing line breaks if it gets too long. Seems like the kind of thing @camriddell might be interested in?
We can do simple expression rewrites
Currently,
(nw.col('a').mean() + 1).over('b')isn't supported:over(mean(a) + 1) over (partition by b)isn't valid syntax, it should bemean(a) over (partition by b) + 1With this PR, however, it is!
What we do here is, when inserting an
overnode, we push it down before any elementwise operations (such as+,.abs(),sum_horizontal, ...) and apply it to all expressions. There's some more details in the expansion to "how it works"So now, expressions like
(nw.col('a').mean() + 1).over('b')can be supported for all backendsThis rewrite is extremely simple and cheap, it's just a matter of inserting a node at some position
irather than at the end of a list. In general, query optimisation is out of scope for Narwhals. But, given that this enables more of Polars' flexibility for other backends, I think this can be in scope.Per-group broadcasting
Previously,
nw.col('a') - nw.col('a').mean()would be fine, but(nw.col('a') - nw.col('a').mean()).over('b')would raise for sql-like backends. Now, it works fine across all backends! Really useful for feature engineeringSimplified internals
depth,function_name,scalar_kwargsCompliantWhen/CompliantThenand their complicated interaction with justCompliantNamespace.when_thenIn fact, this goes as far as reducing package size by almost 1%. Not that that was the objective with this work, but it's nice to see that it doesn't make the package bigger
What this may open the doors to
nw.Expr.from_json(expr.to_json())nw.col('a').shift(7).rolling_mean(7).over('store', order_by='date')df.group_by('a').agg((nw.col('b')-nw.col('c')).mean())Benchmarks
I'm not seeing any meaningful impact on performance