-
Notifications
You must be signed in to change notification settings - Fork 170
chore: Align nw.concat(..., how="vertical") behavior across backend for schema mismatch
#3191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return str(Path(source)) | ||
|
|
||
|
|
||
| def validate_concat_vertical_schemas(schemas: Iterable[Mapping[str, DType]]) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By passing an iterable, we can avoid to immediately evaluate all the schemas
| msg = ( | ||
| "'union'/'concat' inputs should all have the same schema,got\n" | ||
| f"{_pretty_format_schema(schema_0, index=0)}\n" | ||
| " and\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The space here is not a typo, that's what polars has π
for more information, see https://pre-commit.ci
|
QQ: Should we consider pushing the validation at the narwhals level to have it fully standardized? π€ |
Could we implement the https://docs.pola.rs/api/python/stable/reference/api/polars.concat.html I'm pretty sure that at least IMO, it seems like this change could break things downstream - so having an alternative to suggest may make the transition smoother π€ |
|
Currently (at latest dependencies versions) polars, pyarrow and ibis would raise natively with schema mismatch. Notice that for duckdb and ibis we also already have a schema check, which we don't have for pyspark and dask. Table view:
Oh boy, that's definitely a way bigger task than anticipated, and I am not really interested in it to be fair π
I understand this is a risk, but to me it seems uncrackable/unreliable from a user perspective. Arguably we could maintain the current behavior for stable.v1, maybe for stable.v2, but for sure I would want the changes in the main namespace (which has a few less promises on stability).
Sure, one alternative is to suggest to cast the dtypes before calling |
|
It would be quite natural to do this with I found myself doing something pretty similar the other day, but working directly on narwhals/narwhals/_plan/arrow/functions.py Lines 140 to 158 in 2403f1b
|
|
<rant>How many different error messages did polars change?<\rant> |
nw.concat(..., how="vertical") behavior across backend for schema mismatch
|
thanks @FBruzzesi ! i'm not totally sure, my original hesitation for being as strict as polars was that in other backends it's easier to accidentally change the datatype (e.g. if a missing value gets inserted), although it is encouraging that the downstream tests pass π€ |
|
Hey @MarcoGorelli ππΌ
Doesn't this apply only to pandas non-nullable dtypes?
Not sure how used and tested this feature is to be honest. I will move it back to draft state and make a proposal that is also consistent for |

What type of PR is this? (check all applicable)
Checklist
If you have comments or can explain your changes, please do so below
We were a bit far from polars behavior for vertical concat. This PR aims to better align with it.
Polars (and some other backends) do not allow to concatenate with different dtypes. The casting has to happen explicitly rather than implicitly. Hence we now check for schema equality rather than column names only.
The error raised tries to match the (lazy) polars' one.
The schemas of the various frames are evaluated lazily for performance