
Expr.is_in(Iterable) raises inconsistently #3195

@dangotbanned


Related

(TODO - move write-up here)

Description

We document that `Expr.is_in` supports `Iterable`, and check it at runtime.

`narwhals/narwhals/expr.py`, lines 971 to 975 in ebb2a40:

```python
def is_in(self, other: Any) -> Self:
    """Check if elements of this expression are present in the other iterable.

    Arguments:
        other: iterable
```

But in #3189 I found that our tests only cover `list`.
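For reference, a quick pure-Python look (no narwhals involved) at other iterables a user might reasonably pass. `ITERABLE_FACTORIES` is just a hypothetical name for a test matrix; something like it could drive a `pytest.mark.parametrize` alongside the existing `list` cases. Everything in it satisfies the documented `Iterable` contract, but only `list` and `tuple` are `Sequence`s:

```python
from collections.abc import Iterable, Sequence

values = (4, 2)

# Hypothetical test matrix: the same values passed as different kinds of iterable.
ITERABLE_FACTORIES = {
    "list": list,
    "tuple": tuple,
    "iterator": iter,
    "generator": lambda xs: (x for x in xs),
    "set": set,
    "dict_keys": lambda xs: dict.fromkeys(xs).keys(),
}

is_sequence = {}
for name, factory in ITERABLE_FACTORIES.items():
    other = factory(values)
    # All of these satisfy the documented `Iterable` contract...
    assert isinstance(other, Iterable), name
    # ...but only some are `Sequence`s (the current tests only use `list`).
    is_sequence[name] = isinstance(other, Sequence)

print(is_sequence)
# {'list': True, 'tuple': True, 'iterator': False, 'generator': False, 'set': False, 'dict_keys': False}
```

Only the `iterator` case is exercised in the repro below; how each backend treats the other non-`Sequence` entries isn't covered here.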

Kinda surprised by which backends do/don't work.
Fixing it for all of them is an easy task - just something like this really:

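```python
def is_in(self, other: Sequence[Any]) -> Self:
    # Materialize one-shot iterables (e.g. `iter(...)`, generators) into a tuple;
    # lists and tuples are passed through unchanged.
    other_ = tuple(other) if not isinstance(other, (tuple, list)) else other
    return self._with_elementwise(lambda expr: F("contains", lit(other_), expr))
```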
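The normalization step on its own, as a pure-Python sketch. `normalize_is_in_other` is a hypothetical name (not an existing narwhals function), and the list comprehensions just stand in for what a backend does with `other`. It shows why materializing matters: `in` against a raw iterator consumes it as it searches, so later lookups only see whatever is left (here, nothing).

```python
from collections.abc import Iterable, Sequence
from typing import Any


def normalize_is_in_other(other: Iterable[Any]) -> Sequence[Any]:
    """Hypothetical helper: pass lists/tuples through, materialize anything else once."""
    return other if isinstance(other, (tuple, list)) else tuple(other)


sequence = (4, 2)
column = [1, 4, 2, 5]

# Naive: `in` on an iterator consumes it, so after the first lookup it's exhausted.
raw = iter(sequence)
naive = [value in raw for value in column]

# Normalized: the iterator is materialized into a tuple exactly once, up front.
other_ = normalize_is_in_other(iter(sequence))
fixed = [value in other_ for value in column]

print(naive)  # [False, False, False, False]
print(fixed)  # [False, True, True, False]
```

`fixed` matches the expected `[false, true, true, false]` from the repro. Lists and tuples are passed through without copying, so the common case pays nothing extra.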

Repro

```python
import narwhals as nw

data = {"a": [1, 4, 2, 5]}
df = nw.from_dict(data, backend="polars")
sequence = 4, 2
other = sequence
>>> df.select(nw.col("a").is_in(other))
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  shape: (4, 1)   |
|  ┌───────┐       |
|  │ a     │       |
|  │ ---   │       |
|  │ bool  │       |
|  ╞═══════╡       |
|  │ false │       |
|  │ true  │       |
|  │ true  │       |
|  │ false │       |
|  └───────┘       |
└──────────────────┘
```

Eager

```python
other = iter(sequence)
>>> df.select(nw.col("a").is_in(other))
TypeError: cannot create expression literal for value of type tuple_iterator.

Hint: Pass `allow_object=True` to accept any value and create a literal of type Object.
```
```python
expr = nw.col("a").is_in(iter(sequence))
>>> nw.from_dict(data, backend="pandas").select(expr)
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|            a     |
|     0  False     |
|     1   True     |
|     2   True     |
|     3  False     |
└──────────────────┘
```
```python
expr = nw.col("a").is_in(iter(sequence))
>>> nw.from_dict(data, backend="pyarrow").select(expr)
┌────────────────────────────┐
|     Narwhals DataFrame     |
|----------------------------|
|pyarrow.Table               |
|a: bool                     |
|----                        |
|a: [[false,true,true,false]]|
└────────────────────────────┘
```

Lazy

```python
df = nw.from_dict(data, backend="polars")
expr = nw.col("a").is_in(iter(sequence))
>>> df.lazy("ibis").select(expr).collect("polars")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  shape: (4, 1)   |
|  ┌───────┐       |
|  │ a     │       |
|  │ ---   │       |
|  │ bool  │       |
|  ╞═══════╡       |
|  │ false │       |
|  │ true  │       |
|  │ true  │       |
|  │ false │       |
|  └───────┘       |
└──────────────────┘
```
```python
df = nw.from_dict(data, backend="polars")
expr = nw.col("a").is_in(iter(sequence))
>>> df.lazy("duckdb").select(expr).collect("polars")
NotImplementedException: Not implemented Error: Unable to transform python value of type '<class 'tuple_iterator'>' to DuckDB LogicalType
```
```python
from sqlframe.duckdb import DuckDBSession

df = nw.from_dict(data, backend="polars")
expr = nw.col("a").is_in(iter(sequence))
>>> df.lazy("sqlframe", session=DuckDBSession()).select(expr).collect("polars")
ValueError: Cannot convert <tuple_iterator object at 0x000001DEC8260970>
```
```python
df = nw.from_dict(data, backend="polars")
expr = nw.col("a").is_in(iter(sequence))
>>> df.lazy("dask").select(expr).collect("polars")
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  shape: (4, 1)   |
|  ┌───────┐       |
|  │ a     │       |
|  │ ---   │       |
|  │ bool  │       |
|  ╞═══════╡       |
|  │ false │       |
|  │ true  │       |
|  │ true  │       |
|  │ false │       |
|  └───────┘       |
└──────────────────┘
```
