feat: make LiteralPredicate serializable via internal IcebergBaseModel #2561

jaimeferj · 2025-10-02T23:35:57Z

Rationale for this change

Are these changes tested?

yes

Are there any user-facing changes?

jaimeferj · 2025-10-02T23:52:26Z

Issue #2523 asks for the class to derive from IcebergBaseModel, which I have not done, but I could try that if my solution is not accepted.

pyiceberg/expressions/__init__.py

jaimeferj · 2025-10-03T17:00:56Z

I have now marked this as a draft, since I am no longer sure this is the kind of implementation you want. Tests still pass with LiteralPredicate as a subclass of IcebergBaseModel, but I had to declare term as Term[L] instead of what I thought it was (UnboundTerm[L]); otherwise test_not_equal_to_invert and similar tests fail, since they pass a bound term instead.

pyiceberg/expressions/__init__.py

jaimeferj · 2025-10-05T22:35:22Z

Something fishy that I had to do for the tests to pass was to declare the attribute term as Term[L] instead of UnboundTerm[Any], as it is in UnboundPredicate, the parent of LiteralPredicate. However, that also triggers mypy, since the type changes from parent to child.

The problem is that the earlier implementation called _to_unbound_term when initializing the instance; however, it does not always return an UnboundTerm, as you can easily see from the implementation:

def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]:
    return Reference(term) if isinstance(term, str) else term

If term is neither an UnboundTerm nor a str, the output is whatever the input was. This happens, for example, in test_not_equal_to_invert, because the predicate is initialized with a BoundReference! If you declare term as UnboundTerm[L] in the Pydantic model, the test fails:

_______________________________________________ test_not_equal_to_invert _______________________________________________

    def test_not_equal_to_invert() -> None:
>       bound = NotEqualTo(
            term=BoundReference(  # type: ignore
                field=NestedField(field_id=1, name="foo", field_type=StringType(), required=False),
                accessor=Accessor(position=0, inner=None),
            ),
            literal="hello",
        )

Should we address this in this PR or delegate it to another issue?
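To make the pass-through concrete, here is a minimal sketch with stand-in classes. The `UnboundTerm`, `Reference` and `BoundReference` below are simplified placeholders written for this illustration, not the pyiceberg classes. It only shows that a one-line helper of this shape returns any non-`str` argument unchanged, whatever its type:

```python
from typing import Any, Union


class UnboundTerm:
    """Placeholder for pyiceberg's UnboundTerm (simplified assumption)."""


class Reference(UnboundTerm):
    """Placeholder for pyiceberg's Reference; it only stores the column name."""

    def __init__(self, name: str) -> None:
        self.name = name


class BoundReference:
    """Placeholder for a bound term; deliberately NOT an UnboundTerm."""


def to_unbound_term_permissive(term: Union[str, UnboundTerm]) -> Any:
    # Same shape as the one-line helper quoted above: only str is converted,
    # everything else is handed back untouched.
    return Reference(term) if isinstance(term, str) else term


bound = BoundReference()
converted = to_unbound_term_permissive("foo")
passed_through = to_unbound_term_permissive(bound)  # type: ignore[arg-type]
```

The `# type: ignore` mirrors the one in the test: a static checker flags the call, but nothing stops it at runtime.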

Fokko · 2025-10-08T07:10:13Z

Should we address this in this PR or delegate it to another issue?

That's a bit of an edge case, since you deliberately ignore the type annotation. We could add a check in the function itself:

def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]:
    if isinstance(term, str):
        return Reference(term)
    elif isinstance(term, UnboundTerm):
        return term
    else:
        raise ValueError(f"Expected UnboundTerm or str, but got: {term}")
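The effect of the stricter check can be tried out in isolation. The sketch below uses placeholder classes (`UnboundTerm`, `Reference`, `BoundReference` are simplified stand-ins, and `rejects` is a hypothetical helper added for the demonstration); it is not the pyiceberg implementation:

```python
from typing import Any


class UnboundTerm:
    """Placeholder for pyiceberg's UnboundTerm (simplified assumption)."""


class Reference(UnboundTerm):
    """Placeholder for pyiceberg's Reference."""

    def __init__(self, name: str) -> None:
        self.name = name


class BoundReference:
    """Placeholder for a bound term; not a subclass of UnboundTerm."""


def to_unbound_term_strict(term: Any) -> UnboundTerm:
    # Convert strings, accept unbound terms, reject everything else.
    if isinstance(term, str):
        return Reference(term)
    elif isinstance(term, UnboundTerm):
        return term
    else:
        raise ValueError(f"Expected UnboundTerm or str, but got: {term}")


def rejects(term: Any) -> bool:
    """Hypothetical helper: True if the strict conversion raises ValueError."""
    try:
        to_unbound_term_strict(term)
    except ValueError:
        return True
    return False
```

With a check like this, a bound term is refused at construction time instead of being silently stored in an unbound predicate.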

jaimeferj · 2025-10-08T18:04:04Z

def _to_unbound_term(term: Union[str, UnboundTerm[Any]]) -> UnboundTerm[Any]:
    if isinstance(term, str):
        return Reference(term)
    elif isinstance(term, UnboundTerm):
        return term
    else:
        raise ValueError(f"Expected UnboundTerm or str, but got: {term}")

I do not ignore the type annotation; it is the current implementation and an existing test that ignore it. I am just trying to implement what the issue requires, and since Pydantic now checks the types at runtime, the (old) test no longer passes.
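The runtime check mentioned here can be reproduced with a small sketch. It assumes Pydantic v2 is installed and uses a plain `BaseModel` rather than IcebergBaseModel; `UnboundTerm`, `BoundReference`, `PredicateModel` and `accepts` are hypothetical names for this illustration only:

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class UnboundTerm:
    """Placeholder for pyiceberg's UnboundTerm (simplified assumption)."""


class BoundReference:
    """Placeholder for a bound term; not a subclass of UnboundTerm."""


class PredicateModel(BaseModel):
    # Arbitrary (non-Pydantic) classes are validated with an isinstance check.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    term: UnboundTerm


def accepts(term: object) -> bool:
    """Hypothetical helper: True if the model can be built with this term."""
    try:
        PredicateModel(term=term)
    except ValidationError:
        return False
    return True
```

A `# type: ignore` comment silences the static checker but has no effect on this validation, which is why a test that relied on it can start failing once the class becomes a model.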

jaimeferj · 2025-10-10T22:42:04Z

pyiceberg/expressions/__init__.py

+    def __init__(self, *args: Any, **kwargs: Any) -> None:
+        if args:
+            if len(args) != 2:
+                raise TypeError("Expected (term, literal)")
+            kwargs = {"term": args[0], "literal": args[1], **kwargs}
+        super().__init__(**kwargs)
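As a rough illustration of what this `__init__` does, the sketch below maps two positional arguments onto keyword arguments before delegating to a keyword-only base initializer. `KeywordOnlyBase` and `PredicateSketch` are hypothetical stand-ins, not the classes from the PR:

```python
from typing import Any


class KeywordOnlyBase:
    """Stand-in for a base class whose initializer takes keywords only."""

    def __init__(self, **data: Any) -> None:
        for key, value in data.items():
            setattr(self, key, value)


class PredicateSketch(KeywordOnlyBase):
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        if args:
            if len(args) != 2:
                raise TypeError("Expected (term, literal)")
            # Explicit keyword arguments win over positional ones on a clash,
            # because **kwargs is unpacked last.
            kwargs = {"term": args[0], "literal": args[1], **kwargs}
        super().__init__(**kwargs)


positional = PredicateSketch("a", "b")
keyword = PredicateSketch(term="a", literal="b")
```

A signature typed entirely as `Any` gives a static type checker no argument from which to infer a type parameter, which may be related to the `assert_type` failure reported in this thread.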


After having many issues with an __init__ such as:

def __init__(self, term: Union[str, UnboundTerm[Any]], literals: Union[Iterable[L], Iterable[Literal[L]]]):
    super().__init__(term=_to_unbound_term(term), items=_to_literal_set(literals))

because of typing errors with _transform_literal in pyiceberg/transforms.py, for example:

pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[str | None], str | None]"; expected "Callable[[str], str]"  [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bool | None], bool | None]"; expected "Callable[[str], str]"  [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[int | None], int | None]"; expected "Callable[[str], str]"  [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[float | None], float | None]"; expected "Callable[[str], str]"  [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[bytes | None], bytes | None]"; expected "Callable[[str], str]"  [arg-type]
pyiceberg/transforms.py:1113: error: Argument 1 to "_transform_literal" has incompatible type "Callable[[UUID | None], UUID | None]"; expected "Callable[[str], str]"  [arg-type]

I decided to just go for this implementation of __init__. The problem now is that:

assert_type(EqualTo("a", "b"), EqualTo[str])  # <-- Fails
------
tests/expressions/test_expressions.py:1238: error: Expression is of type "LiteralPredicate[L]", not "EqualTo[str]"  [assert-type]

So I am really stuck, would you mind lending a hand here? @Fokko

Fokko · 2025-10-12T20:13:02Z

pyiceberg/expressions/__init__.py

+class LiteralPredicate(IcebergBaseModel, UnboundPredicate[L], ABC):
+    type: TypingLiteral["lt", "lt-eq", "gt", "gt-eq", "eq", "not-eq", "starts-with", "not-starts-with"] = Field(alias="type")
+    term: UnboundTerm[L]
+    literal: Literal[L] = Field(serialization_alias="value")


Suggested change

-    literal: Literal[L] = Field(serialization_alias="value")
+    value: Literal[L] = Field()

Fokko · 2025-10-12T20:13:58Z

pyiceberg/expressions/__init__.py

+    @field_validator("term", mode="before")
+    @classmethod
+    def _coerce_term(cls, v: Any) -> UnboundTerm[Any]:
+        return _to_unbound_term(v)

-    def __init__(self, term: Union[str, UnboundTerm[Any]], literal: Union[L, Literal[L]]):  # pylint: disable=W0621
-        super().__init__(term)
-        self.literal = _to_literal(literal)  # pylint: disable=W0621
+    @field_validator("literal", mode="before")
+    @classmethod
+    def _coerce_literal(cls, v: Union[L, Literal[L]]) -> Literal[L]:
+        return _to_literal(v)
+
+    @field_serializer("literal")
+    def ser_literal(self, literal: Literal[L]) -> str:
+        return "Any"


What if we just call the field value and add a literal property:

@property
def literal(self) -> Literal[L]:
    return self.value
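The idea behind this suggestion can be sketched without Pydantic. The example below uses a standard-library dataclass as a stand-in for the model (`EqualToSketch` and its field types are assumptions made for brevity): the data is stored under `value`, so that is the key that gets serialized, while existing callers can keep reading `.literal`.

```python
from dataclasses import asdict, dataclass


@dataclass
class EqualToSketch:
    """Dataclass stand-in for the predicate model (not the PR's class)."""

    term: str
    value: str
    type: str = "eq"

    @property
    def literal(self) -> str:
        # Read-only alias kept for callers that still use `.literal`.
        return self.value


pred = EqualToSketch(term="a", value="b")
serialized = asdict(pred)  # properties are not fields, so only `value` appears
```

Naming the field after the serialized key avoids the need for a serialization alias; the property keeps the old attribute name readable, though not assignable.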

Fokko · 2025-10-12T20:50:29Z

pyiceberg/expressions/__init__.py

+    def __init__(self, *args: Any, **kwargs: Any) -> None:
+        if args:
+            if len(args) != 2:
+                raise TypeError("Expected (term, literal)")
+            kwargs = {"term": args[0], "literal": args[1], **kwargs}
+        super().__init__(**kwargs)


Always! So, I think the linter isn't really sure what to do. It is pretty clear:

In the signature, we see that transform accepts an optional, which I think is correct. However, _transform_literal requires non-null, which is incorrect.

def _truncate_number(
    name: str, pred: BoundLiteralPredicate[L], transform: Callable[[Optional[L]], Optional[L]]
) -> Optional[UnboundPredicate[Any]]:
    boundary = pred.literal

    if not isinstance(boundary, (LongLiteral, DecimalLiteral, DateLiteral, TimestampLiteral)):
        raise ValueError(f"Expected a numeric literal, got: {type(boundary)}")

    if isinstance(pred, BoundLessThan):
        return LessThanOrEqual(Reference(name), _transform_literal(transform, boundary.decrement()))
    elif isinstance(pred, BoundLessThanOrEqual):

The following change suppresses most of the warnings for me:

-def _transform_literal(func: Callable[[L], L], lit: Literal[L]) -> Literal[L]:
+def _transform_literal(func: Callable[[Any], Any], lit: Literal[L]) -> Literal[L]:
     """Small helper to upwrap the value from the literal, and wrap it again."""
     return literal(func(lit.value))
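A minimal sketch of the loosened signature, assuming a hypothetical `Lit` wrapper in place of pyiceberg's Literal and a made-up `truncate_to_ten` transform. It shows a callable that accepts and returns an optional value being passed to a helper whose parameter is typed `Callable[[Any], Any]`:

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Optional, TypeVar

L = TypeVar("L")


@dataclass(frozen=True)
class Lit(Generic[L]):
    """Hypothetical stand-in for pyiceberg's Literal[L]."""

    value: L


def transform_literal(func: Callable[[Any], Any], lit: Lit[L]) -> Lit[L]:
    # Unwrap the value, apply the transform, and wrap the result again.
    return Lit(func(lit.value))


def truncate_to_ten(value: Optional[int]) -> Optional[int]:
    # Optional in, Optional out, like the callables in the mypy errors.
    return None if value is None else value - (value % 10)


result = transform_literal(truncate_to_ten, Lit(47))
```

The trade-off is that `Any` no longer ties the callable's types to `L`, so the checker stops complaining but also stops verifying that the transform matches the literal's value type.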

Fokko reviewed Oct 3, 2025

jaimeferj marked this pull request as draft October 3, 2025 16:56

jaimeferj marked this pull request as ready for review October 3, 2025 20:19

Fokko reviewed Oct 5, 2025

Fokko reviewed Oct 5, 2025

jaimeferj added 5 commits October 10, 2025 23:17

feat: make LiteralPredicate serializable via internal IcebergBaseModel

9257a6d

feat: subclass LiteralPredicate instead of using internal class

c1f7384

fix: use type in main class only and remove __op__

482f4e0

fix adding lt literal and allow boundreference in _to_unbound_term

dd7e09b

fix: remove type hinting errors

5118748

jaimeferj force-pushed the feat/json-literal-predicate branch from 00bc5db to 5118748 Compare October 10, 2025 22:35

jaimeferj commented Oct 10, 2025

Fokko reviewed Oct 12, 2025
