Skip to content

Conversation

Aniketsy
Copy link
Contributor

@Aniketsy Aniketsy commented Oct 2, 2025

#2524

This PR addresses issue by making the SetPredicate class and its subclasses (In, NotIn) JSON serializable using Pydantic.

  • Added tests to verify JSON serialization of In and NotIn predicates.

Please let me know if my approach or fix needs any improvements . I’m open to feedback and happy to make changes based on suggestions.
Thank you !

Aniketsy and others added 2 commits October 2, 2025 12:35
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 2, 2025

@Fokko Thanks, for the feedback and correction . Please let me know if I need to do further improvements.

@Fokko
Copy link
Contributor

Fokko commented Oct 2, 2025

@Aniketsy I've left a few more, but I think that should be it. We're super close!

Aniketsy and others added 3 commits October 3, 2025 00:54
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 2, 2025

@Fokko Thanks! I’ve committed the suggested changes.

Aniketsy and others added 4 commits October 3, 2025 01:26
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aniketsy Thanks for the quick turnaround, the changes I've just suggested fix all the mypy errors for me locally. You still need to check out the branch locally and run make lint to fix all the formatting issues.

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 2, 2025

@Fokko Thanks! Will run make lint locally and address the formatting issues.

Aniketsy and others added 4 commits October 3, 2025 01:44
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
Co-authored-by: Fokko Driesprong <fokko@apache.org>
@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 2, 2025

@Fokko I noticed the integration-test 3.12 CI is failing. I’m not sure what’s causing it, could you provide some guidance?

Aniketsy and others added 2 commits October 3, 2025 09:04
Co-authored-by: Fokko Driesprong <fokko@apache.org>
def __invert__(self) -> In[L]:
"""Transform the Expression into its negated version."""
return In[L](self.term, self.literals)
return NotIn[L](self.term, self.literals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted:

Suggested change
return NotIn[L](self.term, self.literals)
return In[L](self.term, self.literals)


class SetPredicate(UnboundPredicate[L], ABC):
literals: Set[Literal[L]]
class SetPredicate(UnboundPredicate[L], IcebergBaseModel, ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class and init should be:

class SetPredicate(IcebergBaseModel, UnboundPredicate[L], ABC):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    type: TypingLiteral["in", "not-in"] = Field(default="in")
    literals: Set[Literal[L]] = Field(alias="items")

    def __init__(self, term: Union[str, UnboundTerm[Any]], literals: Union[Iterable[L], Iterable[Literal[L]]]):
        super().__init__(term=_to_unbound_term(term), items=_to_literal_set(literals))

@Fokko
Copy link
Contributor

Fokko commented Oct 3, 2025

@Aniketsy The suggestions above work on my end. Sorry for this issue, it is more complicated than I anticipated

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 3, 2025

Thanks @Fokko , I’ve committed the changes based on your suggestions. Do I need to check ruff format from my end .

@Fokko
Copy link
Contributor

Fokko commented Oct 3, 2025

I've just kicked off the CI. I think we also need to merge #2564 first

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 3, 2025

Got it, thanks for triggering the CI. I’ll wait for #2564 to be merged.

@kevinjqliu
Copy link
Contributor

#2564 is merged, do you mind rebasing this PR?

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 5, 2025

sure, i will do that.

@Aniketsy Aniketsy force-pushed the fix/setpredicate-json-serialization branch from 0cd1fb3 to f922d17 Compare October 5, 2025 20:13
@Fokko
Copy link
Contributor

Fokko commented Oct 5, 2025

@Aniketsy I checked out your branch locally, and it looks like there is a bit more to do to get the test passing: #2572 😨

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 5, 2025

@Fokko Could you please guide me on what I should do to get the test passing?

@kevinjqliu
Copy link
Contributor

ruff (legacy alias)......................................................Failed
- hook id: ruff
- exit code: 1
- files were modified by this hook

F821 Undefined name `IcebergBaseModel`
   --> pyiceberg/expressions/__init__.py:580:41
    |
580 | class SetPredicate(UnboundPredicate[L], IcebergBaseModel, ABC):
    |                                         ^^^^^^^^^^^^^^^^
581 |     model_config = ConfigDict(arbitrary_types_allowed=True)
    |

Found 3 errors (2 fixed, 1 remaining).

perhaps this is an import issue. can you try make lint / make test locally?

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 6, 2025

Hi @kevinjqliu I’ve fixed the error. Could you please take a look?

@kevinjqliu
Copy link
Contributor

#2575 is merged, lets see if this can block the PR

@Fokko
Copy link
Contributor

Fokko commented Oct 8, 2025

@Aniketsy I checked out the branch locally, and with the following changes the tests should pass:

➜  iceberg-python git:(fix/setpredicate-json-serialization) ✗ git diff

diff --git a/pyiceberg/expressions/__init__.py b/pyiceberg/expressions/__init__.py
index 00cb404bb..d0824cc31 100644
--- a/pyiceberg/expressions/__init__.py
+++ b/pyiceberg/expressions/__init__.py
@@ -577,15 +577,14 @@ class NotNaN(UnaryPredicate):
         return BoundNotNaN[L]
 
 
-class SetPredicate(UnboundPredicate[L], IcebergBaseModel, ABC):
+class SetPredicate(IcebergBaseModel, UnboundPredicate[L], ABC):
     model_config = ConfigDict(arbitrary_types_allowed=True)
 
-    type: TypingLiteral["in", "not-in"] = Field(default="in", alias="type")
+    type: TypingLiteral["in", "not-in"] = Field(default="in")
     literals: Set[Literal[L]] = Field(alias="items")
 
     def __init__(self, term: Union[str, UnboundTerm[Any]], literals: Union[Iterable[L], Iterable[Literal[L]]]):
-        super().__init__(term)
-        self.literals = _to_literal_set(literals)
+        super().__init__(term=_to_unbound_term(term), items=_to_literal_set(literals))  # type: ignore
 
     def bind(self, schema: Schema, case_sensitive: bool = True) -> BoundSetPredicate[L]:
         bound_term = self.term.bind(schema, case_sensitive)
@@ -737,7 +736,7 @@ class NotIn(SetPredicate[L], ABC):
 
     def __invert__(self) -> In[L]:
         """Transform the Expression into its negated version."""
-        return NotIn[L](self.term, self.literals)
+        return In[L](self.term, self.literals)
 
     @property
     def as_bound(self) -> Type[BoundNotIn[L]]:
diff --git a/tests/expressions/test_expressions.py b/tests/expressions/test_expressions.py
index 5123a62ea..5a0c8c924 100644
--- a/tests/expressions/test_expressions.py
+++ b/tests/expressions/test_expressions.py
@@ -875,12 +875,12 @@ def test_not_in() -> None:
 
 def test_serialize_in() -> None:
     pred = In(term="foo", literals=[1, 2, 3])
-    assert pred.model_dump_json() == '{"type":"in","term":"foo","value":[1,2,3]}'
+    assert pred.model_dump_json() == '{"term":"foo","type":"in","items":[1,2,3]}'
 
 
 def test_serialize_not_in() -> None:
     pred = NotIn(term="foo", literals=[1, 2, 3])
-    assert pred.model_dump_json() == '{"type":"not-in","term":"foo","value":[1,2,3]}'
+    assert pred.model_dump_json() == '{"term":"foo","type":"not-in","items":[1,2,3]}'
 
 
 def test_bound_equal_to(term: BoundReference[Any]) -> None:

@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 8, 2025

@Fokko Thanks! I’ve applied the changes as per your suggestions.

@Fokko Fokko merged commit 517b930 into apache:main Oct 9, 2025
10 checks passed
@Aniketsy
Copy link
Contributor Author

Aniketsy commented Oct 9, 2025

Thank you for guiding, reviewing, and correcting me at each step throughout this PR . @Fokko @kevinjqliu

@Fokko
Copy link
Contributor

Fokko commented Oct 9, 2025

With pleasure, thank you for contributing 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants