[SPARK-54531][PYTHON] Introduce ArrowStreamAggPandasUDFSerializer #53239

Yicong-Huang · 2025-11-26T19:42:38Z

What changes were proposed in this pull request?

This PR separates SQL_GROUPED_AGG_PANDAS_UDF and SQL_WINDOW_AGG_PANDAS_UDF into a dedicated serializer ArrowStreamAggPandasUDFSerializer, aligning with the existing ArrowStreamAggArrowUDFSerializer architecture.

Why are the changes needed?

Input/Output type differences: Aggregation UDFs (SQL_GROUPED_AGG_PANDAS_UDF and SQL_WINDOW_AGG_PANDAS_UDF) have different input and output types from grouped map UDFs:
- Aggregation UDFs: input is a pd.Series (the entire group/partition), output is a scalar
- Grouped map UDFs: input is (keys, vals) where vals is a pd.DataFrame, output is a pd.DataFrame

Multi-UDF support: Aggregation UDFs support multiple UDFs in a single projection/aggregation, while grouped map UDFs do not.
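
The two differences described here can be illustrated with a minimal pandas-only sketch. It does not use Spark or the serializer from this PR; the function names (`mean_udf`, `max_udf`, `center_udf`) and the sample data are hypothetical and only model the input/output shapes of each UDF kind.

```python
import pandas as pd


# Aggregation-style UDFs: each receives a whole group as a pd.Series
# and returns a single scalar.
def mean_udf(v: pd.Series) -> float:
    return float(v.mean())


def max_udf(v: pd.Series) -> float:
    return float(v.max())


# Grouped-map-style UDF: receives (key, pd.DataFrame) and returns a pd.DataFrame.
def center_udf(key, pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


# Hypothetical sample data: two groups keyed by "id".
df = pd.DataFrame({"id": [1, 1, 2, 2], "v": [1.0, 3.0, 5.0, 9.0]})

# Several aggregation UDFs evaluated over the same groups,
# each contributing one scalar column per group.
rows = []
for key, group in df.groupby("id"):
    rows.append(
        {"id": key, "mean_v": mean_udf(group["v"]), "max_v": max_udf(group["v"])}
    )
agg_result = pd.DataFrame(rows)

# A single grouped map UDF: one output DataFrame per group, concatenated.
map_result = pd.concat(
    [center_udf(key, group) for key, group in df.groupby("id")],
    ignore_index=True,
)
```

In this sketch the aggregation path yields one row per group with one column per UDF, while the grouped map path yields a full DataFrame per group, which is why the two kinds of UDF place different demands on a serializer.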

Does this PR introduce any user-facing change?

No. This is an internal refactoring that does not change the public API or behavior. The serialization logic remains functionally equivalent.

How was this patch tested?

All existing tests continue to pass, and a new multi-UDF test (test_pandas_udf_window.py::WindowPandasUDFTests::test_multiple_udfs) was added.

Was this patch authored or co-authored using generative AI tooling?

No

zhengruifeng · 2025-11-27T02:43:24Z

remove [CORE] and [SQL] from the title, since it's not related to spark core or sql

Yicong-Huang · 2025-11-27T04:42:32Z

> remove [CORE] and [SQL] from the title, since it's not related to spark core or sql

I added them according to the labels added by github actions. Are they accurate?

zhengruifeng · 2025-11-27T08:43:05Z

> > remove [CORE] and [SQL] from the title, since it's not related to spark core or sql
>
> I added them according to the labels added by github actions. Are they accurate?

they are not very accurate

zhengruifeng · 2025-11-27T08:45:14Z

merged to master

Yicong-Huang added 3 commits November 26, 2025 11:38

feat: add ArrowStreamAggPandasUDFSerializer

373b7b5

test: add test for window pandas agg multi udf case

e02f6b3

fix: format

c2f1b65

github-actions bot added SQL CORE PYTHON labels Nov 26, 2025

dongjoon-hyun changed the title ~~[SPARK-54531] Introduce ArrowStreamAggPandasUDFSerializer~~ [SPARK-54531][PYTHON] Introduce ArrowStreamAggPandasUDFSerializer Nov 26, 2025

Yicong-Huang changed the title ~~[SPARK-54531][PYTHON] Introduce ArrowStreamAggPandasUDFSerializer~~ [SPARK-54531][CORE][PYTHON][SQL] Introduce ArrowStreamAggPandasUDFSerializer Nov 26, 2025

zhengruifeng changed the title ~~[SPARK-54531][CORE][PYTHON][SQL] Introduce ArrowStreamAggPandasUDFSerializer~~ [SPARK-54531][PYTHON] Introduce ArrowStreamAggPandasUDFSerializer Nov 27, 2025

zhengruifeng approved these changes Nov 27, 2025

zhengruifeng closed this in 154a270 Nov 27, 2025

Yicong-Huang mentioned this pull request Dec 1, 2025

[WIP][SPARK-54316][CORE][PYTHON][SQL] Consolidate GroupPandasIterUDFSerializer with GroupPandasUDFSerializer #53043

Open
