[SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO under a config #53229

szehon-ho · 2025-11-26T07:05:19Z

What changes were proposed in this pull request?

#52225 allows MERGE INTO to support the case where the assignment value is a struct with fewer fields than the assignment key, i.e. UPDATE SET big_struct = source.small_struct.

This PR turns that feature off by default; it can be turned on via a config.

Why are the changes needed?

The change raised some interesting questions; for example, there is some ambiguity in user intent. Does UPDATE SET * mean "set all nested fields" or "set top-level columns"? In the first case, fields missing from the source struct are kept. In the second case, they are nullified.
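To make the two readings concrete, here is a minimal sketch in plain Python, not Spark code. Structs are modeled as dicts, and the function names `assign_top_level` and `assign_nested_fields` are hypothetical names chosen for illustration; only one level of nesting is handled.

```python
from typing import Any, Dict

Struct = Dict[str, Any]


def assign_top_level(target: Struct, source: Struct) -> Struct:
    """Column-level reading: the source struct replaces the target struct.

    Fields that exist in the target but not in the source become None.
    """
    return {field: source.get(field) for field in target}


def assign_nested_fields(target: Struct, source: Struct) -> Struct:
    """Field-level reading: only fields present in the source are overwritten.

    Fields that exist in the target but not in the source keep their value.
    """
    return {field: source.get(field, target[field]) for field in target}


# Hypothetical data: the target struct has two fields, the source only one.
big_struct = {"c1": 1, "c2": "existing"}
small_struct = {"c1": 99}

nullified = assign_top_level(big_struct, small_struct)
kept = assign_nested_fields(big_struct, small_struct)
print(nullified)
print(kept)
```

Both functions agree on `c1`; they differ only in what happens to `c2`, which is exactly the field the source does not supply.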

I tried to make a choice in #53149, but after some feedback, choosing one interpretation over the other may be a bit controversial. A SQLConf may not be the right choice; instead we may need to introduce some new syntax, which requires more discussion.

Does this PR introduce any user-facing change?

No, this feature is unreleased.

How was this patch tested?

Existing unit test

Was this patch authored or co-authored using generative AI tooling?

No

cloud-fan · 2025-11-26T07:25:03Z

sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

-  def coerceMergeNestedTypes: Boolean =
-    getConf(SQLConf.MERGE_INTO_NESTED_TYPE_COERCION_ENABLED)
+  // Disable until we define the semantics of UPDATE SET * with nested types
+  def coerceMergeNestedTypes: Boolean = false


shall we just turn off the config by default without removing tests?

sure, that'd be great

corleyma · 2025-11-26T18:57:30Z

Is it possible to make this a behavior that folks can opt into while the semantics are being sorted out? I would find the new behavior very useful, and it's a bit sad to leave it disabled when it's implemented. The gymnastics I do to handle this today aren't fun.

szehon-ho · 2025-11-26T19:13:21Z

Sure, as per @cloud-fan's comment we will disable it by config. I will mark the config as experimental and note that the semantics of nested field assignment may change in a future release when the source and target schemas do not match.

dongjoon-hyun

Thank you, @szehon-ho . Looking forward to seeing the final status of this PR.

BTW, when you make a PR for master branch, it affects Apache Spark 4.2.0 too. So, please remove this wording from this PR.

This change disable it for Spark 4.1.

This reverts commit 96fca0e.

szehon-ho · 2025-11-26T20:17:00Z

Updated the PR description and the PR.

Because there is now a config to enable it, I manually reverted the more controversial PR #53149. UPDATE SET * should go back to the original, simpler behavior of replacing structs at the column level.

Thanks!

szehon-ho · 2025-11-26T20:18:01Z

sql/core/src/test/scala/org/apache/spark/sql/connector/MergeIntoTableSuiteBase.scala

-             |s STRUCT<c1: INT, c2: STRUCT<a: ARRAY<INT>, m: MAP<STRING, STRING>>>,
-             |dep STRING""".stripMargin,
-          """{ "pk": 1, "s": { "c1": 2, "c2": { "a": [1,2], "m": { "a": "b" } } }, "dep": "hr" }""")
+      Seq(true, false).foreach { coerceNestedTypes =>


FYI: most of these changes are because coerceNestedTypes was true by default and is now false, so I added another dimension to these tests.

dongjoon-hyun

+1 for the direction. Let's see the CI result first. Thanks.

cloud-fan · 2025-11-27T06:23:46Z

To share some initial thoughts:

The behavior of SET target.struct_col = source.struct_col is quite intuitive: assignment means we fully assign the right-side value to the left-side variable. After schema evolution or coercion, these two structs should have the same schema, and we simply write the source struct to the table, regardless of whether it is null or has null fields.
If we don't want this "full assignment" behavior, and instead want to retain the original values of target struct fields that are not present in the source struct, we can follow SET * and support this syntax: SET target.struct_col.* = source.struct_col.*.
If the struct is deeply nested and we want to control how deep we expand the fields during assignment, we can make the syntax more flexible. For example, SET target.struct_col.*.* = source.struct_col.*.* means expand two levels, and .** or .*..* means expand all levels until we hit the leaf fields.
A table row is similar to a struct, so the syntax should be consistent. For example, UPDATE SET * expands only one level, which means assigning top-level columns from source to target. UPDATE SET *.* means two levels and UPDATE SET ** means all levels. Full assignment would be like UPDATE SET target_table = source_table, but we likely don't want it as it's very weird.
In reality, we likely don't need to be so flexible; having * and ** should be good enough for expanding one level and all levels.
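The proposal above can be modeled as a depth-limited merge. The following is only a sketch in plain Python, not Spark code and not an implementation of any existing syntax. Rows and structs are modeled as dicts, and `full_assign`, `expand_assign` and the `levels` parameter are hypothetical names. It assumes the result always follows the target schema (source-only fields are ignored), and that a target field with no counterpart in the source is kept at an expanded level and nullified under full assignment.

```python
from typing import Any, Optional


def full_assign(target: Any, source: Any) -> Any:
    """Full assignment: the source value wins; target-only fields become None."""
    if not (isinstance(target, dict) and isinstance(source, dict)):
        return source
    return {
        k: full_assign(target[k], source[k]) if k in source else None
        for k in target
    }


def expand_assign(target: Any, source: Any, levels: Optional[int]) -> Any:
    """Expand `levels` levels of fields, then fully assign below that depth.

    levels=1 models `*`, levels=2 models `*.*`, levels=None models `**`.
    """
    if levels == 0 or not (isinstance(target, dict) and isinstance(source, dict)):
        return full_assign(target, source)
    remaining = None if levels is None else levels - 1
    return {
        # Fields missing from the source are kept at an expanded level.
        k: expand_assign(target[k], source[k], remaining) if k in source else target[k]
        for k in target
    }


# Hypothetical rows: the source lacks `dep`, `s.c1` and `s.c2.b`.
target_row = {"pk": 1, "s": {"c1": 1, "c2": {"a": 10, "b": 20}}, "dep": "hr"}
source_row = {"pk": 1, "s": {"c2": {"a": 11}}}

one_level = expand_assign(target_row, source_row, 1)       # like UPDATE SET *
two_levels = expand_assign(target_row, source_row, 2)      # like UPDATE SET *.*
all_levels = expand_assign(target_row, source_row, None)   # like UPDATE SET **
```

With one level, `s` is replaced as a whole, so `s.c1` and `s.c2.b` become None. With two levels, `s.c1` survives but `s.c2` is still replaced as a whole. With all levels, only the leaf `s.c2.a` changes.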

@szehon-ho let's keep the existing code if it may still be useful for implementing the new syntaxes.

dongjoon-hyun · 2025-11-28T21:52:54Z

Could you address the above comment, @szehon-ho ?

dongjoon-hyun · 2025-11-29T21:02:21Z

To @szehon-ho and @cloud-fan, let me merge this as-is to unblock Apache Spark 4.1.0.

For the last comment, we can add the code back cleanly to the master branch for Spark 4.2.0 later.

let's keep the existing code if they may still be useful to implement the new syntaxes.

Thank you.

… a config

Closes #53229 from szehon-ho/disable_merge_update_source_coercion.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 23d9253)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO

96fca0e

github-actions bot added the SQL label Nov 26, 2025

cloud-fan reviewed Nov 26, 2025

dongjoon-hyun reviewed Nov 26, 2025

szehon-ho added 2 commits November 26, 2025 12:08

Set the flag to false and preserve unit tests

ac75170

This reverts commit 96fca0e.

Revert apache#53149

fe4e6cd

szehon-ho commented Nov 26, 2025

dongjoon-hyun self-assigned this Nov 26, 2025

szehon-ho changed the title ~~[SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO~~ [SPARK-54525][SQL] Disable nested struct coercion in MERGE INTO under a config Nov 26, 2025

dongjoon-hyun reviewed Nov 26, 2025

dongjoon-hyun closed this in 23d9253 Nov 29, 2025

dongjoon-hyun removed their assignment Nov 29, 2025

manuzhang mentioned this pull request Dec 1, 2025

Spark: Add support for Spark 4.1 apache/iceberg#14155

Open
