Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Oct 10, 2025

Description

  1. Type Conflict Handling
  • Changed behavior: Type conflicts now throw IllegalArgumentException instead of auto-renaming fields
  • Modified: SchemaUnifier.java - Removed automatic field renaming logic (e.g., age → age0)
  • Updated Tests:
    • CalciteMultisearchCommandIT.testMultisearchWithDirectTypeConflict - Now expects exception
    • CalcitePPLAppendCommandIT.testAppendWithConflictTypeColumn - Now expects exception
  • Documentation: Updated multisearch.rst and append.rst to reflect new behavior and add Limitations section
  1. Timestamp Interleaving
  • Modified: CalciteRelNodeVisitor.findTimestampField() - Now only detects @timestamp field for timestamp interleaving
  • Other timestamp fields (_time, timestamp, time) are no longer used for interleaving
  1. Documentation Fixes
  • multisearch.rst:
    • Added Limitations section
    • Removed redundant Example 4 (Handling Empty Results)
    • Removed Example 6 (type-conflict resolution)
  • append.rst:
    • Added Limitations section
    • Removed Example 3 (type conflict example no longer valid)

Related PRs

#4332 #4123

@ahkcs ahkcs changed the title Fixes for Multisearch and Append command Fixes for Multisearch and Append command Oct 10, 2025
@ahkcs ahkcs marked this pull request as ready for review October 10, 2025 21:28
Comment on lines 92 to 97
if (!typesForName.contains(fieldType)) {
// New field or same name with different type - add to schema
schema.add(new SchemaField(fieldName, fieldType));
typesForName.add(fieldType);
}
// If we've seen this exact (name, type) combination, skip it
Copy link
Collaborator

@ykmr1224 ykmr1224 Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rather raise error from here. (Then we don't need to check again)
Later we would implement a logic to generalize types for the same field name here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I think it makes sense, moved the location to raise error

@ahkcs ahkcs requested a review from ykmr1224 October 10, 2025 22:19

if (!typesForName.contains(fieldType)) {
// New field or same name with different type - add to schema
RelDataType existingType = seenFields.get(fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recently find another issue of type conflicts here. RelDataType evaluates the hash equality by its digested string as well. For example, "INTEGER" is not equal to "INTEGER NOT NULL". A quick fix would be aligning the same SqlType to be nullable. Ideally it won't affect the data type resolution while execution. cc @xinyual

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider to allow same SqlTypeName but with different nullability to be merged here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I have updated the implementation to allow same SqlTypeName but with different nullability to be merged

Limitations
===========

* **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understand the intention here. Strong schema engine like SQL restricts the type to be the same. Some weak schema engine resolves types at runtime and doesn't care the data type. I think it's not easy to make it compatible.

Not sure what's better user experience and customer expectation here. Does user accept this behavior or expect to union anyway? cc @LantaoJin

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are planning to enable permissive mode in the future: #4349 to support schema merging with type conflicts, in order to avoid breaking changes in the future, we are marking this as a limitation now instead of using a workaround. cc @penghuo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks for the change.

ykmr1224
ykmr1224 previously approved these changes Oct 15, 2025
ahkcs added 5 commits October 28, 2025 10:30
Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	docs/category.json
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@Swiddis Swiddis added the bug Something isn't working label Oct 28, 2025
* @param rowType The row type to search for timestamp fields
* @return The name of the timestamp field, or null if not found
* @param rowType The row type to search for @timestamp field
* @return "@timestamp" if the field exists, or null if not found
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (non-blocking): Should we think about indices with different timestamp field names?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it is set as a limitation: we want to only support @timestamp

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs requested a review from ykmr1224 October 28, 2025 18:40
@ykmr1224 ykmr1224 added PPL Piped processing language calcite calcite migration releated backport 2.19-dev labels Oct 28, 2025
@ykmr1224 ykmr1224 merged commit 0dd5949 into opensearch-project:main Oct 28, 2025
39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 28, 2025
* Fix for Multisearch and Append command

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	docs/category.json

* fix tests

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix test

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove error location

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Allow same SqlTypeName but with different nullability to be merged

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Update error message

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
(cherry picked from commit 0dd5949)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@ahkcs ahkcs mentioned this pull request Oct 28, 2025
8 tasks
ykmr1224 pushed a commit that referenced this pull request Oct 28, 2025
* Fix for Multisearch and Append command



# Conflicts:
#	docs/category.json

* fix tests



* fix test



* remove error location



* Allow same SqlTypeName but with different nullability to be merged



* Update error message



---------


(cherry picked from commit 0dd5949)

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
return schema;
}

private static boolean areTypesCompatible(RelDataType type1, RelDataType type2) {
Copy link
Contributor

@songkant-aws songkant-aws Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another concern of using this method to allow type merge. If index A's RelDataType 'INTEGER NOT NULL' is put to unified schema, index B's same name RelDataType 'INTEGER' will be merged silently. Index B's column values could contain NULL values.

The generated code could ignore the null check because the merged unified schema has 'iNTEGER NOT NULL'. It will probably throw NPE when merging index B's NULL values. We could write some query to double check if we can reproduce this scenario.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried some queries in my local test. Haven't seen such NPE errors yet. Not sure if there is edge case. But for now we can leave it there until needed fix.

@ahkcs ahkcs deleted the multisearch_fix branch October 30, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev bug Something isn't working calcite calcite migration releated PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants