-
Notifications
You must be signed in to change notification settings - Fork 177
Fixes for Multisearch and Append command
#4512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Multisearch and Append command
f641eda to
b9332ed
Compare
| if (!typesForName.contains(fieldType)) { | ||
| // New field or same name with different type - add to schema | ||
| schema.add(new SchemaField(fieldName, fieldType)); | ||
| typesForName.add(fieldType); | ||
| } | ||
| // If we've seen this exact (name, type) combination, skip it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rather raise error from here. (Then we don't need to check again)
Later we would implement a logic to generalize types for the same field name here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I think it makes sense, moved the location to raise error
|
|
||
| if (!typesForName.contains(fieldType)) { | ||
| // New field or same name with different type - add to schema | ||
| RelDataType existingType = seenFields.get(fieldName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recently find another issue of type conflicts here. RelDataType evaluates the hash equality by its digested string as well. For example, "INTEGER" is not equal to "INTEGER NOT NULL". A quick fix would be aligning the same SqlType to be nullable. Ideally it won't affect the data type resolution while execution. cc @xinyual
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can consider to allow same SqlTypeName but with different nullability to be merged here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I have updated the implementation to allow same SqlTypeName but with different nullability to be merged
| Limitations | ||
| =========== | ||
|
|
||
| * **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understand the intention here. Strong schema engine like SQL restricts the type to be the same. Some weak schema engine resolves types at runtime and doesn't care the data type. I think it's not easy to make it compatible.
Not sure what's better user experience and customer expectation here. Does user accept this behavior or expect to union anyway? cc @LantaoJin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks for the change.
Signed-off-by: Kai Huang <ahkcs@amazon.com> # Conflicts: # docs/category.json
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
98e40bc to
ae5dbb1
Compare
| * @param rowType The row type to search for timestamp fields | ||
| * @return The name of the timestamp field, or null if not found | ||
| * @param rowType The row type to search for @timestamp field | ||
| * @return "@timestamp" if the field exists, or null if not found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (non-blocking): Should we think about indices with different timestamp field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it is set as a limitation: we want to only support @timestamp
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteMultisearchCommandIT.java
Outdated
Show resolved
Hide resolved
Signed-off-by: Kai Huang <ahkcs@amazon.com>
* Fix for Multisearch and Append command Signed-off-by: Kai Huang <ahkcs@amazon.com> # Conflicts: # docs/category.json * fix tests Signed-off-by: Kai Huang <ahkcs@amazon.com> * fix test Signed-off-by: Kai Huang <ahkcs@amazon.com> * remove error location Signed-off-by: Kai Huang <ahkcs@amazon.com> * Allow same SqlTypeName but with different nullability to be merged Signed-off-by: Kai Huang <ahkcs@amazon.com> * Update error message Signed-off-by: Kai Huang <ahkcs@amazon.com> --------- Signed-off-by: Kai Huang <ahkcs@amazon.com> (cherry picked from commit 0dd5949) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix for Multisearch and Append command # Conflicts: # docs/category.json * fix tests * fix test * remove error location * Allow same SqlTypeName but with different nullability to be merged * Update error message --------- (cherry picked from commit 0dd5949) Signed-off-by: Kai Huang <ahkcs@amazon.com> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
| return schema; | ||
| } | ||
|
|
||
| private static boolean areTypesCompatible(RelDataType type1, RelDataType type2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another concern of using this method to allow type merge. If index A's RelDataType 'INTEGER NOT NULL' is put to unified schema, index B's same name RelDataType 'INTEGER' will be merged silently. Index B's column values could contain NULL values.
The generated code could ignore the null check because the merged unified schema has 'iNTEGER NOT NULL'. It will probably throw NPE when merging index B's NULL values. We could write some query to double check if we can reproduce this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried some queries in my local test. Haven't seen such NPE errors yet. Not sure if there is edge case. But for now we can leave it there until needed fix.
Description
Related PRs
#4332 #4123