Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Oct 20, 2025

Description

Fixed bin command failing on nested fields (e.g., resource.attributes.telemetry.sdk.version).
Updated CalciteRelNodeVisitor.projectPlusOverriding() to use prefix matching instead of exact matching for nested field names.

Related Issues

originalFieldNames.stream()
.filter(newNames::contains)
.filter(
originalName ->
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's extract for readability / maintainability.

Copy link
Contributor Author

@ahkcs ahkcs Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted

Comment on lines 858 to 862
// Match exact field names (e.g., "age" == "age")
// OR nested paths (e.g., "resource.attributes..." starts with
// "resource")
newName.equals(originalName)
|| newName.startsWith(originalName + ".")))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean it could override all the original fields start with originalName + "."? (like when original fields include resource.a, resource.b, etc. then all of them will be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question! No - originalFieldNames only contains top-level field names like "resource", not nested paths like "resource.a" or "resource.b".

So when binning resource.c.nested, the filter only matches the top-level "resource" field (via startsWith("resource.")). This causes the entire resource struct to be removed and flattened, with the binned field replacing the original nested value.

Only one top-level struct gets matched and processed, regardless of how many nested fields it contains.

You can refer to this IT test I added: testBinWithNestedFieldWithoutExplicitProjection

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field name could contain . and newName.startsWith("resource.") could match with multiple fields in my understanding.
I am unsure how much we currently support field name containing ., but QualifiedNameResolver implements longest match logic to decide the referred field in case qualified name contains multiple dots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be an unusual case for field name to contain mutliple .?
Looking at the end-to-end flow, originalFieldNames comes from:
context.relBuilder.peek().getRowType().getFieldNames()

This returns only the direct fields of the current RelNode's row type. For a schema with nested structures, the row type would have:

  • Top-level field: "resource" (type: STRUCT)
  • Top-level field: "severityNumber" (type: INT)

It would not contain "resource.a", "resource.b", etc. as separate top-level fields - those only exist as nested fields within the resource struct definition.

The scenario where multiple top-level fields share a prefix (like resource, resource.a, resource.b all being top-level) would require an unusual schema design(I think we can rarely meet this?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OpenSearch mapping key could include ., and also we can easily introduce field which contains . with eval command.
Why can't we directly match with the actual field we used?
projectPlusOverriding is used for multiple commands, and could have side effect to other commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we directly match with the actual field we used?

We DO directly match. The logic uses both:

  1. Exact match: newName.equals(originalName)
  2. Struct match: newName.startsWith(originalName + ".")

This exact logic is already used by eval. Example:
source=telemetry | eval resource.temp = 1 | bin resource.attributes.telemetry.sdk.version span=2

Step 1 - Eval:

  • "resource.temp".startsWith("resource.") → true → flattens struct
  • Schema becomes: ["resource.attributes.telemetry.sdk.enabled", "resource.attributes.telemetry.sdk.language", ..., "resource.temp"]

Step 2 - Bin:

  • Schema already flattened, exact match works: "resource.attributes.telemetry.sdk.version".equals("resource.attributes.telemetry.sdk.version") → true

Both commands use the same logic safely. Added test (testBinWithEvalCreatedDottedFieldName) to confirm no edge case issues

Copy link
Collaborator

@yuancu yuancu Oct 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @ykmr1224 means if we can directly know the sub field of a struct and project it, instead of relying on the crispy name matching. E.g. if a struct is info {a: string, b: int}, we can directly project info, info.a, info.b instead of relying on pattern info.* to match the names, because there may be another schema info: string, info.a integer, where info.a is not a sub-field of info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuancu Thanks for the clarification! However, I think the current logic handles this situation, you can refer to this newly added integration test using eval command to test: testBinWithEvalCreatedDottedFieldName

For the edge case you mentioned, I think it would be problematic if we have two info.a with different type because that would cause confusion for the commands. For example, when we are using stats count(info.a), which info.a is it referring to? For other cases where there's no naming overlap, I think you can refer to the eval test case

newName.equals(originalName)
// OR match nested paths (e.g., "resource.attributes..." starts with
// "resource.")
|| newName.startsWith(originalName + "."));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any other place using this pattern of detection? Or if there is a better way to detect if it is a nested field or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CalciteRelNodeVisitor.java, the visitFlatten() method uses the same startsWith() pattern:

  public RelNode visitFlatten(Flatten node, CalcitePlanContext context) {
    visitChildren(node, context);
    RelBuilder relBuilder = context.relBuilder;
    String fieldName = node.getField().getField().toString();
    // Match the sub-field names with "field.*"
    List<RelDataTypeField> fieldsToExpand =
        relBuilder.peek().getRowType().getFieldList().stream()
            .filter(f -> f.getName().startsWith(fieldName + "."))  // ← Same pattern!
            .toList();
    // ... rest of method
  }

@ahkcs ahkcs requested review from RyanL1997 and ykmr1224 October 21, 2025 23:07
ahkcs added 3 commits October 21, 2025 17:38
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@yuancu yuancu added the bug Something isn't working label Oct 22, 2025
@RyanL1997 RyanL1997 added bugFix PPL Piped processing language backport 2.19-dev labels Oct 24, 2025
@RyanL1997 RyanL1997 merged commit 85dc8d9 into opensearch-project:main Oct 24, 2025
40 of 41 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 24, 2025
(cherry picked from commit 85dc8d9)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
LantaoJin pushed a commit that referenced this pull request Oct 26, 2025
(cherry picked from commit 85dc8d9)

Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025
* default-main: (34 commits)
  Enhance dynamic source clause to support only metadata filters (opensearch-project#4554)
  Make nested alias type support referring to outer context (opensearch-project#4673)
  Update big5 ppl queries and check plans (opensearch-project#4668)
  Support push down sort after limit (opensearch-project#4657)
  Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670)
  Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621)
  Fix bin nested fields issue (opensearch-project#4606)
  Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531)
  Pushdown sort aggregate metrics (opensearch-project#4603)
  Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648)
  Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646)
  Allow renaming group-by fields to existing field names (opensearch-project#4586)
  Publish internal modules separately for downstream reuse (opensearch-project#4484)
  Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643)
  Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599)
  Replace all dots in fields of table scan's PhysType (opensearch-project#4633)
  Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629)
  Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623)
  Pushdown case function in aggregations as range queries (opensearch-project#4400)
  Update GEOIP function to support IP types as input (opensearch-project#4613)
  ...

# Conflicts:
#	docs/user/ppl/functions/conversion.rst
asifabashar added a commit to asifabashar/sql that referenced this pull request Oct 28, 2025
* default-main: (34 commits)
  Enhance dynamic source clause to support only metadata filters (opensearch-project#4554)
  Make nested alias type support referring to outer context (opensearch-project#4673)
  Update big5 ppl queries and check plans (opensearch-project#4668)
  Support push down sort after limit (opensearch-project#4657)
  Use table scan rowType in filter pushdown could fix rename issue (opensearch-project#4670)
  Fix: Support Alias Fields in MIN, MAX, FIRST, LAST, and TAKE Aggregations (opensearch-project#4621)
  Fix bin nested fields issue (opensearch-project#4606)
  Add `per_minute`, `per_hour`, `per_day` function support (opensearch-project#4531)
  Pushdown sort aggregate metrics (opensearch-project#4603)
  Followup: Change ComparableLinkedHashMap to compare Key than Value (opensearch-project#4648)
  Mitigate the CI failure caused by 500 Internal Server Error (opensearch-project#4646)
  Allow renaming group-by fields to existing field names (opensearch-project#4586)
  Publish internal modules separately for downstream reuse (opensearch-project#4484)
  Revert "Update grammar files and developer guide (opensearch-project#4301)" (opensearch-project#4643)
  Support Automatic Type Conversion for REX/SPATH/PARSE Command Extractions (opensearch-project#4599)
  Replace all dots in fields of table scan's PhysType (opensearch-project#4633)
  Return comparable LinkedHashMap in `valueForCalcite()` of ExprTupleValue (opensearch-project#4629)
  Refactor JsonExtractAllFunctionIT and MapConcatFunctionIT (opensearch-project#4623)
  Pushdown case function in aggregations as range queries (opensearch-project#4400)
  Update GEOIP function to support IP types as input (opensearch-project#4613)
  ...

Signed-off-by: Asif Bashar <asif.bashar@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.19-dev bug Something isn't working bugFix PPL Piped processing language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] PPL bin command fails to produce output for nested/struct fields

4 participants