Skip to content

Conversation

LilyCaroline17
Copy link
Contributor

@LilyCaroline17 LilyCaroline17 commented Sep 23, 2025

Description

This PR addresses some bugs that we're having with how the source field is stored in the local index. The source field can be really long and/or really nested. The solution is to store the source field as a string within the local index and in-memory so no exceptions related to field depth or number of fields is triggered and to maintain consistency.

For the top_queries index template that contains the local index mapping, the template is updated daily during index creation. When changing the index mapping, upgrades should not be affected because reindex automatically upgrades data format from object to string.

Additionally, for strings that are too long, they are truncated and a boolean to mark that the string has been truncated is set. These source strings are then skipped when parsing.

Issues Resolved

Closes [/issues/352] and [/issues/411]
Dashboards PR to handle source field change: opensearch-project/query-insights-dashboards#357

Check that Local index properly created:

Screenshot 2025-09-29 at 11 32 01 AM

Scrolling down we can see that source is in properties in the local index.

Screenshot 2025-09-29 at 11 39 43 AM

Test Results (1000 query records)

I performed a test with 1000 query records to compare the storage usage of storing source as a string compared to storing source as a SearchSourceBuilder and got the following results:

Query Complexity SearchSourceBuilder String Storage Memory Saved
Small queries 320B/record 312B/record 2.6%
Average queries 1.8KB/record 751B/record 58.4%
Large queries 4.0KB/record 2.0KB/record 49.3%
Very large (truncated) 10.3KB/record 4.4KB/record 57.4%

We can see that there is a significant reduction in memory storage for source when it's stored as string, especially in larger, more complex queries.

However, it's important to note that source was stored as a SearchSourceBuilder for a reason, and after it's stored into local index, it's needed as a SearchSourceBuilder only in SearchQueryCategorizer, which only affects deployments with search query metrics enabled. This extra cost would be 4-15 ms per query parsing overhead.

The memory saving benefits and stability (avoiding failures from attempts to store source as a SearchSourceBuilder) take precedence over the extra time it takes to parse queries in certain conditions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dzane17
Copy link
Member

dzane17 commented Sep 23, 2025

Does this actually change the top_queries index mapping for source field? I though we would need to set it explicitly in https://github.com/opensearch-project/query-insights/blob/main/src/main/resources/mappings/top-queries-record.json

@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch from 241dd3e to 85c9170 Compare September 23, 2025 18:56
@dzane17 dzane17 added the v3.3.0 Issues targeting release v3.3.0 label Sep 23, 2025
@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch 3 times, most recently from 00fa3ca to 3a576e5 Compare September 24, 2025 22:59
@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch 6 times, most recently from c736543 to d8d9eff Compare September 26, 2025 22:40
@ansjcy ansjcy removed the v3.3.0 Issues targeting release v3.3.0 label Oct 1, 2025
@LilyCaroline17 LilyCaroline17 changed the title Optimize query storage by storing source field as a string in local index Optimize query storage by storing source field as a string in local index (DRAFT) Oct 7, 2025
@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch 5 times, most recently from db58a0e to b1a7366 Compare October 9, 2025 06:26
@LilyCaroline17 LilyCaroline17 changed the title Optimize query storage by storing source field as a string in local index (DRAFT) Optimize query storage by storing source field as a string in local index Oct 9, 2025
@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 45.79439% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.05%. Comparing base (0e7b944) to head (b1a7366).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
...plugin/insights/rules/model/SearchQueryRecord.java 47.72% 18 Missing and 5 partials ⚠️
...gin/insights/core/exporter/LocalIndexExporter.java 22.72% 13 Missing and 4 partials ⚠️
...re/service/categorizer/SearchQueryCategorizer.java 54.54% 7 Missing and 3 partials ⚠️
.../insights/core/listener/QueryInsightsListener.java 40.00% 4 Missing and 2 partials ⚠️
...nsearch/plugin/insights/rules/model/Attribute.java 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #424      +/-   ##
============================================
- Coverage     78.37%   77.05%   -1.33%     
- Complexity      613      622       +9     
============================================
  Files            52       52              
  Lines          2340     2432      +92     
  Branches        225      252      +27     
============================================
+ Hits           1834     1874      +40     
- Misses          383      423      +40     
- Partials        123      135      +12     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch 2 times, most recently from b1a7366 to e70c4cc Compare October 9, 2025 06:46
@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch from e70c4cc to c8d8276 Compare October 9, 2025 06:50
@ansjcy
Copy link
Member

ansjcy commented Oct 17, 2025

We should do more tests especially for compatibilities in this PR, e.g.

  • Run a cluster with existing top queries indices that have the old mapping
  • enable export for different settings on the MAX_SOURCE_LENGTH, until truncating happens.
  • You can include test results for retrieving top queries from local index for those different scenarios in the PR description.

@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch 4 times, most recently from 4b92419 to 0464372 Compare October 22, 2025 04:02
…ndex

Signed-off-by: Emily Guo <35637792+LilyCaroline17@users.noreply.github.com>
@LilyCaroline17 LilyCaroline17 force-pushed the optimizeExporterQueryStorage branch from 0464372 to 17e0ada Compare October 22, 2025 04:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants