Skip to content

Commit 2162a69

Browse files
authored
feat: Optimize and fix filtering on toStartOfX primary key expressions (#1265)
Closes HDX-2576 Closes HDX-2491 # Summary It is a common optimization to have a primary key like `toStartOfDay(Timestamp), ..., Timestamp`. This PR improves the experience when using such a primary key in the following ways: 1. HyperDX will now automatically filter on both `toStartOfDay(Timestamp)` and `Timestamp` in this case, instead of just `Timestamp`. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting. 2. HyperDX now applies the same `toStartOfX` function to the right-hand-side of timestamp comparisons. So when filtering using an expression like `toStartOfDay(Timestamp)`, the generated SQL will have the condition `toStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>)`. This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute). With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering. ## Testing ### Testing the fix The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to `TimestampTime, toStartOfMinute(TimestampTime)`. Then, in the logs search, filter for a timespan < 1 minute. <details> <summary>Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter</summary> https://github.com/user-attachments/assets/915d3922-55f8-4742-b686-5090cdecef60 </details> <details> <summary>With the fix, you should see logs in the selected time range</summary> https://github.com/user-attachments/assets/f75648e4-3f48-47b0-949f-2409ce075a75 </details> ### Testing the optimization The optimization part of this change is that when a table has a primary key like `toStartOfMinute(TimestampTime), ..., TimestampTime` and the Timestamp Column for the source is just `Timestamp`, the query will automatically filter by both `toStartOfMinute(TimestampTime)` and `TimestampTime`. To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default `otel_logs` table into the new table (`INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs`). <details> <summary>DDL for log table with optimized key</summary> ```sql CREATE TABLE default.otel_logs_toStartOfMinute_Key ( `Timestamp` DateTime64(9) CODEC(Delta(8), ZSTD(1)), `TimestampTime` DateTime DEFAULT toDateTime(Timestamp), `TraceId` String CODEC(ZSTD(1)), `SpanId` String CODEC(ZSTD(1)), `TraceFlags` UInt8, `SeverityText` LowCardinality(String) CODEC(ZSTD(1)), `SeverityNumber` UInt8, `ServiceName` LowCardinality(String) CODEC(ZSTD(1)), `Body` String CODEC(ZSTD(1)), `ResourceSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)), `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)), `ScopeSchemaUrl` LowCardinality(String) CODEC(ZSTD(1)), `ScopeName` String CODEC(ZSTD(1)), `ScopeVersion` LowCardinality(String) CODEC(ZSTD(1)), `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)), `LogAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)), `__hdx_materialized_k8s.pod.name` String MATERIALIZED ResourceAttributes['k8s.pod.name'] CODEC(ZSTD(1)), INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1, INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1, INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8, INDEX idx_lower_body lower(Body) TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8 ) ENGINE = SharedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') PARTITION BY toDate(TimestampTime) PRIMARY KEY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime) ORDER BY (toStartOfMinute(TimestampTime), ServiceName, TimestampTime, Timestamp) TTL TimestampTime + toIntervalDay(90) SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1 ``` </details> Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a `WHERE` predicate that filters on both `TimestampTime` and `toStartOfMinute(TimestampTime)`, despite `toStartOfMinute(TimestampTime)` not being included in the Timestamp Column of the source's configuration.
1 parent 8190ee8 commit 2162a69

File tree

5 files changed

+550
-17
lines changed

5 files changed

+550
-17
lines changed

.changeset/fluffy-mails-sparkle.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
"@hyperdx/common-utils": patch
3+
"@hyperdx/app": patch
4+
---
5+
6+
feat: Optimize and fix filtering on toStartOfX primary key expressions

packages/common-utils/src/__tests__/renderChartConfig.test.ts

Lines changed: 219 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,15 @@
1-
import { chSql, parameterizedQueryToSql } from '@/clickhouse';
1+
import { chSql, ColumnMeta, parameterizedQueryToSql } from '@/clickhouse';
22
import { Metadata } from '@/metadata';
33
import {
44
ChartConfigWithOptDateRange,
55
DisplayType,
66
MetricsDataType,
77
} from '@/types';
88

9-
import { renderChartConfig } from '../renderChartConfig';
9+
import { renderChartConfig, timeFilterExpr } from '../renderChartConfig';
1010

1111
describe('renderChartConfig', () => {
12-
let mockMetadata: Metadata;
12+
let mockMetadata: jest.Mocked<Metadata>;
1313

1414
beforeEach(() => {
1515
mockMetadata = {
@@ -19,7 +19,10 @@ describe('renderChartConfig', () => {
1919
]),
2020
getMaterializedColumnsLookupTable: jest.fn().mockResolvedValue(null),
2121
getColumn: jest.fn().mockResolvedValue({ type: 'DateTime' }),
22-
} as unknown as Metadata;
22+
getTableMetadata: jest
23+
.fn()
24+
.mockResolvedValue({ primary_key: 'timestamp' }),
25+
} as unknown as jest.Mocked<Metadata>;
2326
});
2427

2528
const gaugeConfiguration: ChartConfigWithOptDateRange = {
@@ -630,4 +633,216 @@ describe('renderChartConfig', () => {
630633
expect(actual).toMatchSnapshot();
631634
});
632635
});
636+
637+
describe('timeFilterExpr', () => {
638+
type TimeFilterExprTestCase = {
639+
timestampValueExpression: string;
640+
dateRangeStartInclusive?: boolean;
641+
dateRangeEndInclusive?: boolean;
642+
dateRange: [Date, Date];
643+
includedDataInterval?: string;
644+
expected: string;
645+
description: string;
646+
tableName?: string;
647+
databaseName?: string;
648+
primaryKey?: string;
649+
};
650+
651+
const testCases: TimeFilterExprTestCase[] = [
652+
{
653+
description: 'with basic timestampValueExpression',
654+
timestampValueExpression: 'timestamp',
655+
dateRange: [
656+
new Date('2025-02-12 00:12:34Z'),
657+
new Date('2025-02-14 00:12:34Z'),
658+
],
659+
expected: `(timestamp >= fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}) AND timestamp <= fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}))`,
660+
},
661+
{
662+
description: 'with dateRangeEndInclusive=false',
663+
timestampValueExpression: 'timestamp',
664+
dateRange: [
665+
new Date('2025-02-12 00:12:34Z'),
666+
new Date('2025-02-14 00:12:34Z'),
667+
],
668+
dateRangeEndInclusive: false,
669+
expected: `(timestamp >= fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}) AND timestamp < fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}))`,
670+
},
671+
{
672+
description: 'with dateRangeStartInclusive=false',
673+
timestampValueExpression: 'timestamp',
674+
dateRange: [
675+
new Date('2025-02-12 00:12:34Z'),
676+
new Date('2025-02-14 00:12:34Z'),
677+
],
678+
dateRangeStartInclusive: false,
679+
expected: `(timestamp > fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}) AND timestamp <= fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}))`,
680+
},
681+
{
682+
description: 'with includedDataInterval',
683+
timestampValueExpression: 'timestamp',
684+
dateRange: [
685+
new Date('2025-02-12 00:12:34Z'),
686+
new Date('2025-02-14 00:12:34Z'),
687+
],
688+
includedDataInterval: '1 WEEK',
689+
expected: `(timestamp >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 1 WEEK) - INTERVAL 1 WEEK AND timestamp <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 1 WEEK) + INTERVAL 1 WEEK)`,
690+
},
691+
{
692+
description: 'with date type timestampValueExpression',
693+
timestampValueExpression: 'date',
694+
dateRange: [
695+
new Date('2025-02-12 00:12:34Z'),
696+
new Date('2025-02-14 00:12:34Z'),
697+
],
698+
expected: `(date >= toDate(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND date <= toDate(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
699+
},
700+
{
701+
description: 'with multiple timestampValueExpression parts',
702+
timestampValueExpression: 'timestamp, date',
703+
dateRange: [
704+
new Date('2025-02-12 00:12:34Z'),
705+
new Date('2025-02-14 00:12:34Z'),
706+
],
707+
expected: `(timestamp >= fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}) AND timestamp <= fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}))AND(date >= toDate(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND date <= toDate(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
708+
},
709+
{
710+
description: 'with toStartOfDay() in timestampExpr',
711+
timestampValueExpression: 'toStartOfDay(timestamp)',
712+
dateRange: [
713+
new Date('2025-02-12 00:12:34Z'),
714+
new Date('2025-02-14 00:12:34Z'),
715+
],
716+
expected: `(toStartOfDay(timestamp) >= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND toStartOfDay(timestamp) <= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
717+
},
718+
{
719+
description: 'with toStartOfDay () in timestampExpr',
720+
timestampValueExpression: 'toStartOfDay (timestamp)',
721+
dateRange: [
722+
new Date('2025-02-12 00:12:34Z'),
723+
new Date('2025-02-14 00:12:34Z'),
724+
],
725+
expected: `(toStartOfDay (timestamp) >= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()})) AND toStartOfDay (timestamp) <= toStartOfDay(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()})))`,
726+
},
727+
{
728+
description: 'with toStartOfInterval() in timestampExpr',
729+
timestampValueExpression:
730+
'toStartOfInterval(timestamp, INTERVAL 12 MINUTE)',
731+
dateRange: [
732+
new Date('2025-02-12 00:12:34Z'),
733+
new Date('2025-02-14 00:12:34Z'),
734+
],
735+
expected: `(toStartOfInterval(timestamp, INTERVAL 12 MINUTE) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 12 MINUTE) AND toStartOfInterval(timestamp, INTERVAL 12 MINUTE) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 12 MINUTE))`,
736+
},
737+
{
738+
description:
739+
'with toStartOfInterval() with lowercase interval in timestampExpr',
740+
timestampValueExpression:
741+
'toStartOfInterval(timestamp, interval 1 minute)',
742+
dateRange: [
743+
new Date('2025-02-12 00:12:34Z'),
744+
new Date('2025-02-14 00:12:34Z'),
745+
],
746+
expected: `(toStartOfInterval(timestamp, interval 1 minute) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), interval 1 minute) AND toStartOfInterval(timestamp, interval 1 minute) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), interval 1 minute))`,
747+
},
748+
{
749+
description: 'with toStartOfInterval() with timezone and offset',
750+
timestampValueExpression: `toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York')`,
751+
dateRange: [
752+
new Date('2025-02-12 00:12:34Z'),
753+
new Date('2025-02-14 00:12:34Z'),
754+
],
755+
expected: `(toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') AND toStartOfInterval(timestamp, INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York') <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime('2023-01-01 14:35:30'), 'America/New_York'))`,
756+
},
757+
{
758+
description: 'with nonstandard spacing',
759+
timestampValueExpression: ` toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' ) `,
760+
dateRange: [
761+
new Date('2025-02-12 00:12:34Z'),
762+
new Date('2025-02-14 00:12:34Z'),
763+
],
764+
expected: `(toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' ) >= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-12 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York') AND toStartOfInterval ( timestamp , INTERVAL 1 MINUTE , toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York' ) <= toStartOfInterval(fromUnixTimestamp64Milli(${new Date('2025-02-14 00:12:34Z').getTime()}), INTERVAL 1 MINUTE, toDateTime ( '2023-01-01 14:35:30' ), 'America/New_York'))`,
765+
},
766+
{
767+
description: 'with optimizable timestampValueExpression',
768+
timestampValueExpression: `timestamp`,
769+
primaryKey:
770+
"toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
771+
dateRange: [
772+
new Date('2025-02-12 00:12:34Z'),
773+
new Date('2025-02-14 00:12:34Z'),
774+
],
775+
expected: `(timestamp >= fromUnixTimestamp64Milli(1739319154000) AND timestamp <= fromUnixTimestamp64Milli(1739491954000))AND(toStartOfMinute(timestamp) >= toStartOfMinute(fromUnixTimestamp64Milli(1739319154000)) AND toStartOfMinute(timestamp) <= toStartOfMinute(fromUnixTimestamp64Milli(1739491954000)))`,
776+
},
777+
{
778+
description: 'with synthetic timestamp value expression for CTE',
779+
timestampValueExpression: `__hdx_time_bucket`,
780+
dateRange: [
781+
new Date('2025-02-12 00:12:34Z'),
782+
new Date('2025-02-14 00:12:34Z'),
783+
],
784+
databaseName: '',
785+
tableName: 'Bucketed',
786+
primaryKey:
787+
"toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
788+
expected: `(__hdx_time_bucket >= fromUnixTimestamp64Milli(1739319154000) AND __hdx_time_bucket <= fromUnixTimestamp64Milli(1739491954000))`,
789+
},
790+
791+
{
792+
description: 'with toStartOfMinute in timestampValueExpression',
793+
timestampValueExpression: `toStartOfMinute(timestamp)`,
794+
dateRange: [
795+
new Date('2025-02-12 00:12:34Z'),
796+
new Date('2025-02-14 00:12:34Z'),
797+
],
798+
primaryKey:
799+
"toStartOfMinute(timestamp), ServiceName, ResourceAttributes['timestamp'], timestamp",
800+
expected: `(toStartOfMinute(timestamp) >= toStartOfMinute(fromUnixTimestamp64Milli(1739319154000)) AND toStartOfMinute(timestamp) <= toStartOfMinute(fromUnixTimestamp64Milli(1739491954000)))`,
801+
},
802+
];
803+
804+
beforeEach(() => {
805+
mockMetadata.getColumn.mockImplementation(async ({ column }) =>
806+
column === 'date'
807+
? ({ type: 'Date' } as ColumnMeta)
808+
: ({ type: 'DateTime' } as ColumnMeta),
809+
);
810+
});
811+
812+
it.each(testCases)(
813+
'should generate a time filter expression $description',
814+
async ({
815+
timestampValueExpression,
816+
dateRangeEndInclusive = true,
817+
dateRangeStartInclusive = true,
818+
dateRange,
819+
expected,
820+
includedDataInterval,
821+
tableName = 'target_table',
822+
databaseName = 'default',
823+
primaryKey,
824+
}) => {
825+
if (primaryKey) {
826+
mockMetadata.getTableMetadata.mockResolvedValue({
827+
primary_key: primaryKey,
828+
} as any);
829+
}
830+
831+
const actual = await timeFilterExpr({
832+
timestampValueExpression,
833+
dateRangeEndInclusive,
834+
dateRangeStartInclusive,
835+
dateRange,
836+
connectionId: 'test-connection',
837+
databaseName,
838+
tableName,
839+
metadata: mockMetadata,
840+
includedDataInterval,
841+
});
842+
843+
const actualSql = parameterizedQueryToSql(actual);
844+
expect(actualSql).toBe(expected);
845+
},
846+
);
847+
});
633848
});

0 commit comments

Comments
 (0)