@zhishengyk

Add hamming_distance function to calculate the Hamming distance between two strings.

Changes

BE: Implement hamming_distance in function_string.cpp with FunctionBinaryToType + HammingDistanceImpl,
and raise an error when the two input strings have different lengths instead of returning NULL.
FE: Add HammingDistance scalar function in Nereids with AlwaysNullable (returns NULL when any input is NULL).
Test: Add BE-UT with check_function_all_arg_comb to cover all argument combinations.
Test: Add distributed regression test test_hamming_distance.groovy.
Doc: [link to your doc PR in apache/doris-website].

Behavior

Return type: BIGINT, the number of positions where corresponding characters differ.
Returns NULL if any input is NULL.
Throws an error if the two strings have different lengths.
Works for vector/vector, scalar/vector, vector/scalar, scalar/scalar combinations.
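
The behavior listed above can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the PR's HammingDistanceImpl: the name `hamming_distance_sketch` is hypothetical, `std::optional` stands in for SQL NULL, a thrown `std::invalid_argument` stands in for the engine's error, and the comparison is byte-wise (the description does not say whether the real implementation compares bytes or UTF-8 characters).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>

// Hypothetical helper for illustration only; not the Doris BE code.
// std::nullopt models SQL NULL, int64_t models the BIGINT return type.
std::optional<int64_t> hamming_distance_sketch(
        const std::optional<std::string_view>& lhs,
        const std::optional<std::string_view>& rhs) {
    // NULL in, NULL out: any NULL argument yields a NULL result.
    if (!lhs.has_value() || !rhs.has_value()) {
        return std::nullopt;
    }
    // Inputs of different lengths are an error rather than NULL.
    if (lhs->size() != rhs->size()) {
        throw std::invalid_argument(
                "hamming_distance: inputs must have the same length");
    }
    // Count the positions whose bytes differ (assumption: byte-wise compare).
    int64_t distance = 0;
    for (std::size_t i = 0; i < lhs->size(); ++i) {
        if ((*lhs)[i] != (*rhs)[i]) {
            ++distance;
        }
    }
    return distance;
}
```

Under these assumptions, `hamming_distance_sketch("karolin", "kathrin")` yields 3, because the inputs differ at the third, fourth, and fifth positions; passing `std::nullopt` for either argument yields `std::nullopt`.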

Testing

BE-UT: ./run-be-ut.sh (pass).
Regression: ./run-regression-test.sh --run test_hamming_distance (pass).

@zhishengyk requested a review from zclllyybb as a code owner on January 5, 2026 at 11:25
@Thearas
Contributor

Thearas commented Jan 5, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhishengyk
Author

run buildall

zhishengyk and others added 2 commits January 6, 2026 11:45
This commit introduces the hamming_distance scalar function.

Includes:
- BE implementation and unit tests.
- FE function definition and visitor logic.
- End-to-end regression tests.

@zhishengyk
Author

run buildall

@zhishengyk
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31837 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 3ab7447bcccdda0dacc5a78123aed1bacd7a832f, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17631	4239	4029	4029
q2	2014	360	246	246
q3	10172	1254	704	704
q4	10208	832	314	314
q5	7511	2087	1811	1811
q6	184	168	135	135
q7	964	782	652	652
q8	9276	1426	1152	1152
q9	4800	4516	4469	4469
q10	6748	1799	1400	1400
q11	487	304	287	287
q12	701	735	563	563
q13	17764	3826	3073	3073
q14	290	290	278	278
q15	562	510	508	508
q16	676	656	626	626
q17	662	834	469	469
q18	6456	6624	6727	6624
q19	969	1017	630	630
q20	450	369	266	266
q21	3180	2557	2593	2557
q22	1134	1107	1044	1044
Total cold run time: 102839 ms
Total hot run time: 31837 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4307	4401	4190	4190
q2	334	402	355	355
q3	2201	2765	2493	2493
q4	1403	1916	1346	1346
q5	4407	4326	4516	4326
q6	215	173	130	130
q7	1945	1944	1786	1786
q8	2554	2377	2603	2377
q9	7150	6937	7199	6937
q10	2473	2776	2208	2208
q11	545	473	460	460
q12	740	738	599	599
q13	3563	4016	3369	3369
q14	294	304	281	281
q15	528	487	474	474
q16	604	658	604	604
q17	1060	1234	1258	1234
q18	7665	7375	7183	7183
q19	784	770	782	770
q20	1853	1959	1801	1801
q21	4391	4287	4027	4027
q22	1110	1027	965	965
Total cold run time: 50126 ms
Total hot run time: 47915 ms

@doris-robot

TPC-DS: Total hot run time: 172259 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3ab7447bcccdda0dacc5a78123aed1bacd7a832f, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	5876	612	442	442
query6	367	224	213	213
query7	4614	453	270	270
query8	357	233	234	233
query9	8794	2665	2646	2646
query10	543	370	302	302
query11	15344	15113	14828	14828
query12	188	114	112	112
query13	1238	485	370	370
query14	7787	2977	2680	2680
query14_1	2635	2584	2651	2584
query15	273	192	178	178
query16	992	508	458	458
query17	1253	679	591	591
query18	2709	442	340	340
query19	328	225	213	213
query20	121	119	117	117
query21	215	141	117	117
query22	4173	4231	4107	4107
query23	16241	15838	15305	15305
query23_1	15442	15566	15487	15487
query24	6889	1580	1184	1184
query24_1	1178	1161	1203	1161
query25	572	485	426	426
query26	1222	271	157	157
query27	2676	447	293	293
query28	4575	2173	2158	2158
query29	780	557	464	464
query30	314	235	209	209
query31	783	630	556	556
query32	82	75	65	65
query33	537	335	296	296
query34	868	905	520	520
query35	769	782	753	753
query36	879	877	735	735
query37	122	91	81	81
query38	2703	2697	2613	2613
query39	779	744	729	729
query39_1	718	717	707	707
query40	215	131	122	122
query41	66	62	62	62
query42	104	100	104	100
query43	462	425	445	425
query44	1294	718	724	718
query45	187	185	179	179
query46	842	951	587	587
query47	1357	1440	1438	1438
query48	302	313	255	255
query49	617	423	318	318
query50	622	264	204	204
query51	3825	3848	3715	3715
query52	108	106	93	93
query53	292	315	268	268
query54	278	251	262	251
query55	71	76	67	67
query56	282	283	289	283
query57	1035	980	931	931
query58	268	250	262	250
query59	2024	2120	1872	1872
query60	318	303	291	291
query61	195	162	164	162
query62	381	362	314	314
query63	301	266	272	266
query64	4818	1301	1001	1001
query65	3834	3771	3616	3616
query66	1347	408	302	302
query67	15756	15176	14906	14906
query68	7379	981	703	703
query69	486	355	306	306
query70	1041	954	860	860
query71	381	292	272	272
query72	6025	3400	3469	3400
query73	757	726	307	307
query74	8782	8754	8525	8525
query75	2814	2817	2403	2403
query76	3768	1069	637	637
query77	528	357	276	276
query78	9766	9818	9087	9087
query79	1522	904	592	592
query80	674	567	466	466
query81	520	257	228	228
query82	218	144	114	114
query83	269	265	236	236
query84	262	123	114	114
query85	898	501	474	474
query86	353	322	321	321
query87	2840	2814	2738	2738
query88	3211	2255	2241	2241
query89	393	353	357	353
query90	2303	161	158	158
query91	178	168	144	144
query92	86	63	65	63
query93	1640	910	533	533
query94	583	330	295	295
query95	543	316	298	298
query96	585	492	210	210
query97	2306	2373	2295	2295
query98	220	202	197	197
query99	579	584	503	503
Total cold run time: 258423 ms
Total hot run time: 172259 ms

@doris-robot

ClickBench: Total hot run time: 26.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3ab7447bcccdda0dacc5a78123aed1bacd7a832f, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.05	0.04
query2	0.10	0.04	0.05
query3	0.26	0.08	0.08
query4	1.60	0.11	0.11
query5	0.27	0.26	0.25
query6	1.15	0.66	0.65
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.56	0.50	0.48
query10	0.54	0.56	0.54
query11	0.15	0.10	0.09
query12	0.14	0.11	0.11
query13	0.61	0.58	0.58
query14	0.96	0.96	0.95
query15	0.78	0.77	0.79
query16	0.38	0.41	0.40
query17	0.99	1.04	1.02
query18	0.22	0.22	0.22
query19	1.84	1.74	1.84
query20	0.02	0.02	0.01
query21	15.45	0.28	0.14
query22	5.36	0.06	0.05
query23	16.22	0.28	0.10
query24	0.94	0.48	0.50
query25	0.10	0.09	0.06
query26	0.14	0.14	0.13
query27	0.05	0.06	0.06
query28	3.95	1.07	0.88
query29	12.60	3.84	3.16
query30	0.28	0.13	0.12
query31	2.82	0.64	0.39
query32	3.24	0.55	0.45
query33	2.97	3.02	3.08
query34	16.82	5.05	4.41
query35	4.52	4.49	4.44
query36	0.67	0.49	0.49
query37	0.10	0.07	0.06
query38	0.07	0.04	0.03
query39	0.04	0.03	0.03
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 97.37 s
Total hot run time: 26.86 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/17) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 5.88% (1/17) 🎉
Increment coverage report
Complete coverage report
