Description
Expected Behavior
The child process should not die while the ban lurker is processing bans.
Current Behavior
The child process dies, and from the user's side it looks as if the whole cache has been flushed.
The bans contained these URL regexps:
- "^\/(v[^\\/]\/)(books|articles)(\.json)?\?(.+&)(parameter_list%5B%5D=value1)(&.+)*$",
- "^\/(v[^\\/]\/)books\/2001-a-space-oddissey\/data(\.json)?(\?.)?$",
- "^\/(v,[^\\/]\/)lists\/(--all|great_of_all_time)\/contents(\.json)?\?(.+&)(element_type=(author|authors))(&.+)*$"
These regexps were the ones that made the child process die. We have reproduced it successfully. We also reproduced it in Varnish 7.7, so the issue has not been fixed.
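To make it easier to see what these patterns accept, here is a minimal sketch that runs them through Python's `re` module. This is only an approximation: Varnish 7 uses PCRE2, and Python's engine is assumed (not verified) to agree with it on these inputs. The helper name `matching_patterns` and the sample URLs are made up for illustration; the patterns are copied verbatim from the list above.

```python
import re

# Patterns copied verbatim from the report (raw strings, no unescaping applied).
PATTERNS = [
    r"^\/(v[^\\/]\/)(books|articles)(\.json)?\?(.+&)(parameter_list%5B%5D=value1)(&.+)*$",
    r"^\/(v[^\\/]\/)books\/2001-a-space-oddissey\/data(\.json)?(\?.)?$",
    r"^\/(v,[^\\/]\/)lists\/(--all|great_of_all_time)\/contents(\.json)?\?(.+&)(element_type=(author|authors))(&.+)*$",
]


def matching_patterns(url):
    """Return the indexes of PATTERNS that match the given URL (hypothetical helper)."""
    return [i for i, pattern in enumerate(PATTERNS) if re.search(pattern, url)]


if __name__ == "__main__":
    # Hypothetical sample URLs, not taken from production traffic.
    for url in (
        "/v1/books.json?a=1&parameter_list%5B%5D=value1",
        "/v1/books?parameter_list%5B%5D=value1",
        "/v1/books/2001-a-space-oddissey/data.json",
    ):
        print(url, matching_patterns(url))
```

Note that the first and third patterns require at least one other parameter before the targeted one because of `(.+&)`, and that both combine a greedy `.+` prefix with a trailing `(&.+)*`, which may need a lot of backtracking on long query strings.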
Child (13697) Panic at: Wed, 30 Jul 2025 12:34:28 GMT
Missing errorhandling code in ban_evaluate(), cache/cache_ban.c line 574:
Condition(rv >= -1) not true.
version = varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1, vrt api = 19.0
ident = Linux,6.1.0-37-cloud-amd64,x86_64,-junix,-smalloc,-sdefault,-hcritbit,epoll
now = 1053305.025056 (mono), 1753878868.729832 (real)
Backtrace:
0x55c49755bcfe: /usr/sbin/varnishd(+0x5ccfe) [0x55c49755bcfe]
0x55c4975dc2c5: /usr/sbin/varnishd(VAS_Fail+0x45) [0x55c4975dc2c5]
0x55c497535867: /usr/sbin/varnishd(+0x36867) [0x55c497535867]
0x55c497537949: /usr/sbin/varnishd(ban_lurker+0x4b9) [0x55c497537949]
0x55c497583d01: /usr/sbin/varnishd(+0x84d01) [0x55c497583d01]
0x7ffb934a81f5: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x7ffb934a81f5]
0x7ffb9352889c: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x7ffb9352889c]
errno = 110 (Connection timed out)
argv = {
[0] = \"/usr/sbin/varnishd\",
[1] = \"-F\",
[2] = \"-a\",
[3] = \"0.0.0.0:7000\",
[4] = \"-T\",
[5] = \"127.0.0.1:7001\",
[6] = \"-f\",
[7] = \"/etc/varnish/varnish.vcl\",
[8] = \"-S\",
[9] = \"/etc/varnish/secret\",
[10] = \"-p\",
[11] = \"vcc_allow_inline_c=on\",
[12] = \"-p\",
[13] = \"http_req_hdr_len=20000\",
[14] = \"-p\",
[15] = \"http_resp_hdr_len=20000\",
[16] = \"-p\",
[17] = \"feature=+esi_disable_xml_check\",
[18] = \"-p\",
[19] = \"max_esi_depth=10\",
[20] = \"-p\",
[21] = \"feature=+esi_ignore_other_elements\",
[22] = \"-p\",
[23] = \"thread_pool_stack=192k\",
[24] = \"-p\",
[25] = \"ban_lurker_sleep=0.005\",
[26] = \"-p\",
[27] = \"ban_lurker_batch=2000\",
[28] = \"-s\",
[29] = \"malloc,24576m\",
}
pthread.self = 0x7ffb84dff6c0
pthread.name = (ban-lurker)
pthread.attr = {
guard = 4096,
stack_bottom = 0x7ffb84600000,
stack_top = 0x7ffb84e00000,
stack_size = 8388608,
}
thr.req = NULL
thr.busyobj = NULL
thr.worker = NULL
vmods = {
var = {0x7ffb92ed4150, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},
querystring = {0x7ffb92ed41c0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},
std = {0x7ffb92ed4230, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},
},
pools = {
pool = 0x7ffb8bdfd000 {
nidle = 91,
nthr = 100,
lqueue = 0
},
pool = 0x7ffb8bdfd640 {
nidle = 93,
nthr = 100,
lqueue = 0
},
},
Possible Solution
Understand why VRE_match returns an error, and fix it.
We are using Varnish 6 in production without issues, so we suspect it's related to the new PCRE engine.
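To illustrate what the failed `Condition(rv >= -1)` implies, here is a rough sketch, written in Python for brevity, of how a match result could be classified. It assumes that `VRE_match` passes PCRE2 return codes through unchanged, which we have not confirmed in the source. The constant values are the ones defined in `pcre2.h`; the function names are hypothetical.

```python
# PCRE2 return codes, values as defined in pcre2.h.
PCRE2_ERROR_NOMATCH = -1
PCRE2_ERROR_MATCHLIMIT = -47
PCRE2_ERROR_DEPTHLIMIT = -53


def classify_match_result(rv):
    """Classify a match return value (hypothetical helper, not Varnish code)."""
    if rv >= 0:
        return "match"
    if rv == PCRE2_ERROR_NOMATCH:
        return "no match"
    # Anything below -1 is a real error, e.g. a match or depth limit being hit.
    return "error"


def assertion_holds(rv):
    """Mirror of the condition quoted in the panic message: rv >= -1."""
    return rv >= -1


if __name__ == "__main__":
    for rv in (1, PCRE2_ERROR_NOMATCH, PCRE2_ERROR_MATCHLIMIT, PCRE2_ERROR_DEPTHLIMIT):
        print(rv, classify_match_result(rv), assertion_holds(rv))
```

If that assumption holds, any error code such as a match-limit error would trip the assertion instead of being handled as a failed match, which would be consistent with the panic above. We cannot tell from the panic message which error code was actually returned.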
Steps to Reproduce (for bugs)
- Load the cache with 4M objects
- Send the mentioned bans
Sorry, I can't provide more information for security reasons.
Context
The cache was loaded with 4M objects. We tried to reproduce it with just 10k objects and it did not happen.
We could only reproduce it after loading 4M objects and then sending the mentioned bans.
Other bans were being sent the whole time; only these seem to break VRE_match.
Varnish Cache version
varnish-7.5.0 revision eef2526
Operating system
Debian 12
Source of binary packages used (if any)
No response