ban-lurker kills child due to Missing errorhandling code in ban_evaluate() #4376

@beltrachi

Expected Behavior

The child process should not die while the ban-lurker is processing bans.

Current Behavior

The child process dies, and from the user's side it looks as if the whole cache has been flushed.

The bans contained these URL regexps:

  • "^\/(v[^\\/]\/)(books|articles)(\.json)?\?(.+&)(parameter_list%5B%5D=value1)(&.+)*$",
  • "^\/(v[^\\/]\/)books\/2001-a-space-oddissey\/data(\.json)?(\?.)?$",
  • "^\/(v,[^\\/]\/)lists\/(--all|great_of_all_time)\/contents(\.json)?\?(.+&)(element_type=(author|authors))(&.+)*$"

These regexps were the ones that made the child process die. We have reproduced the crash successfully. It also reproduces on Varnish 7.7, so the issue has not been fixed.

Child (13697) Panic at: Wed, 30 Jul 2025 12:34:28 GMT
Missing errorhandling code in ban_evaluate(), cache/cache_ban.c line 574:
  Condition(rv >= -1) not true.
version = varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1, vrt api = 19.0
ident = Linux,6.1.0-37-cloud-amd64,x86_64,-junix,-smalloc,-sdefault,-hcritbit,epoll
now = 1053305.025056 (mono), 1753878868.729832 (real)
Backtrace:
  0x55c49755bcfe: /usr/sbin/varnishd(+0x5ccfe) [0x55c49755bcfe]
  0x55c4975dc2c5: /usr/sbin/varnishd(VAS_Fail+0x45) [0x55c4975dc2c5]
  0x55c497535867: /usr/sbin/varnishd(+0x36867) [0x55c497535867]
  0x55c497537949: /usr/sbin/varnishd(ban_lurker+0x4b9) [0x55c497537949]
  0x55c497583d01: /usr/sbin/varnishd(+0x84d01) [0x55c497583d01]
  0x7ffb934a81f5: /lib/x86_64-linux-gnu/libc.so.6(+0x891f5) [0x7ffb934a81f5]
  0x7ffb9352889c: /lib/x86_64-linux-gnu/libc.so.6(+0x10989c) [0x7ffb9352889c]
errno = 110 (Connection timed out)
argv = {
  [0] = \"/usr/sbin/varnishd\",
  [1] = \"-F\",
  [2] = \"-a\",
  [3] = \"0.0.0.0:7000\",
  [4] = \"-T\",
  [5] = \"127.0.0.1:7001\",
  [6] = \"-f\",
  [7] = \"/etc/varnish/varnish.vcl\",
  [8] = \"-S\",
  [9] = \"/etc/varnish/secret\",
  [10] = \"-p\",
  [11] = \"vcc_allow_inline_c=on\",
  [12] = \"-p\",
  [13] = \"http_req_hdr_len=20000\",
  [14] = \"-p\",
  [15] = \"http_resp_hdr_len=20000\",
  [16] = \"-p\",
  [17] = \"feature=+esi_disable_xml_check\",
  [18] = \"-p\",
  [19] = \"max_esi_depth=10\",
  [20] = \"-p\",
  [21] = \"feature=+esi_ignore_other_elements\",
  [22] = \"-p\",
  [23] = \"thread_pool_stack=192k\",
  [24] = \"-p\",
  [25] = \"ban_lurker_sleep=0.005\",
  [26] = \"-p\",
  [27] = \"ban_lurker_batch=2000\",
  [28] = \"-s\",
  [29] = \"malloc,24576m\",
}
pthread.self = 0x7ffb84dff6c0
pthread.name = (ban-lurker)
pthread.attr = {
  guard = 4096,
  stack_bottom = 0x7ffb84600000,
  stack_top = 0x7ffb84e00000,
  stack_size = 8388608,
}
thr.req = NULL
thr.busyobj = NULL
thr.worker = NULL
vmods = {
  var = {0x7ffb92ed4150, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},
  querystring = {0x7ffb92ed41c0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},
  std = {0x7ffb92ed4230, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},
},
pools = {
  pool = 0x7ffb8bdfd000 {
    nidle = 91,
    nthr = 100,
    lqueue = 0
  },
  pool = 0x7ffb8bdfd640 {
    nidle = 93,
    nthr = 100,
    lqueue = 0
  },
},

Possible Solution

Understand why VRE_match returns an error, and fix it.

We are using Varnish 6 in production without issues, so we suspect it's related to the new PCRE engine.

Steps to Reproduce (for bugs)

  1. Load the cache with 4M objects
  2. Send the mentioned bans

Sorry, I can't provide more information, for security reasons.

Context

The cache was loaded with 4M objects. We tried to reproduce the crash with just 10k objects and it did not happen.
We could only reproduce it after loading 4M objects and then sending the bans listed above.

Other bans were being sent the whole time; only these seem to break VRE_match.
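
Since the crash only shows up with 4M objects, it probably depends on a few specific cached URLs rather than on the regexps alone. A minimal sketch for narrowing that down offline: replay the three regexps against a sample of URLs and rank the slowest matches. The regexps are copied verbatim from the list above. The sample URLs and the helper names `time_match` and `slowest_urls` are hypothetical. Python's `re` module is not the PCRE engine Varnish uses, so this cannot reproduce the `rv >= -1` failure. It only helps shortlist suspicious URLs to try against a real varnishd.

```python
import re
import time

# The three ban regexps from the report, copied verbatim as raw strings.
BAN_REGEXPS = [
    r"^\/(v[^\\/]\/)(books|articles)(\.json)?\?(.+&)(parameter_list%5B%5D=value1)(&.+)*$",
    r"^\/(v[^\\/]\/)books\/2001-a-space-oddissey\/data(\.json)?(\?.)?$",
    r"^\/(v,[^\\/]\/)lists\/(--all|great_of_all_time)\/contents(\.json)?\?(.+&)(element_type=(author|authors))(&.+)*$",
]

COMPILED = [re.compile(p) for p in BAN_REGEXPS]


def time_match(pattern, url):
    """Return (matched, elapsed_seconds) for one regexp applied to one URL."""
    start = time.perf_counter()
    matched = pattern.search(url) is not None
    return matched, time.perf_counter() - start


def slowest_urls(urls, top=5):
    """Return up to `top` (elapsed, regexp_index, url) tuples, slowest first."""
    timings = []
    for url in urls:
        for index, pattern in enumerate(COMPILED):
            _, elapsed = time_match(pattern, url)
            timings.append((elapsed, index, url))
    timings.sort(key=lambda t: t[0], reverse=True)
    return timings[:top]


if __name__ == "__main__":
    # Hypothetical sample; replace with URLs taken from real traffic.
    sample = [
        "/v1/books.json?page=2&parameter_list%5B%5D=value1&sort=asc",
        "/v1/books/2001-a-space-oddissey/data.json",
        "/v1/articles?" + "&".join(f"k{i}=v{i}" for i in range(2000)),
    ]
    for elapsed, index, url in slowest_urls(sample, top=3):
        print(f"regexp {index}: {elapsed * 1e6:.0f} us, url length {len(url)}")
```

URLs that rank high here, typically ones with very long query strings, would be the first candidates to load into a small test cache before sending the bans.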

Varnish Cache version

varnish-7.5.0 revision eef2526

Operating system

Debian 12

Source of binary packages used (if any)

No response
