- 
                Notifications
    You must be signed in to change notification settings 
- Fork 36
          fix(linkerd-cni): prevent parent from outpacing forked monitor_cni_config() process
          #585
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…onfig()` process Signed-off-by: Simon Dickhoven <sdickhoven@everquote.com>
| monitor_state=$( | ||
| (ps --ppid=$monitor_pid -o comm=,state= || true) | | ||
| awk '$1 == "inotifywait" && $2 == "S" {print "ok"}' | ||
| ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the monitor_state variable will either be an empty string or it will contain (one or more) "ok".
...it should only be one "ok" because there is only one inotifywait process that gets spawned by the monitor_cni_config() function.
however, this logic will continue to function even if multiple inotifywait processes are spawned by monitor_cni_config() because i wanted to make it very difficult for future changes to break this logic.
the subsequent check that is responsible for breaking the infinite loop only cares about whether the monitor_state variable is an empty string or not. so anything besides "ok" (including multiple "ok") will break the loop just fine.
i also experimented with only using ps here:
$ ps --ppid=$monitor_pid -C inotifywait -o state=
however it appears as though ps does not like multiple filters (1: --ppid=$monitor_pid, 2: -C inotifywait) and therefore ignores one of the two.
so i opted to have ps emit all processes that are spawned by monitor_cni_config() (which should be only inotifywait) and then have awk check if one of those processes is inotifywait and whether that process is in the state S (="interruptible sleep") which should indicate that it is not in the process of actively starting up but is instead waiting for filesystem events to come in.
https://www.man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE_CODES
the = after the output column name (comm=,state=) tells ps to omit the header for that column. this is not strictly necessary for the logic to work but we don't need the headers so i added it.
$ ps -o comm,state
COMMAND         S
bash            S
ps              R
vs
$ ps -o comm=,state=
bash            S
ps              R
the || true is there because if ps doesn't find any processes (which is the whole reason for this logic to exist in the first place), it exits with a non-zero exit code (which would then abort the entire script... which we don't want).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could get much more elaborate here and use the filesystem for ipc between the main process and the forked monitor_cni_config() process but that would require a lot more logic.
so i opted to keep things relatively straight-forward.
| awk '$1 == "inotifywait" && $2 == "S" {print "ok"}' | ||
| ) | ||
| [ -z "$monitor_state" ] || break | ||
| sleep .1 # 100ms | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that i saw a ~140ms race in production, i figured that waiting in increments of 100ms is reasonable.
| i tested the logic with the following script #!/usr/bin/env bash
set -e
#inotifywait -m /tmp &
myfunc() {
  sleep 2
  inotifywait -m .
}
myfunc &
monitor_pid=$!
while true; do
  monitor_state=$(
    (ps --ppid=$monitor_pid -o comm=,state= || true) |
    awk '$1 == "inotifywait" && $2 == "S" {print "ok"}'
  )
  echo "tick $monitor_state"
  [ -z "$monitor_state" ] || break
  sleep .1 # 100ms
done
echo done
wait -ni also threw in a second  it is also possible to see the new logic in action by inserting a  | 
| Thanks for the very detailed overview. We should be able to take a deeper look at this next week. | 
this pr addresses a race condition in which the parent process outpaces the forked
monitor_cni_config()process.this race condition was observed in a production cluster and can clearly be seen in the log output:
as can be seen above, the
inotifywaitis about 140ms late to the party.the result is that existing cni config files are not patched which then causes the repair controller to incessantly "murder" pods that start up on the doomed worker node.
i should add that we've been running linkerd-cni v1.6.3 (which includes the fix for previously encountered race conditions) since 16 jul 2025 and v1.6.4 since 15 sep 2025 without a problem. yesterday was the first time we've seen this (new) race condition.