Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions cni-plugin/deployment/scripts/install-cni.sh
Original file line number Diff line number Diff line change
Expand Up @@ -344,6 +344,21 @@ install_cni_bin
# CNI config files.
CNI_CONF_SHA='{}'
monitor_cni_config &
monitor_pid=$!

# The following logic waits (indefinitely if need be) for `inotifywait` in the
# forked `monitor_cni_config()` function to enter "interruptible sleep" state.
# This state indicates that `inotifywait` is fully up and ready to respond to
# filesystem events.
log "Wait for CNI config monitor to become ready"
while true; do
monitor_state=$(
(ps --ppid=$monitor_pid -o comm=,state= || true) |
awk '$1 == "inotifywait" && $2 == "S" {print "ok"}'
)
Comment on lines +355 to +358
Copy link
Contributor Author

@sdickhoven sdickhoven Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the monitor_state variable will either be an empty string or it will contain (one or more) "ok".

...it should only be one "ok" because there is only one inotifywait process that gets spawned by the monitor_cni_config() function.

however, this logic will continue to function even if multiple inotifywait processes are spawned by monitor_cni_config() because i wanted to make it very difficult for future changes to break this logic.

the subsequent check that is responsible for breaking the infinite loop only cares about whether the monitor_state variable is an empty string or not. so anything besides "ok" (including multiple "ok") will break the loop just fine.

i also experimented with only using ps here:

$ ps --ppid=$monitor_pid -C inotifywait -o state=

however it appears as though ps does not like multiple filters (1: --ppid=$monitor_pid, 2: -C inotifywait) and therefore ignores one of the two.

so i opted to have ps emit all processes that are spawned by monitor_cni_config() (which should be only inotifywait) and then have awk check if one of those processes is inotifywait and whether that process is in the state S (="interruptible sleep") which should indicate that it is not in the process of actively starting up but is instead waiting for filesystem events to come in.

https://www.man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE_CODES

the = after the output column name (comm=,state=) tells ps to omit the header for that column. this is not strictly necessary for the logic to work but we don't need the headers so i added it.

$ ps -o comm,state
COMMAND         S
bash            S
ps              R

vs

$ ps -o comm=,state=
bash            S
ps              R

the || true is there because if ps doesn't find any processes (which is the whole reason for this logic to exist in the first place), it exits with a non-zero exit code (which would then abort the entire script... which we don't want).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could get much more elaborate here and use the filesystem for ipc between the main process and the forked monitor_cni_config() process but that would require a lot more logic.

so i opted to keep things relatively straight-forward.

[ -z "$monitor_state" ] || break
sleep .1 # 100ms
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that i saw a ~140ms race in production, i figured that waiting in increments of 100ms is reasonable.

done

# Append our config to any existing config file (*.conflist or *.conf)
config_files=$(find "${HOST_CNI_NET}" -maxdepth 1 -type f ! -name '*linkerd*' \( -iname '*conflist' -o -iname '*conf' \))
Expand Down
Loading