-
Notifications
You must be signed in to change notification settings - Fork 784
Description
My colleage @hongliangl tried to upgrade netlink to a recent commit to pick up a required change. However, our CI became flaky when validating the new netlink version. We identified the flake started from the commit merged via #941.
The above commit changes the socket created by Subscribe
to non-blocking when groups are provided. However, it didn't change how the message was received from the socket, causing the receiver goroutine to run into a busy loop, taking 100% CPU.
Lines 367 to 372 in 856e190
for { | |
msgs, from, err := s.Receive() | |
if err != nil { | |
if err == syscall.EAGAIN { | |
continue | |
} |
The issue can be reproduced by the following code:
// netlink-reproducer.go
package main
import (
"fmt"
"os"
"github.com/vishvananda/netlink"
)
func main() {
ch := make(chan netlink.AddrUpdate, 1000)
stopCh := make(chan struct{})
defer close(stopCh)
if err := netlink.AddrSubscribeWithOptions(ch, stopCh, netlink.AddrSubscribeOptions{
ErrorCallback: func(err error) {
fmt.Fprintf(os.Stderr, "Received err: %v\n", err)
},
}); err != nil {
fmt.Fprintf(os.Stderr, "Failed to subscribe: %v\n", err)
}
for {
select {
case update := <-ch:
fmt.Fprintf(os.Stderr, "Received: %v\n", update)
}
}
}
The process would always take 100%+ CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
412687 root 20 0 1968136 8172 2660 S 103.6 0.0 0:15.53 netlink-reprodu
To fix it, all subscribers' receiver goroutines should use poll or select to wait for events first before receiving messages from the socket. I could take a stab at fixing the implementation, but I wonder if we could revert the commit that introduces the bug first to unblock projects requiring other changes of the library.