Skip to content

Subscribe and its variants run into a busy loop taking 100% CPU when groups is not empty #975

@tnqn

Description

@tnqn

My colleage @hongliangl tried to upgrade netlink to a recent commit to pick up a required change. However, our CI became flaky when validating the new netlink version. We identified the flake started from the commit merged via #941.

The above commit changes the socket created by Subscribe to non-blocking when groups are provided. However, it didn't change how the message was received from the socket, causing the receiver goroutine to run into a busy loop, taking 100% CPU.

netlink/addr_linux.go

Lines 367 to 372 in 856e190

for {
msgs, from, err := s.Receive()
if err != nil {
if err == syscall.EAGAIN {
continue
}

The issue can be reproduced by the following code:

// netlink-reproducer.go
package main

import (
	"fmt"
	"os"

	"github.com/vishvananda/netlink"
)

func main() {
	ch := make(chan netlink.AddrUpdate, 1000)
	stopCh := make(chan struct{})
	defer close(stopCh)
	if err := netlink.AddrSubscribeWithOptions(ch, stopCh, netlink.AddrSubscribeOptions{
		ErrorCallback: func(err error) {
			fmt.Fprintf(os.Stderr, "Received err: %v\n", err)
		},
	}); err != nil {
		fmt.Fprintf(os.Stderr, "Failed to subscribe: %v\n", err)
	}
	for {
		select {
		case update := <-ch:
			fmt.Fprintf(os.Stderr, "Received: %v\n", update)
		}
	}
}

The process would always take 100%+ CPU:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 412687 root      20   0 1968136   8172   2660 S 103.6   0.0   0:15.53 netlink-reprodu

To fix it, all subscribers' receiver goroutines should use poll or select to wait for events first before receiving messages from the socket. I could take a stab at fixing the implementation, but I wonder if we could revert the commit that introduces the bug first to unblock projects requiring other changes of the library.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions