
[Bug/Feature] EFA-enabled nodegroups create conflicting security group tags causing AWS Load Balancer Controller failures #8475

@sapphirew

Description

Problem Description

When efaEnabled: true is set on a nodegroup, eksctl creates an additional security group for EFA communication and tags it with kubernetes.io/cluster/<cluster-name>: owned. This conflicts with the AWS Load Balancer Controller's TargetGroupBinding reconciliation logic, which expects exactly one security group with this tag per ENI.
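
If it helps triage: the duplicate tagging can be confirmed with the AWS CLI. With efaEnabled: true this should return two group IDs (the cluster name test-cluster is a placeholder):

aws ec2 describe-security-groups \
  --filters "Name=tag:kubernetes.io/cluster/test-cluster,Values=owned" \
  --query "SecurityGroups[].[GroupId,GroupName]" \
  --output text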

Error Message

Warning  FailedNetworkReconcile  33s (xxxxx over 2d1h)  targetGroupBinding  
expected exactly one securityGroup tagged with kubernetes.io/cluster/kreks for eni eni-xxxxxxx, 
got: [sg-xxxxxxxxx sg-xxxxxxxxx] (clusterName: xxxxx)

What were you trying to accomplish?

Create an EFA-enabled EKS nodegroup that can also use Network Load Balancer (NLB) services without conflicts.

What happened?

  1. Created a nodegroup with efaEnabled: true
  2. eksctl created two security groups, both tagged with kubernetes.io/cluster/<cluster-name>: owned:
    • The original nodegroup security group
    • An additional EFA-specific security group
  3. When creating NLB services, the AWS Load Balancer Controller fails because it finds multiple security groups carrying the cluster ownership tag (see https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/pkg/networking/networking_manager.go#L571); the groups attached to a given ENI can be verified with the command below
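
For reference, the security groups attached to an ENI can be listed like this (the ENI ID is a placeholder); on an EFA-enabled node both the nodegroup security group and the EFA security group show up:

aws ec2 describe-network-interfaces \
  --network-interface-ids eni-0123456789abcdef0 \
  --query "NetworkInterfaces[].Groups[].[GroupId,GroupName]" \
  --output text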

How to reproduce it?

Config File

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: test-cluster
  region: us-west-2

nodeGroups:
  - name: efa-workers
    instanceType: c5n.18xlarge
    minSize: 1
    maxSize: 3
    availabilityZones: ["us-west-2a"]
    efaEnabled: true

Steps

  1. Create cluster with EFA-enabled nodegroup: eksctl create cluster -f config.yaml
  2. Deploy a service with NLB:
    apiVersion: v1
    kind: Service
    metadata:
      name: test-nlb
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    spec:
      type: LoadBalancer
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: test-app
  3. Observe the FailedNetworkReconcile event emitted by the AWS Load Balancer Controller (one way to surface it is shown below)
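
A quick way to surface the event without tailing the controller logs (assuming the event is recorded in an accessible namespace; it also appears when describing the TargetGroupBinding the controller created for the service):

kubectl get events --all-namespaces \
  --field-selector reason=FailedNetworkReconcile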

Logs

Anything else we need to know?

A possible solution is to create a new configuration option to control EFA security group tagging:

nodeGroups:
  - name: efa-workers
    efaEnabled: true
    efaSecurityGroupTagging:
      clusterOwnership: "shared"  # or "owned", "none"
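
Until something like this exists, a possible interim workaround (untested here, and CloudFormation may re-apply the tag on the next stack update) is to strip the ownership tag from the EFA security group manually; the group ID and cluster name are placeholders:

# Omitting the tag value deletes the tag regardless of its current value
aws ec2 delete-tags \
  --resources sg-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/test-cluster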

Versions

$ eksctl info
eksctl version: 0.212.0
kubectl version: v1.33.3
OS: darwin
