
Services Using DSR Occasionally Fail When Routed via IPIP Tunnel #1870

@aauren

What happened?

When you have the following conditions:

  • LoadBalancer Service with DSR enabled (kube-router.io/service.dsr: tunnel) and a cluster traffic policy
  • kube-router nodes that are either in different subnets or have --overlay-type=full configured

Traffic routed to that service via the IPv4 LoadBalancer address fails to route when it comes from a pod within the cluster and is load-balanced out to a node other than the one the pod itself is running on (assuming that an endpoint is also available on the pod's own node). If there is no endpoint on the pod's node, then it will never work.
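For what it's worth, one way to see which backend IPVS picked for a given attempt on the originating node is to query IPVS directly (a rough sketch; it assumes ipvsadm is available on the node or inside the kube-router pod, and uses the LB VIP/port from the reproduction below):

# With DSR the real servers behind the virtual server should show a Tunnel forwarding method
$ ipvsadm -Ln -t 10.243.1.0:5000

# Connection entries show which real server a given (failed) curl was balanced to
$ ipvsadm -Lnc | grep 10.243.1.0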

What did you expect to happen?

Traffic should be able to route regardless of whether it transits an IPIP tunnel or not.

How can we reproduce the behavior you experienced?

Steps to reproduce the behavior:

  1. Deploy kube-router with --overlay-type=full
$ kubectl get nodes -o wide
NAME              STATUS   ROLES           AGE   VERSION    INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
kube-router-vm1   Ready    control-plane   29h   v1.30.14   10.241.0.20   <none>        Ubuntu 24.04.1 LTS   6.8.0-49-generic   containerd://1.7.27
kube-router-vm2   Ready    <none>          29h   v1.30.14   10.241.0.21   <none>        Ubuntu 24.04.1 LTS   6.8.0-62-generic   containerd://1.7.27
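For completeness, these are the flags relevant to this step, trimmed out of the full parameter list in the System Information section below (all other arguments are unchanged):

--run-router=true --run-service-proxy=true --overlay-type=full --overlay-encap=fou --run-loadbalancer=true --advertise-loadbalancer-ip=true --loadbalancer-ip-range=10.243.1.0/24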
  2. Deploy a LoadBalancer service with DSR enabled
apiVersion: v1
kind: Service
metadata:
  annotations:
    kube-router.io/service.dsr: tunnel
    purpose: "Creates a VIP for balancing an application"
  labels:
    name: whoami
  name: whoami
  namespace: default
spec:
  externalIPs:
  - 10.243.0.1
  ports:
  - name: flask
    port: 5000
    protocol: TCP
    targetPort: 5000
  ipFamilyPolicy: PreferDualStack
  selector:
    name: whoami
  type: LoadBalancer

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: whoami
  namespace: default
spec:
  selector:
    matchLabels:
      name: whoami
  template:
    metadata:
      labels:
        name: whoami
    spec:
      securityContext:
        runAsUser: 0
        fsGroup: 0
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
        - name: whoami
          image: "docker.io/aauren/whoami-with-tools"
          imagePullPolicy: Always
          command: ["/whoami"]
          args: ["--port", "5000"]
          securityContext:
            privileged: true
$ kubectl get pods -n default -l name=whoami -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP            NODE              NOMINATED NODE   READINESS GATES
whoami-cqnmn   1/1     Running   0          73m   10.242.1.13   kube-router-vm2   <none>           <none>
whoami-gfzx6   1/1     Running   0          73m   10.242.0.11   kube-router-vm1   <none>           <none>
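Since the failure depends on which endpoint gets picked, it can also help to confirm there is one ready endpoint on each node before moving on (a quick check using the standard kubernetes.io/service-name EndpointSlice label):

$ kubectl get endpointslices -n default -l kubernetes.io/service-name=whoami -o wide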
  3. Deploy a container that allows you to curl the service deployed above
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: debug-toolbox
  namespace: default
spec:
  selector:
    matchLabels:
      name: debug-toolbox
  template:
    metadata:
      labels:
        name: debug-toolbox
    spec:
      tolerations:
        - key: "node-role.kubernetes.io/master"
          operator: "Exists"
          effect: "NoSchedule"
        - key: "node-role.kubernetes.io/control-plane"
          operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: debug-toolbox
          image: "aauren/debug-toolbox:latest"
          imagePullPolicy: Always
          command: ["/usr/bin/tail"]
          args: ["-f", "/dev/null"]
          securityContext:
            privileged: true
$ kubectl get pods -n default -l name=debug-toolbox -o wide
NAME                  READY   STATUS    RESTARTS   AGE   IP            NODE              NOMINATED NODE   READINESS GATES
debug-toolbox-6d8bf   1/1     Running   0          71m   10.242.0.12   kube-router-vm1   <none>           <none>
debug-toolbox-l6sht   1/1     Running   0          71m   10.242.1.14   kube-router-vm2   <none>           <none>
  4. Get the IPv4 address of the deployed service
$ kubectl get services -n default whoami -o wide
NAME     TYPE           CLUSTER-IP      EXTERNAL-IP                     PORT(S)          AGE   SELECTOR
whoami   LoadBalancer   10.96.236.244   10.243.1.0,2001:db8:42:1200::   5000:32374/TCP   64m   name=whoami
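If you only want the IPv4 LB address for scripting the next step, something like the following works (a sketch; it assumes the IPv4 ingress entry is listed first, as in the output above):

$ kubectl get svc -n default whoami -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
10.243.1.0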
  5. Exec into the debug container that you created and curl the IPv4 LB endpoint
% kubectl exec -it -n default debug-toolbox-l6sht -- /bin/bash
root@debug-toolbox-l6sht:/# curl --max-time 5 10.243.1.0:5000
Hostname: whoami-cqnmn
IP: 127.0.0.1
IP: ::1
IP: 10.242.1.13
IP: 2001:db8:42:1001::d
IP: fe80::cc41:6fff:fe94:226c
IP: 10.243.1.0
IP: fe80::5efe:af2:10d
IP: 2001:db8:42:1200::
IP: fe80::486a:fcff:fe46:b21d
RemoteAddr: 10.242.1.14:32844
GET / HTTP/1.1
Host: 10.243.1.0:5000
User-Agent: curl/8.5.0
Accept: */*

root@debug-toolbox-l6sht:/# curl --max-time 5 10.243.1.0:5000
curl: (28) Connection timed out after 5002 milliseconds
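To make the "occasionally" part easier to see, a loop from inside the debug pod shows the split (a sketch; per the description above, the expectation is 200 when the local endpoint is chosen and 000/timeout when the connection is balanced across the tunnel):

root@debug-toolbox-l6sht:/# for i in $(seq 1 10); do curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' 10.243.1.0:5000; done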


System Information (please complete the following information)

  • Kube-Router Version (kube-router --version): Current git HEAD
  • Kube-Router Parameters:
--run-router=true --run-firewall=true --run-service-proxy=true --bgp-graceful-restart=true --kubeconfig=/var/lib/kube-router/kubeconfig --peer-router-ips=10.241.0.10 --peer-router-asns=4200000001 --runtime-endpoint=unix:///run/containerd/containerd.sock --cluster-asn=4200000001 --service-cluster-ip-range=10.96.0.0/16 --enable-ipv6=true --service-cluster-ip-range=2001:db8:42:1::/112 --service-external-ip-range=2001:db8:42:1100::/56 --run-loadbalancer=true --advertise-loadbalancer-ip=true --loadbalancer-ip-range=2001:db8:42:1200::/56 --loadbalancer-ip-range=10.243.1.0/24 --service-external-ip-range=10.243.0.0/24 --advertise-external-ip=true -v=1 --overlay-type=full --overlay-encap=fou
  • Kubernetes Version (kubectl version) : v1.30.14
  • Cloud Type: AWS / On-Premise
  • Kubernetes Deployment Type: kubeadm
  • Kube-Router Deployment Type: daemonset
  • Cluster Size: 3 - 10 nodes

Logs, other output, metrics

tcpdump from the originating host (kube-router-vm2):

21:32:42.771939 veth4b0f843c P   IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125557771 ecr 0,nop,wscale 7], length 0
21:32:42.771949 kube-bridge In  IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125557771 ecr 0,nop,wscale 7], length 0
21:32:42.771964 tun-84738b0d7da Out IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125557771 ecr 0,nop,wscale 7], length 0
21:32:42.772103 tun-84738b0d7da In  IP 10.243.1.0.5000 > 10.242.1.14.39320: Flags [S.], seq 2759326296, ack 1380667210, win 65160, options [mss 1440,sackOK,TS val 3562181057 ecr 1125557771,nop,wscale 7], length 0
21:32:43.776404 veth4b0f843c P   IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125558776 ecr 0,nop,wscale 7], length 0
21:32:43.776411 kube-bridge In  IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125558776 ecr 0,nop,wscale 7], length 0
21:32:43.776425 tun-84738b0d7da Out IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125558776 ecr 0,nop,wscale 7], length 0
21:32:43.776572 tun-84738b0d7da In  IP 10.243.1.0.5000 > 10.242.1.14.39320: Flags [S.], seq 2759326296, ack 1380667210, win 65160, options [mss 1440,sackOK,TS val 3562182061 ecr 1125557771,nop,wscale 7], length 0
21:32:44.791816 tun-84738b0d7da In  IP 10.243.1.0.5000 > 10.242.1.14.39320: Flags [S.], seq 2759326296, ack 1380667210, win 65160, options [mss 1440,sackOK,TS val 3562183077 ecr 1125557771,nop,wscale 7], length 0
21:32:44.800399 veth4b0f843c P   IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125559800 ecr 0,nop,wscale 7], length 0
21:32:44.800406 kube-bridge In  IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125559800 ecr 0,nop,wscale 7], length 0
21:32:44.800418 tun-84738b0d7da Out IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.39320 > 10.243.1.0.5000: Flags [S], seq 1380667209, win 64240, options [mss 1460,sackOK,TS val 1125559800 ecr 0,nop,wscale 7], length 0
21:32:44.800645 tun-84738b0d7da In  IP 10.243.1.0.5000 > 10.242.1.14.39320: Flags [S.], seq 2759326296, ack 1380667210, win 65160, options [mss 1440,sackOK,TS val 3562183085 ecr 1125557771,nop,wscale 7], length 0

tcpdump from the destination host (kube-router-vm1):

21:31:53.241171 tun-643ed15b0b4 In  IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.59250 > 10.243.1.0.5000: Flags [S], seq 539983253, win 64240, options [mss 1460,sackOK,TS val 1125508249 ecr 0,nop,wscale 7], length 0
21:31:53.241229 vethfeecec00 P   IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1460,sackOK,TS val 3562131535 ecr 1125508249,nop,wscale 7], length 0
21:31:53.241236 kube-bridge In  IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562131535 ecr 1125508249,nop,wscale 7], length 0
21:31:53.241240 tun-643ed15b0b4 Out IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562131535 ecr 1125508249,nop,wscale 7], length 0
21:31:54.286374 vethfeecec00 P   IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1460,sackOK,TS val 3562132581 ecr 1125508249,nop,wscale 7], length 0
21:31:54.286385 kube-bridge In  IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562132581 ecr 1125508249,nop,wscale 7], length 0
21:31:54.286397 tun-643ed15b0b4 Out IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562132581 ecr 1125508249,nop,wscale 7], length 0
21:31:54.295342 tun-643ed15b0b4 In  IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.59250 > 10.243.1.0.5000: Flags [S], seq 539983253, win 64240, options [mss 1460,sackOK,TS val 1125509304 ecr 0,nop,wscale 7], length 0
21:31:54.295379 vethfeecec00 P   IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1460,sackOK,TS val 3562132590 ecr 1125508249,nop,wscale 7], length 0
21:31:54.295381 kube-bridge In  IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562132590 ecr 1125508249,nop,wscale 7], length 0
21:31:54.295384 tun-643ed15b0b4 Out IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562132590 ecr 1125508249,nop,wscale 7], length 0
21:31:55.319375 tun-643ed15b0b4 In  IP 10.241.0.21 > 10.242.0.11: IP 10.242.1.14.59250 > 10.243.1.0.5000: Flags [S], seq 539983253, win 64240, options [mss 1460,sackOK,TS val 1125510328 ecr 0,nop,wscale 7], length 0
21:31:55.319429 vethfeecec00 P   IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1460,sackOK,TS val 3562133614 ecr 1125508249,nop,wscale 7], length 0
21:31:55.319435 kube-bridge In  IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562133614 ecr 1125508249,nop,wscale 7], length 0
21:31:55.319439 tun-643ed15b0b4 Out IP 10.243.1.0.5000 > 10.242.1.14.59250: Flags [S.], seq 4010232618, ack 539983254, win 65160, options [mss 1440,sackOK,TS val 3562133614 ecr 1125508249,nop,wscale 7], length 0
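The exact tcpdump invocation isn't recorded above, but something along these lines captures both the inner TCP flow and the IPIP-encapsulated leg on every interface at once:

$ tcpdump -ni any 'tcp port 5000 or ip proto 4'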

