
[POC Mizar-Arktos Integration] coredns and kube-dns are crashing and restarting in scale-up and scale-out env #1309

Open
Sindica opened this issue Jan 26, 2022 · 6 comments


Sindica commented Jan 26, 2022

What happened:
In the Mizar-Arktos integrated local dev environment, the kube-dns and coredns pods keep crashing and getting restarted.
On master, using the default bridge network solution, kube-dns and coredns could not be started at all (the sandbox cannot be created; containerd reported "No cni config template is specified, wait for other system components to drop the config.").

From the kubelet log, it looks like kubelet is using the pod IPs managed by Mizar for the kube-dns and coredns readiness and liveness checks, and those IP addresses are not reachable from kubelet.

I0126 01:01:04.950570   24700 prober.go:121] Readiness probe for "coredns-default-ip-172-30-0-14-78dd67d496-f4mn6_kube-system_system(1f3795f9-84b7-42b0-a54d-8454b9b6c337):coredns" failed (failure): Get http://21.0.21.3:8181/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:01:15.294532   24700 prober.go:121] Liveness probe for "coredns-default-ip-172-30-0-14-78dd67d496-f4mn6_kube-system_system(1f3795f9-84b7-42b0-a54d-8454b9b6c337):coredns" failed (failure): Get http://21.0.21.3:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:02:24.704376   24700 prober.go:121] Readiness probe for "kube-dns-554c5866fc-vtr8t_kube-system_system(ca4d94d2-37a2-4c90-a38f-965668f25f99):kubedns" failed (failure): Get http://21.0.21.0:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:02:09.721243   24700 prober.go:121] Liveness probe for "kube-dns-554c5866fc-vtr8t_kube-system_system(ca4d94d2-37a2-4c90-a38f-965668f25f99):kubedns" failed (failure): Get http://21.0.21.0:10054/healthcheck/kubedns: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
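The probe failures above are plain HTTP GETs against the pod IP that time out because the Mizar-managed address is unreachable from kubelet. A minimal sketch of that probe behavior (a stand-in local server plays the role of a reachable pod; 203.0.113.1 is an unroutable TEST-NET address standing in for 21.0.21.3; nothing here is kubelet's actual prober code):

```python
import http.server
import threading
import urllib.error
import urllib.request


class ReadyHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a pod serving a /ready endpoint, like coredns on :8181."""

    def do_GET(self):
        if self.path == "/ready":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


def probe(url, timeout=1.0):
    """Return True if the endpoint answers 2xx within the timeout.

    A timeout or connection error returns False -- this is what kubelet
    logs as a failed readiness/liveness probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


server = http.server.HTTPServer(("127.0.0.1", 0), ReadyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ok = probe(f"http://127.0.0.1:{port}/ready")           # reachable pod IP
unreachable = probe("http://203.0.113.1:8181/ready", 0.5)  # unreachable, like 21.0.21.3

print(ok)           # True
print(unreachable)  # False
server.shutdown()
```

When the pod IP is unreachable, the probe keeps returning failure until kubelet's failure threshold is hit, at which point the container is killed and restarted, matching the RESTARTS counts below.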
ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh get pods -AT -o wide
TENANT   NAMESPACE     NAME                                              HASHKEY               READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
system   default       mizar-daemon-kdftq                                374524931036715172    1/1     Running   0          7m21s   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       mizar-operator-5c97f7478d-nrbl5                   3232150560540599971   1/1     Running   0          7m21s   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       netpod1                                           8432916227983888478   1/1     Running   0          2m16s   21.0.21.19    ip-172-30-0-41   <none>           <none>
system   default       netpod2                                           3668649586141483683   1/1     Running   0          2m16s   21.0.21.22    ip-172-30-0-41   <none>           <none>
system   kube-system   coredns-default-ip-172-30-0-14-78dd67d496-f4mn6   4792669529088497661   0/1     Running   3          7m21s   21.0.21.3     ip-172-30-0-41   <none>           <none>
system   kube-system   kube-dns-554c5866fc-vtr8t                         728123151778937189    2/3     Running   8          7m21s   21.0.21.0     ip-172-30-0-41   <none>           <none>
system   kube-system   virtlet-nhqwd                                     6043574697768950208   3/3     Running   0          7m12s   172.30.0.41   ip-172-30-0-41   <none>           <none>

What you expected to happen:
With Mizar as the Arktos network plugin, kube-dns and coredns should start successfully without crashing.

@Sindica Sindica self-assigned this Jan 26, 2022

Sindica commented Jan 26, 2022

Starting K8s 1.18 in a local cluster, there is only kube-dns, no coredns.

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ ./cluster/kubectl.sh get pods -A -o wide
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   kube-dns-575cc5f5c-5mcbv   3/3     Running   0          3m32s   172.17.0.2   127.0.0.1   <none>           <none>
ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ sudo iptables -t nat -S | grep 172.17.0.2
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 172.17.0.2:53
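The KUBE-SEP DNAT rules above are how kube-proxy programs a service's endpoints: a service port with a healthy endpoint gets a `--to-destination <pod-ip>:<port>` rule, and a service with no endpoints gets none. A small illustrative parser (not part of kube-proxy) that extracts that mapping from `iptables -t nat -S` output:

```python
import re

# Sample rules, copied from the K8s 1.18 run above.
RULES = """\
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 172.17.0.2:53
"""


def dnat_endpoints(rules):
    """Map 'namespace/service:port-name' -> DNAT destination ip:port."""
    out = {}
    for line in rules.splitlines():
        m = re.search(r'--comment "([^"]+)".*--to-destination (\S+)', line)
        if m:
            out[m.group(1)] = m.group(2)
    return out


eps = dnat_endpoints(RULES)
print(eps)
# A service with no endpoints (as in the Arktos run) would have no
# KUBE-SEP DNAT rule at all and be absent from this map.
```

Running the same extraction against the Arktos node would show no destination for kube-dns, consistent with the empty endpoints observed later in this thread.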


Sindica commented Jan 26, 2022

Starting preAlkaid in a local cluster, there is only kube-dns, no coredns.

ubuntu@ip-172-30-0-148:~/go/src/alkaid$ ./cluster/kubectl.sh get pods -A -o wide
NAMESPACE     NAME                        READY   STATUS    RESTARTS   AGE    IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   kube-dns-5b6487d4cd-cdq8r   3/3     Running   0          115s   172.17.0.2   127.0.0.1   <none>           <none>

ubuntu@ip-172-30-0-148:~$ sudo iptables -t nat -S | grep 172.17.0.2
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m udp -j DNAT --to-destination 172.17.0.2:53


Sindica commented Jan 26, 2022

Kube-proxy log in Arktos:

ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ cat /tmp/kube-proxy.log | grep kube-dns
I0126 03:08:48.019124    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I0126 03:08:48.019143    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
I0126 03:09:01.102105    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:metrics" at 10.0.0.166:9153/TCP
I0126 03:09:01.102122    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:dns" at 10.0.0.166:53/UDP
I0126 03:09:01.102134    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:dns-tcp" at 10.0.0.166:53/TCP
I0126 03:10:20.809149    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns" to []
I0126 03:10:20.809166    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns-tcp" to []
I0126 03:10:21.813249    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:dns" to []
I0126 03:10:21.813271    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:dns-tcp" to []
I0126 03:10:21.813280    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:metrics" to []

In K8s 1.18:

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ cat /tmp/kube-proxy.log | grep kube-dns
I0126 02:57:04.562445    6812 service.go:379] Adding new service port "kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I0126 02:57:04.562456    6812 service.go:379] Adding new service port "kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
I0126 02:57:27.099960    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.100043    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to []
I0126 02:57:27.744946    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.744987    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to []
I0126 02:57:27.745016    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 02:57:27.745031    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
I0126 02:57:27.745350    6812 proxier.go:812] Stale udp service kube-system/kube-dns:dns -> 10.0.0.10
I0126 02:57:31.114722    6812 service_health.go:183] Not saving endpoints for unknown healthcheck "kube-system/kube-dns"
I0126 03:12:04.464281    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 03:12:04.464294    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
I0126 03:12:04.464318    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 03:12:04.464344    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
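The key difference between the two logs: in K8s 1.18 the kube-dns endpoints go from `[]` to `[172.17.0.2:53]`, while in Arktos every "Setting endpoints" line leaves the list empty. A small sketch (illustrative only) that scans such log lines and reports services whose latest endpoint update was empty:

```python
import re

# Abbreviated lines from the two kube-proxy logs above.
LOG = """\
I0126 03:10:20.809149 5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns" to []
I0126 03:10:20.809166 5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.745016 6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 02:57:27.745031 6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
"""


def latest_endpoints(log):
    """Return the last endpoint list seen for each service port."""
    state = {}
    for line in log.splitlines():
        m = re.search(r'Setting endpoints for "([^"]+)" to \[([^\]]*)\]', line)
        if m:
            state[m.group(1)] = m.group(2).split(",") if m.group(2) else []
    return state


state = latest_endpoints(LOG)
stuck = sorted(svc for svc, eps in state.items() if not eps)
print(stuck)  # service ports kube-proxy never received endpoints for
```

The Arktos (`system_0/...`) service ports come out stuck at empty, matching the `kubectl get ep` output in the next comment.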


Sindica commented Jan 26, 2022

Arktos:

ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh get ep -A
NAMESPACE     NAME               ENDPOINTS          AGE   SERVICEGROUPID
default       kubernetes         172.30.0.14:6443   18m   1
kube-system   kube-dns                              18m
kube-system   kube-dns-default                      18m

K8s 1.18:

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ ./cluster/kubectl.sh get ep -A
NAMESPACE     NAME         ENDPOINTS                     AGE
default       kubernetes   172.30.0.156:6443             30m
kube-system   kube-dns     172.17.0.2:53,172.17.0.2:53   29m

The service endpoints were not set correctly in Arktos. This might be a Mizar service controller issue.

@Sindica Sindica added this to the 0.10 milestone Feb 17, 2022
@vinaykul

Still fails randomly; mostly works.

@vinaykul

Carl to verify and close
