
[POC Mizar-Arktos Integration] coredns and kube-dns are crashing and restarting in scale-up and scale-out env #1309

Open
Sindica opened this issue Jan 26, 2022 · 6 comments


Sindica commented Jan 26, 2022

What happened:
In the Mizar-Arktos integrated local dev environment, the kube-dns and coredns pods keep crashing and getting restarted.
On master, using the default bridge network solution, kube-dns and coredns could not be started at all (the sandbox cannot be created; containerd reported "No cni config template is specified, wait for other system components to drop the config.").

From the kubelet log, it looks like kubelet is using the pod IPs managed by Mizar for the kube-dns and coredns readiness and liveness checks, and those IP addresses are not reachable from kubelet.

I0126 01:01:04.950570   24700 prober.go:121] Readiness probe for "coredns-default-ip-172-30-0-14-78dd67d496-f4mn6_kube-system_system(1f3795f9-84b7-42b0-a54d-8454b9b6c337):coredns" failed (failure): Get http://21.0.21.3:8181/ready: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:01:15.294532   24700 prober.go:121] Liveness probe for "coredns-default-ip-172-30-0-14-78dd67d496-f4mn6_kube-system_system(1f3795f9-84b7-42b0-a54d-8454b9b6c337):coredns" failed (failure): Get http://21.0.21.3:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:02:24.704376   24700 prober.go:121] Readiness probe for "kube-dns-554c5866fc-vtr8t_kube-system_system(ca4d94d2-37a2-4c90-a38f-965668f25f99):kubedns" failed (failure): Get http://21.0.21.0:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I0126 01:02:09.721243   24700 prober.go:121] Liveness probe for "kube-dns-554c5866fc-vtr8t_kube-system_system(ca4d94d2-37a2-4c90-a38f-965668f25f99):kubedns" failed (failure): Get http://21.0.21.0:10054/healthcheck/kubedns: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
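The probe failures above are plain HTTP GETs against the pod IP that time out because the Mizar-managed address is unreachable from kubelet. A minimal sketch of that probe behavior (a stand-in local server plays the role of a reachable pod; 203.0.113.1 is an unroutable TEST-NET address standing in for 21.0.21.3; nothing here is kubelet's actual prober code):

```python
import http.server
import threading
import urllib.error
import urllib.request


class ReadyHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in for a pod serving a /ready endpoint, like coredns on :8181."""

    def do_GET(self):
        if self.path == "/ready":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass


def probe(url, timeout=1.0):
    """Return True if the endpoint answers 2xx within the timeout.

    A timeout or connection error returns False -- this is what kubelet
    logs as a failed readiness/liveness probe.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


server = http.server.HTTPServer(("127.0.0.1", 0), ReadyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

ok = probe(f"http://127.0.0.1:{port}/ready")           # reachable pod IP
unreachable = probe("http://203.0.113.1:8181/ready", 0.5)  # unreachable, like 21.0.21.3

print(ok)           # True
print(unreachable)  # False
server.shutdown()
```

When the pod IP is unreachable, the probe keeps returning failure until kubelet's failure threshold is hit, at which point the container is killed and restarted, matching the RESTARTS counts below.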
ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh get pods -AT -o wide
TENANT   NAMESPACE     NAME                                              HASHKEY               READY   STATUS    RESTARTS   AGE     IP            NODE             NOMINATED NODE   READINESS GATES
system   default       mizar-daemon-kdftq                                374524931036715172    1/1     Running   0          7m21s   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       mizar-operator-5c97f7478d-nrbl5                   3232150560540599971   1/1     Running   0          7m21s   172.30.0.41   ip-172-30-0-41   <none>           <none>
system   default       netpod1                                           8432916227983888478   1/1     Running   0          2m16s   21.0.21.19    ip-172-30-0-41   <none>           <none>
system   default       netpod2                                           3668649586141483683   1/1     Running   0          2m16s   21.0.21.22    ip-172-30-0-41   <none>           <none>
system   kube-system   coredns-default-ip-172-30-0-14-78dd67d496-f4mn6   4792669529088497661   0/1     Running   3          7m21s   21.0.21.3     ip-172-30-0-41   <none>           <none>
system   kube-system   kube-dns-554c5866fc-vtr8t                         728123151778937189    2/3     Running   8          7m21s   21.0.21.0     ip-172-30-0-41   <none>           <none>
system   kube-system   virtlet-nhqwd                                     6043574697768950208   3/3     Running   0          7m12s   172.30.0.41   ip-172-30-0-41   <none>           <none>

What you expected to happen:
With Mizar as the Arktos network plugin, kube-dns and coredns should start successfully without crashing.

@Sindica Sindica self-assigned this Jan 26, 2022

Sindica commented Jan 26, 2022

Starting K8s 1.18 in a local cluster, there is only kube-dns, no coredns.

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ ./cluster/kubectl.sh get pods -A -o wide
NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   kube-dns-575cc5f5c-5mcbv   3/3     Running   0          3m32s   172.17.0.2   127.0.0.1   <none>           <none>
ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ sudo iptables -t nat -S | grep 172.17.0.2
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 172.17.0.2:53
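The KUBE-SEP DNAT rules above are how kube-proxy programs a service's endpoints: a service port with a healthy endpoint gets a `--to-destination <pod-ip>:<port>` rule, and a service with no endpoints gets none. A small illustrative parser (not part of kube-proxy) that extracts that mapping from `iptables -t nat -S` output:

```python
import re

# Sample rules, copied from the K8s 1.18 run above.
RULES = """\
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -m comment --comment "kube-system/kube-dns:dns" -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 172.17.0.2:53
"""


def dnat_endpoints(rules):
    """Map 'namespace/service:port-name' -> DNAT destination ip:port."""
    out = {}
    for line in rules.splitlines():
        m = re.search(r'--comment "([^"]+)".*--to-destination (\S+)', line)
        if m:
            out[m.group(1)] = m.group(2)
    return out


eps = dnat_endpoints(RULES)
print(eps)
# A service with no endpoints (as in the Arktos run) would have no
# KUBE-SEP DNAT rule at all and be absent from this map.
```

Running the same extraction against the Arktos node would show no destination for kube-dns, consistent with the empty endpoints observed later in this thread.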


Sindica commented Jan 26, 2022

Starting preAlkaid in a local cluster, there is only kube-dns, no coredns.

ubuntu@ip-172-30-0-148:~/go/src/alkaid$ ./cluster/kubectl.sh get pods -A -o wide
NAMESPACE     NAME                        READY   STATUS    RESTARTS   AGE    IP           NODE        NOMINATED NODE   READINESS GATES
kube-system   kube-dns-5b6487d4cd-cdq8r   3/3     Running   0          115s   172.17.0.2   127.0.0.1   <none>           <none>

ubuntu@ip-172-30-0-148:~$ sudo iptables -t nat -S | grep 172.17.0.2
-A KUBE-SEP-7PPXA5JT5ALVQPIV -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-7PPXA5JT5ALVQPIV -p tcp -m tcp -j DNAT --to-destination 172.17.0.2:53
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -s 172.17.0.2/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SNPTLXDNVSPZ5ND2 -p udp -m udp -j DNAT --to-destination 172.17.0.2:53


Sindica commented Jan 26, 2022

Kube-proxy log in Arktos:

ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ cat /tmp/kube-proxy.log | grep kube-dns
I0126 03:08:48.019124    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I0126 03:08:48.019143    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
I0126 03:09:01.102105    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:metrics" at 10.0.0.166:9153/TCP
I0126 03:09:01.102122    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:dns" at 10.0.0.166:53/UDP
I0126 03:09:01.102134    5779 service.go:360] Adding new service port "system_0/kube-system/kube-dns-default:dns-tcp" at 10.0.0.166:53/TCP
I0126 03:10:20.809149    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns" to []
I0126 03:10:20.809166    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns-tcp" to []
I0126 03:10:21.813249    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:dns" to []
I0126 03:10:21.813271    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:dns-tcp" to []
I0126 03:10:21.813280    5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns-default:metrics" to []

In K8s 1.18:

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ cat /tmp/kube-proxy.log | grep kube-dns
I0126 02:57:04.562445    6812 service.go:379] Adding new service port "kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I0126 02:57:04.562456    6812 service.go:379] Adding new service port "kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
I0126 02:57:27.099960    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.100043    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to []
I0126 02:57:27.744946    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.744987    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to []
I0126 02:57:27.745016    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 02:57:27.745031    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
I0126 02:57:27.745350    6812 proxier.go:812] Stale udp service kube-system/kube-dns:dns -> 10.0.0.10
I0126 02:57:31.114722    6812 service_health.go:183] Not saving endpoints for unknown healthcheck "kube-system/kube-dns"
I0126 03:12:04.464281    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 03:12:04.464294    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
I0126 03:12:04.464318    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 03:12:04.464344    6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
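The key difference between the two logs: in K8s 1.18 the kube-dns endpoints go from `[]` to `[172.17.0.2:53]`, while in Arktos every "Setting endpoints" line leaves the list empty. A small sketch (illustrative only) that scans such log lines and reports services whose latest endpoint update was empty:

```python
import re

# Abbreviated lines from the two kube-proxy logs above.
LOG = """\
I0126 03:10:20.809149 5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns" to []
I0126 03:10:20.809166 5779 endpoints.go:281] Setting endpoints for "system_0/kube-system/kube-dns:dns-tcp" to []
I0126 02:57:27.745016 6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [172.17.0.2:53]
I0126 02:57:27.745031 6812 endpoints.go:376] Setting endpoints for "kube-system/kube-dns:dns" to [172.17.0.2:53]
"""


def latest_endpoints(log):
    """Return the last endpoint list seen for each service port."""
    state = {}
    for line in log.splitlines():
        m = re.search(r'Setting endpoints for "([^"]+)" to \[([^\]]*)\]', line)
        if m:
            state[m.group(1)] = m.group(2).split(",") if m.group(2) else []
    return state


state = latest_endpoints(LOG)
stuck = sorted(svc for svc, eps in state.items() if not eps)
print(stuck)  # service ports kube-proxy never received endpoints for
```

The Arktos (`system_0/...`) service ports come out stuck at empty, matching the `kubectl get ep` output in the next comment.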


Sindica commented Jan 26, 2022

Arktos:

ubuntu@ip-172-30-0-14:~/go/src/k8s.io/arktos$ ./cluster/kubectl.sh get ep -A
NAMESPACE     NAME               ENDPOINTS          AGE   SERVICEGROUPID
default       kubernetes         172.30.0.14:6443   18m   1
kube-system   kube-dns                              18m
kube-system   kube-dns-default                      18m

K8s 1.18:

ubuntu@ip-172-30-0-156:~/go/src/kubernetes$ ./cluster/kubectl.sh get ep -A
NAMESPACE     NAME         ENDPOINTS                     AGE
default       kubernetes   172.30.0.156:6443             30m
kube-system   kube-dns     172.17.0.2:53,172.17.0.2:53   29m

The service endpoints were not set correctly in Arktos. This might be a Mizar service controller issue.

@Sindica Sindica added this to the 0.10 milestone Feb 17, 2022
@vinaykul

Still fails randomly; mostly works.

@vinaykul

Carl to verify and close
