
Potential connection loss to all calico pods when a new calico pod starts #9478

Open
darkgordon opened this issue Nov 15, 2024 · 2 comments


darkgordon commented Nov 15, 2024

Recently we spotted an issue related to the start of a calico-node pod.
When a new calico-node pod starts, we have detected that every calico-node pod on the cluster loses connectivity for a few moments. This started happening around 2 days ago. We are using Calico v3.27.3 and Kubernetes 1.29.9.

Expected Behavior

No connectivity loss should occur on the other calico-node pods.

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

  1. We delete a random calico-node pod (see the command sketch after the log below).

(screenshot: deleting the calico-node pod)

  2. If we check the connectivity of another calico-node pod, we see a small spike down to 0% connectivity exactly at the moment the pod we restarted in step 1 starts.

(screenshot: connectivity graph showing the dip)

  3. Looking at the logs of the new pod, I found the following. I would like to know if this many resyncs could cause an issue for the other calico-node pods:

 felix/dispatcher.go 68: Registering listener for type model.HostConfigKey: (dispatcher.UpdateHandler)(0x1ac5820)
 felix/async_calc_graph.go 256: Starting AsyncCalcGraph
 felix/daemon.go 639: Started the processing graph
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
 felix/async_calc_graph.go 137: AsyncCalcGraph running
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/felixconfigurations"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/globalnetworkpolicies"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/globalnetworksets"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
 felix/config_params.go 622: Parsing value for LogSeverityScreen: info (from environment variable)
 felix/config_params.go 658: Parsed value for LogSeverityScreen: INFO (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
 felix/config_params.go 622: Parsing value for DefaultEndpointToHostAction: ACCEPT (from environment variable)
 felix/config_params.go 658: Parsed value for DefaultEndpointToHostAction: ACCEPT (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/profiles"
 felix/config_params.go 622: Parsing value for HealthEnabled: true (from environment variable)
 felix/config_params.go 658: Parsed value for HealthEnabled: true (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints"
 felix/config_params.go 622: Parsing value for PrometheusMetricsPort: 9091 (from environment variable)
 felix/config_params.go 658: Parsed value for PrometheusMetricsPort: 9091 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networksets"
 felix/config_params.go 622: Parsing value for WireguardMTU: 0 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
 felix/config_params.go 658: Parsed value for WireguardMTU: 0 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations"
 felix/config_params.go 622: Parsing value for FelixHostname: i-NODENAME (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies"
 felix/config_params.go 658: Parsed value for FelixHostname: i-NODENAME (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"
 felix/daemon.go 999: Reading from dataplane driver pipe...
 felix/config_params.go 622: Parsing value for Ipv6Support: false (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice"
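
For reference, steps 1 and 3 can be driven roughly like this from a machine with kubectl access. This is only a sketch: it assumes the calico-node DaemonSet carries the usual k8s-app=calico-node label and runs in the calico-system namespace (adjust to kube-system for a manifest-based install).

 # Step 1: delete a random calico-node pod and watch the DaemonSet recreate it.
 POD=$(kubectl -n calico-system get pods -l k8s-app=calico-node -o name | shuf -n 1)
 kubectl -n calico-system delete "$POD"
 kubectl -n calico-system get pods -l k8s-app=calico-node -w

 # Step 3: count the watcher-cache resyncs logged by the freshly started pod.
 NEW_POD=$(kubectl -n calico-system get pods -l k8s-app=calico-node \
   --sort-by=.metadata.creationTimestamp -o name | tail -n 1)
 kubectl -n calico-system logs "$NEW_POD" -c calico-node | grep -c "Full resync is required"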

Context

Your Environment

  • Calico version
    v3.27.3
  • Calico dataplane (iptables, windows etc.)
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
    kubernetes 1.29.9
  • Operating System and version:
  • Link to your project (optional):
lwr20 (Member) commented Nov 15, 2024

When a new calico-node pod starts, we have detected that every calico-node pod on the cluster loses connectivity for a few moments.

Can you explain this further, please? Connectivity is lost momentarily between what things:
between calico-node pods, or between workload pods?

darkgordon (Author) commented

Hello,
the connection is lost between the calico-node pods themselves, exactly at the moment a new calico-node pod is scheduled and running.
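
For what it's worth, a rough way to watch for the blip from outside the pods is to probe every node's felix metrics port (9091, per the PrometheusMetricsPort value in the log above) once a second while the pod from step 1 restarts. Just a sketch: it assumes that port is reachable from wherever you run it; substitute ping or whatever health check you already use if it is not.

 NODES=$(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
 while true; do
   for ip in $NODES; do
     # Print a line whenever a node stops answering within one second.
     if ! curl -s -m 1 "http://$ip:9091/metrics" > /dev/null; then
       echo "$(date -Is) no response from $ip"
     fi
   done
   sleep 1
 done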
