
Potential connection loss to all calico pods when a new calico pod starts #9478

Open
darkgordon opened this issue Nov 15, 2024 · 2 comments


darkgordon commented Nov 15, 2024

Recently we spotted an issue related to the start of a calico-node pod.
When a new calico-node pod starts, we have detected that every calico-node pod on the cluster loses connectivity for a few moments. This started happening around 2 days ago. We are using Calico v3.27.3 and Kubernetes 1.29.9.

Expected Behavior

No connectivity loss should occur on the other calico-node pods.

Current Behavior

Possible Solution

Steps to Reproduce (for bugs)

  1. We delete a random calico-node pod (see the command sketch after the log below).

(screenshot: deleting the calico-node pod)

  2. If we check the connectivity of another calico-node pod, we see a small spike down to 0% connectivity exactly at the moment the pod we restarted in step 1 starts.

(screenshot: connectivity graph showing the dip)

  3. Looking at the logs of the new pod, I found the following. I would like to know if this many resyncs could cause an issue for the other calico-node pods:

 felix/dispatcher.go 68: Registering listener for type model.HostConfigKey: (dispatcher.UpdateHandler)(0x1ac5820)
 felix/async_calc_graph.go 256: Starting AsyncCalcGraph
 felix/daemon.go 639: Started the processing graph
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/clusterinformations"
 felix/async_calc_graph.go 137: AsyncCalcGraph running
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/felixconfigurations"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/globalnetworkpolicies"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/globalnetworksets"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/ippools"
 felix/config_params.go 622: Parsing value for LogSeverityScreen: info (from environment variable)
 felix/config_params.go 658: Parsed value for LogSeverityScreen: INFO (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/nodes"
 felix/config_params.go 622: Parsing value for DefaultEndpointToHostAction: ACCEPT (from environment variable)
 felix/config_params.go 658: Parsed value for DefaultEndpointToHostAction: ACCEPT (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/profiles"
 felix/config_params.go 622: Parsing value for HealthEnabled: true (from environment variable)
 felix/config_params.go 658: Parsed value for HealthEnabled: true (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/workloadendpoints"
 felix/config_params.go 622: Parsing value for PrometheusMetricsPort: 9091 (from environment variable)
 felix/config_params.go 658: Parsed value for PrometheusMetricsPort: 9091 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networkpolicies"
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/networksets"
 felix/config_params.go 622: Parsing value for WireguardMTU: 0 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/hostendpoints"
 felix/config_params.go 658: Parsed value for WireguardMTU: 0 (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/bgpconfigurations"
 felix/config_params.go 622: Parsing value for FelixHostname: i-NODENAME (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesnetworkpolicies"
 felix/config_params.go 658: Parsed value for FelixHostname: i-NODENAME (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesendpointslices"
 felix/daemon.go 999: Reading from dataplane driver pipe...
 felix/config_params.go 622: Parsing value for Ipv6Support: false (from environment variable)
 felix/watchercache.go 181: Full resync is required ListRoot="/calico/resources/v3/projectcalico.org/kubernetesservice"
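
For reference, steps 1 and 3 can be driven roughly like this from a machine with kubectl access. This is only a sketch: it assumes the calico-node DaemonSet carries the usual k8s-app=calico-node label and runs in the calico-system namespace (adjust to kube-system for a manifest-based install).

 # Step 1: delete a random calico-node pod and watch the DaemonSet recreate it.
 POD=$(kubectl -n calico-system get pods -l k8s-app=calico-node -o name | shuf -n 1)
 kubectl -n calico-system delete "$POD"
 kubectl -n calico-system get pods -l k8s-app=calico-node -w

 # Step 3: count the watcher-cache resyncs logged by the freshly started pod.
 NEW_POD=$(kubectl -n calico-system get pods -l k8s-app=calico-node \
   --sort-by=.metadata.creationTimestamp -o name | tail -n 1)
 kubectl -n calico-system logs "$NEW_POD" -c calico-node | grep -c "Full resync is required"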

Context

Your Environment

  • Calico version
    v3.27.3
  • Calico dataplane (iptables, windows etc.)
  • Orchestrator version (e.g. kubernetes, mesos, rkt):
    kubernetes 1.29.9
  • Operating System and version:
  • Link to your project (optional):
lwr20 (Member) commented Nov 15, 2024

When a new calico-node pod starts, we have detected that every calico-node pod on the cluster loses connectivity for a few moments.

Can you explain this further, please? Connectivity is lost momentarily between what things:
between calico-node pods, or between workload pods?

darkgordon (Author) commented

Hello,
the connection is lost between the calico-node pods themselves, exactly at the moment a new calico-node pod is scheduled and running.
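
For what it's worth, a rough way to watch for the blip from outside the pods is to probe every node's felix metrics port (9091, per the PrometheusMetricsPort value in the log above) once a second while the pod from step 1 restarts. Just a sketch: it assumes that port is reachable from wherever you run it; substitute ping or whatever health check you already use if it is not.

 NODES=$(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
 while true; do
   for ip in $NODES; do
     # Print a line whenever a node stops answering within one second.
     if ! curl -s -m 1 "http://$ip:9091/metrics" > /dev/null; then
       echo "$(date -Is) no response from $ip"
     fi
   done
   sleep 1
 done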
