Replies: 16 comments 2 replies
-
/help wanted
-
related to #15996
-
Hi @lance5890, is the instance reachable? You can try pinging the other members from the endpoint https://10.255.184.7:2379.
-
I have also tried ping; the result is as follows:
-
@lance5890 would you mind sharing the log of the https://10.255.184.7:2379/ member (from start to restart)? Thanks.
-
The latest abnormal logs are as follows (some repeated entries are omitted):
-
@lance5890 Thanks for the information. I checked the log and found that the connection was established.
-
The leader logs are as follows:
-
@lance5890 Was the bad member restarting at 2023-08-18T03:00:28.820Z? If not, you should check the firewall or similar. It looks like a networking issue. EDITED: You can try to find the leader's log from the time when the bad member shows the error log. The timestamps should line up.
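One way to line up the two logs is to pull out the entries from each member that fall within a few seconds of the same timestamp. The following is only a rough sketch: it assumes the logs are in etcd's JSON format with a `ts` field in UTC (as in the log lines quoted in this thread), and the helper names `parse_ts` and `entries_near` as well as the 5-second default window are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2023-08-18T03:00:28.820Z."""
    # Assumption: timestamps end in 'Z' and may or may not have fractional seconds.
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in ts else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)


def entries_near(lines, when: str, window_s: float = 5.0):
    """Return decoded JSON log entries whose 'ts' is within window_s seconds of `when`."""
    center = parse_ts(when)
    window = timedelta(seconds=window_s)
    hits = []
    for line in lines:
        start = line.find("{")  # skip any container-runtime prefix before the JSON body
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        ts = entry.get("ts") if isinstance(entry, dict) else None
        if not isinstance(ts, str):
            continue
        try:
            stamp = parse_ts(ts)
        except ValueError:
            continue
        if abs(stamp - center) <= window:
            hits.append(entry)
    return hits
```

Running `entries_near(open("leader.log"), "2023-08-18T03:00:28.820Z")` (the file name is hypothetical) would list what the leader logged around the moment the bad member reported its error. This only helps if the clocks on the two hosts are reasonably in sync.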
-
The bad member is being restarted by the health check. I still suspect a networking issue, but I have already tested the network as I described before. Do you have other tools or ways to check for networking issues?
-
Do we have a common network-checking method to verify an etcd cluster? IMO, an etcd cluster's health is 99% tied to networking. "curl", "nc", and "ping" are the most common network checks, but after the basic checks pass, how can we dig deeper into the problem?
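One step beyond a single "nc" or "ping" is to open many TCP connections to the ports etcd actually uses and look at failures and connect times, since intermittent drops are easy to miss with a one-off check. Below is a minimal sketch of that idea; the function names `probe_tcp` and `summarize` are invented here, and it only tests TCP reachability, not TLS or raft traffic itself.

```python
import socket
import time


def probe_tcp(host: str, port: int, attempts: int = 20,
              timeout: float = 1.0, pause: float = 0.0):
    """Open `attempts` TCP connections and return a list of (ok, seconds) tuples."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # The connection is closed again as soon as it is established.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:  # covers refused connections and timeouts
            ok = False
        results.append((ok, time.monotonic() - start))
        time.sleep(pause)
    return results


def summarize(results):
    """Count failures and report the slowest successful connect."""
    ok_times = [elapsed for ok, elapsed in results if ok]
    return {
        "attempts": len(results),
        "failures": len(results) - len(ok_times),
        "max_connect_s": max(ok_times) if ok_times else None,
    }
```

For example, `summarize(probe_tcp("10.255.184.7", 2380, attempts=200, pause=0.1))` run from each of the other members would exercise the peer port. Port 2380 is etcd's default peer port and is an assumption here; the thread only shows the client port 2379, so use whatever the cluster is actually configured with.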
-
The newest bad member log is as follows (the log is stuck at the last entry, with no new output for a long time):
and the leader log is as follows:
-
/help wanted
etcd used in our case:
etcd pod, svc status:
etcd clients can't get a valid response from etcd; at the same time, the etcd pod prints the following logs:
We got two URLs from the log:
DNS seems to work fine for both URLs, and ping also works (partially) inside other pods.
ping URL 1:
ping URL 2:
telnet URL 1:
telnet URL 2:
-
I experienced this today with a single node (i.e. not in a cluster). It appears there was corruption in the persistent volume. Deleting the pod, persistent volume, and persistent volume claim allowed everything to be recreated (I have no issues with losing the data in this instance), which then brought it back online successfully.
-
I am a newbie here: I just restarted the rke2 server, and since then I have not been able to join the cluster, though everything works if I don't try to join this node to the cluster. These are some of the etcd logs:
2024-01-18T21:09:29.186867875Z stderr F {"level":"warn","ts":"2024-01-18T21:09:29.186617Z","caller":"etcdserver/server.go:2085","msg":"failed to publish local member to cluster through raft","local-member-id":"6449e69100ca312c","local-member-attributes":"{Name:ecs3u-da2e2991 ClientURLs:[https://x.x.x.x:2379]}","request-path":"/0/members/6449e69100ca312c/attributes","publish-timeout":"15s","error":"etcdserver: request timed out"}
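For anyone reading a line like this, it may help to split it into its parts. The prefix appears to be the container runtime's log format (timestamp, stream, and a full/partial tag) followed by etcd's own JSON entry. Here is a small sketch that decodes it, using the line quoted above; the function name `parse_runtime_json_line` and the `_runtime_ts` / `_stream` keys are made up for illustration.

```python
import json


def parse_runtime_json_line(line: str) -> dict:
    """Split '<time> <stream> <tag> <json>' and decode the JSON payload."""
    runtime_ts, stream, _tag, payload = line.split(" ", 3)
    entry = json.loads(payload)
    entry["_runtime_ts"] = runtime_ts  # timestamp added by the container runtime
    entry["_stream"] = stream          # stdout or stderr
    return entry


# The log line from this comment, split across string literals for readability.
line = (
    '2024-01-18T21:09:29.186867875Z stderr F '
    '{"level":"warn","ts":"2024-01-18T21:09:29.186617Z",'
    '"caller":"etcdserver/server.go:2085",'
    '"msg":"failed to publish local member to cluster through raft",'
    '"local-member-id":"6449e69100ca312c",'
    '"local-member-attributes":"{Name:ecs3u-da2e2991 ClientURLs:[https://x.x.x.x:2379]}",'
    '"request-path":"/0/members/6449e69100ca312c/attributes",'
    '"publish-timeout":"15s","error":"etcdserver: request timed out"}'
)

entry = parse_runtime_json_line(line)
print(entry["level"], entry["msg"])
print(entry["local-member-id"], entry["publish-timeout"], entry["error"])
```

The fields that matter here are `msg`, `publish-timeout`, and `error`: the member waited 15s for its own attributes to be written through raft and the request timed out.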
-
Bug report criteria
What happened?
The etcd cluster endpoint status is as follows:
4.2 A ping flood from 10.255.184.5 to 10.255.184.7 shows only one lost packet, as follows:
What did you expect to happen?
The etcd log is stuck at “failed to publish local member to cluster through raft” with no further output. Is there any other way to test the etcd endpoint connection?
I also suspect the network connection among the etcd endpoints, but the common network tests pass.
How can we reproduce it (as minimally and precisely as possible)?
It only happens in one environment.
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response