Traffic disturbance 2 minutes after node restart #11787
Comments
Hm, as far as I know, we fixed something similar in v1.13.0.
The logs are from a test run with v1.13.0-rc1. Is there a difference between v1.13.0 and v1.13.0-rc1? I mentioned NSM v1.12.1-rc.1 only to clarify that this is not a new bug. It is considered a medium-priority issue.
OK, good that it's not critical.
Yes, there is a difference. We have fixed a few bugs, like #11372 in
Ah, sorry, I got the version wrong: the logs are from a test run with NSM v1.13.0-rc2. I'm fixing it in the description.
We've checked several NSM versions, and all of them have the same problem:
Current State
We found several problems that may occur after restarting a node:
1. Ping fails periodically (the failure periods are all the same length). This issue is related to some bugs in
2.
Found the solution for the fourth bug. It's in
NSE image with fixes:
@NikitaSkrynnik: We tested the node restart scenario with this image and it was successful every time, so it seems this fix solves the problem.
@szvincze: to check whether this issue is resolved in v1.14.0-rc.1, you can pass the env variable
@NikitaSkrynnik: We have verified it in an environment where we evaluate NSM releases and use the NSE/NSC from those releases. There we had several issues, such as traffic disturbance after a worker node restart once the pods were back, a temporary traffic outage lasting longer than 30 seconds for one NSE instance, and several outages on the other traffic instances. Based on our tests, we have not observed these issues with the latest release. However, @ljkiraly reported this issue from an environment where we use custom endpoints and clients, and there we unfortunately still see the same behavior.
Expected Behavior
The node restart should not have impact on traffic between elements running on other nodes.
Current Behavior
Two minutes after a worker node restart, there was a traffic outage.
Failure Information
Cannot reproduce this, but it often fails in nightly tests. Logs from a failed test run are in
traffic_outage_after_node_reboot_log.tar.gz
The node reboot is at:
[2024-04-05T13:45:03.923Z] robustness-node-restart-test.sh: Rebooting node: worker-pool1-1dn6k2vc-n121-vpod1-pnes8010-ipv4
Traffic was stopped between [2024-04-05T13:47:06.910Z] and [2024-04-05T13:48:12.064Z].
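To make the "two minutes" in the title concrete, the short sketch below computes the delay between the reboot and the start of the outage, and the length of the outage itself. It uses only the three timestamps quoted above and assumes they are all UTC (they carry a `Z` suffix). The helper name `parse` is hypothetical and is not part of the test scripts or of NSM.

```python
from datetime import datetime

# Format of the timestamps quoted in this issue, e.g. 2024-04-05T13:45:03.923Z (UTC).
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse(ts: str) -> datetime:
    """Hypothetical helper: parse one log timestamp into a naive UTC datetime."""
    return datetime.strptime(ts, FMT)


reboot = parse("2024-04-05T13:45:03.923Z")        # "Rebooting node" log line
outage_start = parse("2024-04-05T13:47:06.910Z")  # traffic stops
outage_end = parse("2024-04-05T13:48:12.064Z")    # traffic resumes

# Seconds from the reboot command to the first lost traffic.
delay = (outage_start - reboot).total_seconds()
# Seconds during which traffic was down.
outage = (outage_end - outage_start).total_seconds()

print(f"outage began {delay:.3f} s after reboot and lasted {outage:.3f} s")
```

Running it prints `outage began 122.987 s after reboot and lasted 65.154 s`. In other words, the outage began about 2 minutes 3 seconds after the reboot and lasted about 65 seconds.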
Context
The issue can also be seen with NSM v1.12.1-rc.1.