[Bug]: Increased and erratic memory in the Nginx Pods leading to OOM Kills - appears to be introduced by v3.7.x #6860
Comments
Hi @MarkTopping thanks for reporting! Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this 🙂 Cheers!
Hi @MarkTopping thanks for opening the issue. The 3.7.1 release uses Nginx 1.27.2 (https://forum.nginx.org/read.php?27,300237,300237), which now caches SSL certificates, secret keys, and CRLs on start or during reconfiguration.
thanks
Hi @vepatel Thank you for your response. I haven't tested 3.7.0, and sadly it's not just a matter of having a go for you: today's outage caused quite a bit of disruption, so it isn't something I can replicate as and when I see fit. Unless you have a particular reason to believe that 3.7.0 would address an issue that was introduced specifically in 3.7.1? Re limits: yes, indeed we are. The graphs kind of hide it, but requests and limits are both set, and for memory they are equal to one another. We have set that limit, though, to be 4x higher than what we typically see each Nginx Pod consuming, hence a lot of headroom. A question for you please: thanks for the link, but it doesn't state the implications of the changes. Would I be right in assuming that an increase in memory consumption is expected due to the caching behaviour introduced? Certs aren't exactly big, so I'd assume that would only result in a fairly small memory increase anyway?
Thank you @MarkTopping for providing details. We are investigating the memory spikes.
@MarkTopping could you please provide more detailed information about the |
Version
3.7.0
What Kubernetes platforms are you running on?
AKS Azure
Steps to reproduce
I believe that changes in version 3.7.0 or 3.7.1 have introduced a memory consumption issue.
We had to roll back a version bump from v3.6.2 to v3.7.1 today after our Nginx IC Pods all crashed due to OOM Kills. To make matters worse, due to Bug 4604 the Pods then failed to restart (without manual intervention), leading to obvious impact.
Our subsequent investigation after our outage revealed that the memory consumption on the Nginx Pods changed quite dramatically after the release as shown by the following 2 charts.
1st Example
In our least used environment we didn't incur any OOM Kills, but today's investigation revealed how memory usage has both increased and become more 'spiky' since we performed the upgrade:
2nd Example
This screenshot shows the IC Pods' memory consumption after a release of v3.7.1 into a busier environment and a subsequent rollback this morning.
What this graph doesn't capture is that memory went above the 1500MiB line for all Pods in the deployment, and they were therefore OOM Killed. This isn't shown because the metrics are exported every minute, so we only have the last datapoint that happened to be collected before each OOM Kill.
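To illustrate why a one-minute export interval can hide the final spike, here is a minimal Python sketch. Only the 1500MiB line and the one-minute interval come from the report above; the per-second readings and the helper name `sampled_peak` are made up for illustration and are not taken from the real Pods.

```python
LIMIT_MIB = 1500          # limit line mentioned in the report
EXPORT_INTERVAL_S = 60    # metrics are exported once a minute

def sampled_peak(series, interval):
    """Highest value visible when only every `interval`-th reading is exported."""
    return max(series[::interval])

# Hypothetical per-second memory readings in MiB:
# 90 s of steady usage, then a climb of 60 MiB per second for 20 s.
usage = [400] * 90 + [400 + 60 * i for i in range(1, 21)]

true_peak = max(usage)                                # what the kernel sees
seen_peak = sampled_peak(usage, EXPORT_INTERVAL_S)    # what the graph shows

print(f"true peak: {true_peak} MiB, exported peak: {seen_peak} MiB")
print("spike visible on graph:", seen_peak > LIMIT_MIB)
```

In this made-up series the only exported samples fall at 0 s and 60 s, both before the climb starts, so the graph never shows usage crossing the limit even though the underlying readings do.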
I guess it's worth noting that we also bumped our Helm Chart (not just the image version) with our release. The only notable change with that chart was the explicit creation of the Leader Election resource which I think Nginx used to just create by itself after deployment.
Some environment notes: