Grafana alert manager memory usage constantly increase #3428

Open
doanbutar opened this issue Nov 15, 2024 · 0 comments
What happened?
We are using Grafana's internal Alertmanager and see memory usage (container_memory_usage_bytes) increase constantly. I suspect the alert states are being kept in memory indefinitely.

We have roughly 30 alerts running on 1 instance.

What did you expect to happen?
We would expect the memory usage to go down at some point.

Did this work before?
With no alerts configured, memory usage is stable and does not increase continuously.

Environment (with versions):
Grafana helm chart: 6.52.1 (9.4.3)
We upgraded to Grafana version 11.1.1 but still see memory increase continuously.

The values we are using are as follows:

grafana:
  replicas: 1
  env:
    GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION: true

  rbac:
    create: false

  image:
    repository:<>
    pullPolicy: IfNotPresent
    tag: 2368ac295

  securityContext:
    runAsUser: 99
    runAsGroup: 99
    fsGroup: 99

  persistence:
    type: sts
    enabled: true
    storageClassName: ebs

  initChownData:
    enabled: false

  ingress:
    enabled: false

  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 1Gi

And then we override some grafana.ini values as follows:

[analytics]
check_for_plugin_updates = false
check_for_updates = false
reporting_enabled = false
[auth]
login_cookie_name = grafana_sess
[auth.basic]
enabled = false
[auth.ldap]
enabled = true
[auth.proxy]
enable_login_token = true
enabled = true
header_name = X-REMOTE-USER
[feature_toggles]
dashboardScene = true
[grafana_net]
url = https://grafana.net
[paths]
data = /var/lib/grafana/
logs = /var/log/grafana
plugins = /usr/share/grafana/plugins/
provisioning = /etc/grafana/provisioning
[plugins]
allow_loading_unsigned_plugins = <grafana-datasource>
plugin_admin_enabled = true
public_key_retrieval_disabled = false
[security]
allow_embedding = true
[server]
domain = ''
root_url = <url>
[unified_alerting]
admin_config_poll_interval = 10m
alertmanager_config_poll_interval = 10m
evaluation_timeout = 1m
resolved_alert_retention = 30m

Attached 2 pprof files
Archive.zip

Some thoughts I had:
I suspect that garbage collection is not occurring, and that the constantly increasing memory usage is the result. I am trying to set GOMEMLIMIT to see if that helps.
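
For reference, a minimal sketch of how GOMEMLIMIT could be passed through the same env block as in the values above. The 900MiB value is an assumption (chosen to sit somewhat below the 1Gi container limit), not a tested setting:

```yaml
grafana:
  env:
    GF_SECURITY_DISABLE_INITIAL_ADMIN_CREATION: true
    # Assumed value: a soft limit for the Go runtime, kept below the
    # 1Gi container memory limit so the GC works harder before an OOM kill.
    GOMEMLIMIT: "900MiB"
```

GOMEMLIMIT is a soft limit on the Go runtime's memory, so it should make the garbage collector more aggressive as usage approaches the limit, but it would not help if the alert state is genuinely retained. Note also that container_memory_usage_bytes includes page cache, so container_memory_working_set_bytes may be the more telling metric to watch.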
