Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stable/redis-ha][BUG] Pre-stop hook unsuccessful #207

Open
StefanKarlsson321 opened this issue Jul 12, 2022 · 5 comments
Open

[stable/redis-ha][BUG] Pre-stop hook unsuccessful #207

StefanKarlsson321 opened this issue Jul 12, 2022 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@StefanKarlsson321
Copy link

Describe the bug
Pre-stop hook fails with:

"PreStop hook failed" err="command '/bin/sh /readonly-config/trigger-failover-if-master.sh' exited with 1: Could not connect to Redis at localhost:26379: Connection refused\n"

Sometimes I have seen:
error: command '/bin/sh /readonly-config/trigger-failover-if-master.sh' exited with 1: Warning: AUTH failed

To Reproduce
Steps to reproduce the behavior:

  1. Delete master pod

Expected behavior
Failover triggered on master.

Additional context
Have tried with chart 4.15.0 and 4.17.1 and redis 6.2.5 as well as redis 6.2.6. Seems it is the same behaviour. Also tried with and without exporter. Could it be due to that the sentinel container is terminated first? Auth and sentinel auth is configured.

@StefanKarlsson321 StefanKarlsson321 added the bug Something isn't working label Jul 12, 2022
@DandyDeveloper
Copy link
Owner

@StefanKarlsson321 This is with AUTH enabled presumably?

@StefanKarlsson321
Copy link
Author

Yes with both redis and sentinel AUTH enabled. Hmm, now when I look at it I can see more detail on the AUTH failed message:

message="This node is currently master, we trigger a failover.\n\nAUTH failed: WRONGPASS invalid username-password pair or user is disabled.\n"

@StefanKarlsson321
Copy link
Author

I'll see if I can figure on why I sometimes get WRONGPASS invalid username-password pair or user is disabled. But the cluster is operating normally otherwise, with username and password, so it shouldn't occur from what I understand.

@cm3brian
Copy link

cm3brian commented Nov 27, 2023

I have this same issue and I determined that the problem is that while the redis container is the pod that has the preStop hook in it, this preStop hook references sentinel in its requests (namely that the master needs to be changed), sentinel has no such hook against it and as a result terminates instantly when SIGTERM is sent while redis keeps going trying to exec the hook (and it fails, as can be seen in the FailedPreStopHook event.

What is required is for sentinel to stay up longer than redis does (doesn't have to be by much), to allow it to complete the preStop hook.

I have patched my configs, in a resourceful way (i.e not very pretty, but functional), by adding this preStop hook to the sentinel container:

  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - until [ $(/health/redis_readiness.sh) != "response=PONG" ]; do sleep
        1; done

This change uses the already existing redis readiness check code to check if redis is still up, and does not close sentinel while this is the case.

This one was a big point of frustration for us as we have some 3rd party platforms with very poor failure handing in redis, so we would get "ghost" outages up to multiple times a week where there were random ~10 seconds outages, due to the sentinel config setting down-after-milliseconds being set to 10000 in the chart's config. Without sentinel the hook fails and master just disappears without explanation making sentinel wait the down-after-milliseconds delay before electing a new master organically.

@tschirmer
Copy link

Ok there are a few reasons for this:
see: #283 (comment)

  1. check your permissions of the readonly-configs mount. Some umask might be applied in that prevent execution of the preStopHook. I needed to alter the helm chart to set the read-only specifically to 755 for it to work correctly.
  2. In the helm chart, the redis pod also doesn't have the SENTENIALAUTH env variable. If you're using sentinel, this will be needed or you'll get Warning: AUTH failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants