[Feature]: Make Elasticsearch service restart after config change more graceful #343

Open
frankhetterich opened this issue Aug 27, 2024 · 3 comments · May be fixed by #349

@frankhetterich

Describe the feature request

We updated our Elasticsearch instances to version 8.15.0 using the rolling update feature of the collection, which works well: the cluster was available the whole time.

With the update we also changed some parameters in the Elasticsearch config. The collection applies the parameter change as part of the "normal" installation process, after all nodes have been updated. This means the config is changed on all nodes, and all nodes are then restarted at once via a handler. This full cluster restart makes the cluster unavailable for some time.

For us it makes no sense to perform a rolling update, with a lot of tasks to make sure the cluster stays available the whole time, and then follow it with a full cluster restart that achieves the opposite.

Please implement a "graceful" cluster restart (rolling restarts with cluster health checks) after a change of the Elasticsearch config.
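
For illustration, the health check between node restarts could look roughly like the sketch below. This is only a sketch: the es_api_host variable, the credentials and the retry values are placeholders, not names from the collection.

- name: Wait for the cluster to become green before restarting the next node
  ansible.builtin.uri:
    url: "https://{{ es_api_host | default(inventory_hostname) }}:9200/_cluster/health"
    method: GET
    user: elastic
    password: "{{ elastic_password }}"  # placeholder credential variable
    force_basic_auth: true
    validate_certs: false
    return_content: true
  register: cluster_health
  # Retry until the health API answers and reports a green cluster
  until: cluster_health.json is defined and cluster_health.json.status == "green"
  retries: 30
  delay: 10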

@frankhetterich frankhetterich added feature New feature or request needs-triage Needs to be triaged labels Aug 27, 2024
@ivareri
Contributor

ivareri commented Aug 29, 2024

ouch.

I'd say this is a bug and not a feature. Should be easy enough to create a handler for a rolling cluster restart (all the code is in the repo already), but I'm not sure how to best handle it without duplicating most of the code from elasticsearch-rolling-upgrade.yml in a handler.

Is there a way to inject tasks into a handler? We could have two task files: one with all the tasks to gracefully stop a node, and one with everything needed to bring it back online and wait for the cluster to become green. They could then be included in both a cluster-restart handler and the rolling-upgrade file.

So the handler would look something like this:

- name: Gracefully stop node
  ansible.builtin.include_tasks:
    file: cluster_restart_stop_node.yaml

- name: Start node and wait for green cluster
  ansible.builtin.include_tasks:
    file: cluster_restart_start_node.yaml

And the "Be careful about upgrade when Elasticsearch is running" block in elasticsearch-rolling-upgrade.yml would be reduced to something like this:

- name: Gracefully stop node
  ansible.builtin.include_tasks:
    file: cluster_restart_stop_node.yaml

# Tasks to upgrade packages

- name: Start node and wait for green cluster
  ansible.builtin.include_tasks:
    file: cluster_restart_start_node.yaml
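
The two included files would then carry the actual logic, mostly lifted from elasticsearch-rolling-upgrade.yml. A rough sketch of what they might contain; the URLs, credentials and exact settings here are illustrative, not taken from the collection:

# cluster_restart_stop_node.yaml (sketch)
- name: Disable shard allocation for the cluster
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}:9200/_cluster/settings"
    method: PUT
    body_format: json
    body: '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'
    user: elastic
    password: "{{ elastic_password }}"  # placeholder credential variable
    force_basic_auth: true
    validate_certs: false

- name: Stop Elasticsearch on this node
  ansible.builtin.service:
    name: elasticsearch
    state: stopped

# cluster_restart_start_node.yaml (sketch)
- name: Start Elasticsearch on this node
  ansible.builtin.service:
    name: elasticsearch
    state: started

- name: Re-enable shard allocation
  ansible.builtin.uri:
    url: "https://{{ inventory_hostname }}:9200/_cluster/settings"
    method: PUT
    body_format: json
    body: '{"persistent": {"cluster.routing.allocation.enable": null}}'
    user: elastic
    password: "{{ elastic_password }}"
    force_basic_auth: true
    validate_certs: false

# ...followed by a wait until cluster health is green again, as in elasticsearch-rolling-upgrade.yml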

@widhalmt
Member

Good find, and sorry for your bad experience. Yes, definitely a bug. We'll look into it. Thanks also for the suggestions, @ivareri.

@widhalmt widhalmt added bug Something isn't working and removed feature New feature or request needs-triage Needs to be triaged labels Sep 25, 2024
@widhalmt widhalmt self-assigned this Sep 25, 2024
widhalmt added a commit that referenced this issue Oct 24, 2024
@widhalmt widhalmt linked a pull request Oct 24, 2024 that will close this issue
@widhalmt
Member

This problem gave me a real headache. I pushed a Draft PR with your idea. That should actually work, but I'm afraid this will need quite extensive testing.
