Pause worker during deployment and cache-warmup possible? #66
Comments
Thanks for the thorough write-up.
@bwaidelich doing it with a "lock-file" is of course just one option, and a very rudimentary one at that. Some other mechanism could achieve the same result: setting/unsetting a "feature-flag" in the database or in a key-value store like Redis would be an alternative, but that would tie the package to external dependencies, which is why I thought of a simple file as a dependency-free way. Of course, really stopping the worker would be the cleaner approach. But at least in our setup that would not work as easily:
I see the following two issues that could occur if going this way:
So all in all, I think it's still a good idea to make the application itself (the worker) aware of its pause, so that it just idles for a moment.
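A minimal sketch of that idea, assuming the worker wrapper checks a flag file before fetching jobs. The file path, the sleep interval, and the `./flow job:work` invocation are all illustrative assumptions, not features of the package:

```shell
#!/bin/sh
# Sketch: a worker wrapper that idles while a pause flag exists.
# PAUSE_FILE and the worker command are assumptions for illustration.
PAUSE_FILE="${PAUSE_FILE:-/tmp/jobqueue-worker.pause}"

worker_is_paused() {
  # Success (exit 0) while the pause flag file is present.
  [ -f "$PAUSE_FILE" ]
}

run_worker_loop() {
  while true; do
    if worker_is_paused; then
      sleep 5          # idle: do not fetch jobs from the queue while paused
      continue
    fi
    ./flow job:work    # placeholder for the actual worker invocation
  done
}
```

The point is only that the pause decision lives inside the worker process itself, so supervisord can keep it running untouched during a deployment.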
Thanks for the explanation, makes sense to me (admin noob)!
I think I'll revoke my idea (or leave it open, but don't expect anyone to solve it, unless there's an option I overlooked; details below).

When testing things with the troublesome installation, I went in and flushed the whole job queue at the beginning of the deployment pipeline, with the intention of reducing the number of things a worker would try to do. I still got some exceptions during the following deployment (while the application was in "lock" mode while warming up code caches). This led me to the thought that this is probably not solvable as proposed: moving the decision whether to invoke the worker or not into the application sounded great at first. But letting the worker start (e.g. running a Flow CLI command from supervisord or a cronjob) already triggers an exception while the application is locked and warming up the caches. Unless this can be ignored at some low level of the application, I no longer think that this is the way to go (letting the application know whether the worker shall work or pause).

Instead, I'm thinking about a solution that works around the cache-warmup locking. An easy solution could be to write a (b)lock file, and then change the supervisord or cronjob command from
to
This could be beautified of course, but it should do the basic trick: check if the file exists; if it does, do nothing; if it's not there, run the queue-worker and let it do its work. So we could throw that (b)lock file in at the beginning of the deployment, then do everything that's needed without the worker actively executing any Flow commands, and in the end remove the file to let the worker do its work again.
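The actual commands are omitted above; a hedged sketch of what such a guarded one-shot invocation could look like follows. The lock-file path and the `./flow job:work` command are assumptions, not the original commands:

```shell
#!/bin/sh
# Sketch: guarded one-shot invocation for cron/supervisord.
# Skip the worker entirely while the (b)lock file exists.
# LOCK_FILE and the worker command are illustrative assumptions.
LOCK_FILE="${LOCK_FILE:-/tmp/deployment.lock}"

run_worker_if_unlocked() {
  if [ -f "$LOCK_FILE" ]; then
    # Deployment in progress: do nothing, exit cleanly so cron stays quiet.
    echo "deployment lock present, skipping worker"
    return 0
  fi
  ./flow job:work   # placeholder for the real queue-worker command
}
```

The deployment pipeline would then `touch` the lock file as its first step and `rm` it as its last, so the worker never runs a Flow command while the caches are being warmed.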
In a larger Neos installation we've seen multiple occasions of the following rough workflow, with the result of a lot of exception/stack-trace files being generated and detected by our monitoring, causing an alert to check the health of this particular installation due to "too many exception files".
(The cache warmup can be triggered by the CLI, e.g. a `flow:cache:warmup` call, by an HTTPS request from the outside, or from the queue-worker, which is also CLI.)

So, I think I know how to work around this, and this issue is not about complaining. But while thinking about it, I thought it might be cool to have an option to "pause" the worker.
Something like a pair of CLI commands `queueworker:pause` and `queueworker:unpause` that would e.g. just write/remove a temporary file, and while that file is around the worker just takes a break (e.g. does not fetch tasks from the queue to work on).
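A minimal sketch of what such (proposed, not existing) commands could do internally, assuming the simple flag-file mechanism described above; the file location is a placeholder:

```shell
#!/bin/sh
# Sketch of hypothetical queueworker:pause / queueworker:unpause semantics:
# toggle a temporary flag file that a running worker checks before
# fetching the next job. PAUSE_FILE is an assumption.
PAUSE_FILE="${PAUSE_FILE:-/tmp/queueworker.paused}"

queueworker_pause()   { touch "$PAUSE_FILE"; }   # worker takes a break
queueworker_unpause() { rm -f "$PAUSE_FILE"; }   # worker resumes
queueworker_paused()  { [ -f "$PAUSE_FILE" ]; }  # check used by the worker
```

Because the flag is just a file, the deployment script could toggle it without the application even being bootable, which sidesteps the cache-warmup lock problem discussed above.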