[Suggestion] Move jobs between queues, change the job body, custom metadata for a better failure handling #174

Revisor · 2016-02-28T16:53:24Z

Hi,
this suggestion is connected to #170 in that both concern the handling of failed jobs.

I would like to handle failed jobs as follows:

1. NACK the failed job with an ever growing delay ([Request] NACK with a delay #170)
2. If the number of retries is higher than X, move the job to a failure queue (dead letter queue) with a new TTL, so that it can be inspected manually and acted upon

Neither of these actions is possible in Disque right now. The workaround is to add a new, copied job, but then we lose the job ID as well as the NACK and additional-deliveries counters.
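To make the intended flow concrete, here is a minimal sketch of the decision a worker would take on each failure. It is plain Python and does not call Disque; `RetryPolicy`, its fields, and the exponential backoff formula are assumptions chosen for illustration, not something Disque provides.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical policy: exponential backoff, then dead-letter."""
    base_delay: int = 5     # seconds before the first retry (assumed value)
    max_delay: int = 3600   # upper bound for the growing delay (assumed value)
    max_retries: int = 5    # the "X" from the list above

    def next_action(self, retries: int):
        """Return ("dead_letter", None) or ("nack", delay_in_seconds)."""
        if retries > self.max_retries:
            return ("dead_letter", None)
        delay = min(self.base_delay * 2 ** retries, self.max_delay)
        return ("nack", delay)


policy = RetryPolicy()
print(policy.next_action(0))  # first failure: short delay
print(policy.next_action(6))  # more than X retries: dead-letter
```

With the proposed features, the `"nack"` branch would map to a delayed NACK and the `"dead_letter"` branch to moving the job to the failure queue, both without changing the job ID.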

That's why I would like to propose four enhancements (proposals 3. and 4. are alternative solutions to the same problem):

1. Allow NACKing a job with a delay ([Request] NACK with a delay #170)
2. Allow moving a job to a different queue with a new TTL
3. Allow callers to change the job body
4. OR even better, if feasible: implement custom job metadata support, like the NACKs and additional-deliveries counters, but user-defined and mutable

Ad 3. We use the job body to store job metadata. We use this metadata to work around the missing features 1. and 2.: we store the original job ID as well as the total number of retries there. It could also be helpful to save, e.g., the exact time and reason the job failed. This requires changing the existing job body.
Supporting custom, mutable job metadata as a first-class citizen in Disque would be even better.
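As an illustration of the metadata-in-body workaround, here is a minimal sketch of such an envelope. The JSON layout and the `wrap` / `record_failure` helpers are hypothetical; they only show which fields would have to be rewritten on each failure, which is why a mutable body (or mutable metadata) is needed.

```python
import json
import time


def wrap(payload, original_id=None):
    """Build a job body that carries metadata next to the real payload."""
    return json.dumps({
        "meta": {"original_id": original_id, "retries": 0, "failures": []},
        "payload": payload,
    })


def record_failure(body, job_id, reason, now=None):
    """Return a new body with the failure recorded in the metadata."""
    envelope = json.loads(body)
    meta = envelope["meta"]
    if meta["original_id"] is None:
        meta["original_id"] = job_id  # remember the very first job ID
    meta["retries"] += 1
    meta["failures"].append({
        "at": now if now is not None else time.time(),
        "reason": reason,
    })
    return json.dumps(envelope)
```

Today the result of `record_failure` has to be added as a new job, which is exactly where the original ID and the built-in counters get lost.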

The point of all these suggestions is to keep the ID of a job intact throughout its lifetime while allowing for more complex handling (delayed NACKing, moving between queues, storing extra details).

What do you think? Are the suggestions too complex? Are they useful?

mathieulongtin · 2016-02-28T17:18:02Z

I kind of like the BURY and KICK commands of Beanstalkd for that. When a job is problematic, you bury it: it stays in the queue but is never distributed. Once you fix the problem, you can kick it and it will be distributed again.

https://github.com/kr/beanstalkd/blob/v1.3/doc/protocol.txt
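A toy in-memory model of that behaviour, just to show the semantics: a buried job keeps its ID and stays with the queue, but is not handed out until it is kicked. `ToyQueue` is a hypothetical class written for this comment; it is not Beanstalkd's or Disque's implementation.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a queue with bury/kick semantics."""

    def __init__(self):
        self.ready = deque()   # jobs that can be handed to workers
        self.buried = deque()  # jobs parked until someone kicks them

    def put(self, job_id):
        self.ready.append(job_id)

    def reserve(self):
        """Hand out the next ready job, or None if nothing is ready."""
        return self.ready.popleft() if self.ready else None

    def bury(self, job_id):
        """Park a reserved job; it will not be distributed again."""
        self.buried.append(job_id)

    def kick(self, bound):
        """Move up to `bound` buried jobs back to ready; return the count."""
        moved = 0
        while self.buried and moved < bound:
            self.ready.append(self.buried.popleft())
            moved += 1
        return moved
```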

Another option for Disque would be to stay pretty bare-bones but allow Lua functions to be loaded for customized behaviour like the one you're describing. For example, a queue might have a Lua callback on NACK that sets the retry time or, if too many retries have been done, pushes the job elsewhere.
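Roughly what I have in mind, sketched in Python rather than Lua since no such hook exists in Disque. Everything here (`make_on_nack`, `callbacks`, `handle_nack`, the job dict and the returned action dicts) is hypothetical and only illustrates a per-queue callback deciding between a delayed retry and moving the job.

```python
def make_on_nack(max_retries, failure_queue, base_delay=5):
    """Build a per-queue callback that picks the next step for a NACKed job."""
    def on_nack(job):
        if job["nacks"] > max_retries:
            return {"action": "move", "queue": failure_queue}
        return {"action": "retry", "delay": base_delay * 2 ** job["nacks"]}
    return on_nack


# Hypothetical registry: queue name -> callback loaded for that queue.
callbacks = {"emails": make_on_nack(3, "emails-failed")}


def handle_nack(queue, job):
    """What the server would do on NACK: count it, then ask the callback."""
    job["nacks"] += 1
    hook = callbacks.get(queue)
    if hook is None:
        return {"action": "retry", "delay": 0}  # default: requeue right away
    return hook(job)
```

Queues without a callback keep the plain behaviour, so the core stays small.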

misiek08 · 2016-02-28T17:21:30Z

Lua callbacks sound just sexy. They would allow endless features to be added.
If the Lua callback implementation supported multiple callbacks or a callback chain (each callback calling the next one, passed to it as an argument), it would be really great.
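A small sketch of the chain idea, written in Python as a stand-in for Lua. The `chain` helper and the three example callbacks are hypothetical; the only point is that each callback receives the job plus a `next_cb` function and decides whether to continue down the chain.

```python
def chain(*callbacks):
    """Compose callbacks; each one gets (job, next_cb) and may call next_cb."""
    def run(job, index=0):
        if index >= len(callbacks):
            return None  # end of the chain, nobody made a decision
        return callbacks[index](job, lambda j: run(j, index + 1))
    return run


def count_retry(job, next_cb):
    job["retries"] += 1
    return next_cb(job)  # always pass the job on


def dead_letter(job, next_cb):
    if job["retries"] > 3:
        return "move"    # stop the chain here
    return next_cb(job)


def default_retry(job, next_cb):
    return "retry"


handler = chain(count_retry, dead_letter, default_retry)
print(handler({"retries": 0}))
```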
