-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow batch() option to not process empty windows? #75
Comments
Doesn't feel like it's worth the addition... though maybe that's just because I haven't had to live with it as it is :-) I imagine some uses would need to know that nothing was captured during the last N sec window and react as appropriate for the app (e.g., log, publish, generate a special tuple). Seems clear/natural to expect the Seems simple enough for uses that don't care about that case, if it doesn't fallout naturally from their processing, to just make the check. e.g.,
In a common case where "process the batch" involves a math3 analytic (e.g., SUM), maybe we define those analytics to return null if the input list is empty thereby eliminating having to write an explicit check? |
I somewhat agree with @dlaboss , but while it's simple to have the isEmpty() it doesn't help reduce cpu cost of potentially many schedules of a task that ends up doing nothing. Maybe it's a YAGNI, and maybe it's more than just a boolean, but potentially a mode, e.g.:
Maybe also it's a property of the window, so it applies to any window operation. If it's empty then do nothing for any window operation. This might also be easier to add in the future without breaking compatibility or having to have multiple overloads of |
I agree that implementing this as an empty partition policy is easy to add without breaking compatibility, but I think of handling empty windows as an attribute of the aggregate function rather than of the window. Assume I want to support multiple aggregate functions triggered by the same window. To do that without extending Quarks, I have to create identical windows for each aggregate function. If this becomes unacceptable (e.g. high memory consumption if windows are large), and the alternative is to update the current window implementation to support a list of aggregates per window, then a per-window policy would then apply to all the aggregates. |
@vdogaru Are you saying that is the case now, or if the empty partition policy was per window? |
I am saying that adding an empty partition policy per window would force all the aggregates triggered by that window to share the window's policy. |
Right, but that doesn't force each aggregation to have its own window. Only ones with different policies. |
Currently, each time TWindow.aggregate is called, it creates a new Aggregate<T,U,K> oplet which contains its own Window. So in that sense, each aggregation does have its own window. |
@wmarshall484 A mere implementation detail, from the api they use a single window. The implementation could be changed to have a single oplet for a window. |
Another situation that may not matter for Quarks being on the edge, but when there are thousands of partitions processing empty partitions can add up. |
The current batch implementation will process empty windows when the window is the last N seconds.
I wonder if this is the desired or expected behaviour. I see batching as breaking a stream into batches of tuples, thus processing an empty one doesn't seem to make much sense.
Batch could have an option to allow processing of empty windows, defaulting to false, or just not support it.
Throughts?
The text was updated successfully, but these errors were encountered: