Allow batch() option to not process empty windows? #75

ddebrunner · 2016-03-02T16:45:08Z

The current batch implementation will process empty windows when the window is the last N seconds.

I wonder if this is the desired or expected behaviour. I see batching as breaking a stream into batches of tuples, thus processing an empty one doesn't seem to make much sense.

Batch could have an option to allow processing of empty windows, defaulting to false, or just not support it.

Throughts?

dlaboss · 2016-03-02T17:21:45Z

Doesn't feel like it's worth the addition... though maybe that's just because I haven't had to live with it as it is :-)

I imagine some uses would need to know that nothing was captured during the last N sec window and react as appropriate for the app (e.g., log, publish, generate a special tuple).

Seems clear/natural to expect the batcher is called every N sec with whatever tuples arrived during that time, hence the list may be empty.

Seems simple enough for uses that don't care about that case, if it doesn't fallout naturally from their processing, to just make the check. e.g.,

TStream s = s.last(10, SECONDS).batch(
                                      tuples -> { if (tuples.isEmpty()) return null;
                                      ... process the batch
                                      };

In a common case where "process the batch" involves a math3 analytic (e.g., SUM), maybe we define those analytics to return null if the input list is empty thereby eliminating having to write an explicit check?

ddebrunner · 2016-03-02T17:38:15Z

I somewhat agree with @dlaboss , but while it's simple to have the isEmpty() it doesn't help reduce cpu cost of potentially many schedules of a task that ends up doing nothing.

Maybe it's a YAGNI, and maybe it's more than just a boolean, but potentially a mode, e.g.:

Always process empty partition
Only process the first in a sequence of empty partition.
Never process empty partition
...

Maybe also it's a property of the window, so it applies to any window operation. If it's empty then do nothing for any window operation. This might also be easier to add in the future without breaking compatibility or having to have multiple overloads of batch().

vdogaru · 2016-03-02T19:05:36Z

I agree that implementing this as an empty partition policy is easy to add without breaking compatibility, but I think of handling empty windows as an attribute of the aggregate function rather than of the window.

Assume I want to support multiple aggregate functions triggered by the same window. To do that without extending Quarks, I have to create identical windows for each aggregate function. If this becomes unacceptable (e.g. high memory consumption if windows are large), and the alternative is to update the current window implementation to support a list of aggregates per window, then a per-window policy would then apply to all the aggregates.

ddebrunner · 2016-03-02T19:14:07Z

@vdogaru
?> To do that without extending Quarks, I have to create identical windows for each aggregate function.

Are you saying that is the case now, or if the empty partition policy was per window?

vdogaru · 2016-03-02T19:28:11Z

I am saying that adding an empty partition policy per window would force all the aggregates triggered by that window to share the window's policy.

ddebrunner · 2016-03-02T19:30:05Z

Right, but that doesn't force each aggregation to have its own window. Only ones with different policies.

wmarshall484 · 2016-03-02T20:20:34Z

that doesn't force each aggregation to have its own window

Currently, each time TWindow.aggregate is called, it creates a new Aggregate<T,U,K> oplet which contains its own Window. So in that sense, each aggregation does have its own window.

ddebrunner · 2016-03-02T20:43:54Z

@wmarshall484 A mere implementation detail, from the api they use a single window.

The implementation could be changed to have a single oplet for a window.

ddebrunner · 2016-03-02T23:57:57Z

Another situation that may not matter for Quarks being on the edge, but when there are thousands of partitions processing empty partitions can add up.

ddebrunner added the question label Mar 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow batch() option to not process empty windows? #75

Allow batch() option to not process empty windows? #75

ddebrunner commented Mar 2, 2016

dlaboss commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

vdogaru commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

vdogaru commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

wmarshall484 commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

Allow batch() option to not process empty windows? #75

Allow batch() option to not process empty windows? #75

Comments

ddebrunner commented Mar 2, 2016

dlaboss commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

vdogaru commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

vdogaru commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

wmarshall484 commented Mar 2, 2016

ddebrunner commented Mar 2, 2016

ddebrunner commented Mar 2, 2016