
EPIC: scan function #4671

Open
rockstar opened this issue Apr 13, 2022 · 20 comments

@rockstar
Contributor

After talking about the potential for a new/improved stateChanges function, the working group came up with a proposal for a scan function that we'd like to add. scan would have the following signature:

builtin scan : (<-tables: stream[A], fn: (accumulator: B, element: A) => B, init: B) => stream[B]

This function was heavily inspired by the scanl (and family) functions from Haskell. Here's how it could be used:

import "experimental/array"

testdata = array.from(rows: [
  {_time: 2021-01-01T00:00:00Z, _value: 100, state: "crit"},
  {_time: 2021-01-01T00:01:00Z, _value: 100, state: "crit"},
  {_time: 2021-01-01T00:02:00Z, _value: 80, state: "warn"},
  {_time: 2021-01-01T00:03:00Z, _value: 82, state: "warn"},
  {_time: 2021-01-01T00:04:00Z, _value: 80, state: "warn"},
  {_time: 2021-01-01T00:05:00Z, _value: 52, state: "ok"},
  {_time: 2021-01-01T00:06:00Z, _value: 50, state: "ok"},
])

testdata
  |> scan(fn: (acc, record) => ({record with stateChanged: acc.state != record.state}))

This would emit a table with a new boolean column stateChanged that is true whenever the state changes. Using scan, a stateChanges function could then look something like this:

stateChanges = (<-tables, value) => {
    data = if exists value then tables |> prepend(value: value) else tables

    return data
        |> scan(fn: (acc, ele) => ({ele with stateChanged: acc.state != ele.state}))
}
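For intuition, here is a sketch of the semantics the proposal implies, in Python rather than Flux. The `scan` helper below is hypothetical and simply models the rule that fn's output becomes the accumulator for the next row:

```python
# Hypothetical model of the proposed scan: fn's output row becomes the
# accumulator for the next row, so each row can see its predecessor.
def scan(rows, fn, init):
    acc = init
    out = []
    for row in rows:
        acc = fn(acc, row)
        out.append(acc)
    return out

rows = [{"state": "crit"}, {"state": "crit"}, {"state": "warn"}]
result = scan(
    rows,
    lambda acc, r: {**r, "stateChanged": acc["state"] != r["state"]},
    init={"state": "crit"},
)
# stateChanged per row: [False, False, True]
```

The first row is compared against init, which is exactly why the init question below matters.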

Questions:

  • The init parameter's use is still unclear. To specify it would mean knowing a lot more about data shape than one may know. What does the init look like? Does it have to be explicit? Could/should we create two functions, one that takes an init and one that doesn't?
  • How does scan affect the group key of a table? Does it work like reduce, where it carries the group key with it?
@samhld

samhld commented Jun 7, 2022

+1

@sanderson
Contributor

sanderson commented Oct 26, 2022

Another use case here would be detecting phases in data. For example, if a counter resets, increment the phase:

import "array"

testdata = array.from(rows: [
  {_time: 2021-01-01T00:00:00Z, _value: 0},
  {_time: 2021-01-01T00:01:00Z, _value: 23},
  {_time: 2021-01-01T00:02:00Z, _value: 50},
  {_time: 2021-01-01T00:03:00Z, _value: 0},
  {_time: 2021-01-01T00:04:00Z, _value: 18},
  {_time: 2021-01-01T00:05:00Z, _value: 32},
  {_time: 2021-01-01T00:06:00Z, _value: 0},
])

testdata
  |> scan(fn: (accumulator, r) => ({r with phase: if accumulator._value <= r._value then accumulator.phase else accumulator.phase + 1 }))

The expected output would look something like:

| _time                | _value | phase |
| -------------------- | ------ | ----- |
| 2021-01-01T00:00:00Z | 0      | 0     |
| 2021-01-01T00:01:00Z | 23     | 0     |
| 2021-01-01T00:02:00Z | 50     | 0     |
| 2021-01-01T00:03:00Z | 0      | 1     |
| 2021-01-01T00:04:00Z | 18     | 1     |
| 2021-01-01T00:05:00Z | 32     | 1     |
| 2021-01-01T00:06:00Z | 0      | 2     |
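In plain Python (illustrative only, not Flux), the same phase logic can be modeled with `itertools.accumulate`, which carries the previous output row as the accumulator:

```python
from itertools import accumulate

# Model of the phase example: bump the phase whenever the counter decreases
# (i.e., whenever the counter resets).
rows = [{"_value": v} for v in [0, 23, 50, 0, 18, 32, 0]]
result = list(
    accumulate(
        rows,
        lambda acc, r: {
            **r,
            "phase": acc["phase"] if acc["_value"] <= r["_value"] else acc["phase"] + 1,
        },
        initial={"_value": 0, "phase": 0},
    )
)[1:]  # accumulate yields the initial accumulator first, so drop it
# phases per row: [0, 0, 0, 1, 1, 1, 2]
```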

> The init parameter's use is still unclear. To specify it would mean knowing a lot more about data shape than one may know. What does the init look like? Does it have to be explicit? Could/should we create two functions, one that takes an init and one that doesn't?

The init's purpose is to act as the accumulator for the first row of each table. Without an init, there is nothing to compare the first row to. I'd love to have some way for the init to be implicit, but with the current type implementation, I don't know how it would work.

What if there were a concept of a "table record": a default record with all group key columns populated and all non-group-key columns set to null. You could then extend the table record for the init. Something like:

init: {tableRecord with _value: 0, phase: 0}

With that in place, the function call could look like this:

testdata
    |> scan(
        init: {tableRecord with _value: 0, phase: 0},
        fn: (accumulator, r) => ({
            r with
            phase: if accumulator._value <= r._value then accumulator.phase else accumulator.phase + 1,
            _value: r._value,
        })
    )

> How does scan affect the group key of a table? Does it work like reduce, where it carries the group key with it?

I would say yes, it carries the group key with it. It would be up to the user to group by any columns added by the transformation.

@sanderson
Contributor

To add to this, I think there should also be an array.scan function that provides this functionality for arrays. For example:

import "array"

a = [1, 2, 3, 4]
b = 20

c = array.scan(
    arr: a,
    init: b,
    fn: (acc, x) => acc + x
)

// c = [21,23,26,30]

The signature of array.scan would be something like:

builtin scan : (<-arr: [A], fn: (acc: B, x: A) => B, init: B) => [B]
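For comparison, Python's `itertools.accumulate` already provides this shape for arrays, modulo the initial value appearing in the output:

```python
from itertools import accumulate

a = [1, 2, 3, 4]
b = 20

# accumulate yields the initial value itself first; drop it to match the
# proposed array.scan, whose output starts at the first combined element.
c = list(accumulate(a, lambda acc, x: acc + x, initial=b))[1:]
# c == [21, 23, 26, 30]
```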

@UlrichThiess

This proposal is urgently needed.

I have described this requirement here: https://community.influxdata.com/t/number-taxi-rides/28939

@UlrichThiess

Maybe there is no need for a scan function?

I asked ChatGPT for help, and this is our solution (ChatGPT pushed me in the right direction):

// Helper function to return an integer from a table
getFieldValue = (tables=<-, field) => {
  extract = tables
    |> last() // shrink table to one row
    |> findColumn(fn: (key) => key._field == field, column: "_value")

  return if length(arr: extract) == 0 then 0 else extract[0] // return 0 if the table is empty, otherwise the last value
}

// We need the last tripId as integer
lastId = from(bucket: "obd2")
  |> range(start: 2023-02-24T21:48:57.8Z, stop: 2023-02-24T22:54:46.955Z)
  |> filter(fn: (r) => r._measurement == "Taxi" and r._field == "tripId" and r._value != 0) // search for tripId
  |> last() // shrink table to one row
  |> getFieldValue(field: "tripId") // the helper matches on key._field == field, so pass the field name

rpm_data = from(bucket: "obd2")
  |> range(start: 2023-02-24T21:48:57.8Z, stop: 2023-02-24T22:54:46.955Z)
  |> filter(fn: (r) => r._measurement == "Taxi" and r._field == "EngineRPM")
  |> map(fn: (r) => ({ r with tmp: if r._value > 0 then 1 else 0 })) // create helperfield tmp
  |> derivative(unit: 1s, nonNegative: true, columns: ["tmp"]) // only the changes from 0 to not 0
  |> map(fn: (r) => ({ r with tripId: int(v: r.tmp) })) // create tripIp with integer of tmp
  |> drop(columns: ["tmp"]) // remove helperfield tmp
  |> cumulativeSum(columns: ["tripId"])
  |> map(fn: (r) => ({ r with tripId: if r._value > 0 then r.tripId + lastId else 0 }))
  |> yield(name: "rpm")

I will give this a quick try and run more tests.
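To make the indicator/derivative/cumulativeSum trick above concrete, here is a rough Python model of it (illustrative only, not Flux):

```python
# Model of the workaround: an indicator for "engine running", its non-negative
# difference (1 exactly at each 0 -> nonzero transition), then a running sum
# that numbers the trips; rows where the engine is off get trip id 0.
values = [0, 23, 50, 0, 18, 32, 0]
running = [1 if v > 0 else 0 for v in values]
starts = [0] + [max(running[i] - running[i - 1], 0) for i in range(1, len(running))]

trip, trip_ids = 0, []
for v, s in zip(values, starts):
    trip += s
    trip_ids.append(trip if v > 0 else 0)
# trip_ids == [0, 1, 1, 0, 2, 2, 0]
```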

@github-actions

This issue has had no recent activity and will be closed soon.

@paulwer

paulwer commented Jun 27, 2023

any updates on this?

@UlrichThiess

This can be closed, as I have found a solution.

@paulwer

paulwer commented Jun 27, 2023

I cannot reproduce this for our stateChanged use case. Could you explain your example further?
If it works, there should be documentation for this use case (from my point of view).

@UlrichThiess

Paste this into the InfluxDB 2 DataExplorer in the Script Editor Window.

import "array"

testdata = array.from(rows: [
  {_time: 2021-01-01T00:00:00Z, _value: 0},
  {_time: 2021-01-01T00:01:00Z, _value: 23},
  {_time: 2021-01-01T00:02:00Z, _value: 50},
  {_time: 2021-01-01T00:03:00Z, _value: 0},
  {_time: 2021-01-01T00:04:00Z, _value: 18},
  {_time: 2021-01-01T00:05:00Z, _value: 32},
  {_time: 2021-01-01T00:06:00Z, _value: 0}
])

testdata

  // TripId
  |> map(fn: (r) => ({r with TripId: if r._value > 0 then 1 else 0}))
  |> derivative(unit: 1m, nonNegative: true, columns: ["TripId"], initialZero: true)
  |> cumulativeSum(columns: ["TripId"])
  |> map(fn: (r) => ({r with TripId: if r._value > 0 then r.TripId else 0.0}))
//  |> filter(fn: (r) => r.TripId != 0)

  |> yield()

@paulwer

paulwer commented Jun 27, 2023

I guess this won't cover cases where you only want values and their timestamps at the moments the value changes.
For example: statuses saved as booleans => keep only the value changes => determine which warnings were active within a timeframe.
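For what it's worth, the "keep only the rows where the value changed" use case can be modeled like this in Python (illustrative sketch with made-up data):

```python
# Keep the first row plus every row whose value differs from its predecessor.
rows = [
    {"t": 0, "active": False},
    {"t": 1, "active": True},
    {"t": 2, "active": True},
    {"t": 3, "active": False},
]
changes = [
    r
    for prev, r in zip([None] + rows[:-1], rows)
    if prev is None or prev["active"] != r["active"]
]
# kept timestamps: [0, 1, 3]
```

This is exactly the accumulator-sees-previous-row pattern the proposed scan would express directly.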

@UlrichThiess

That sounds easier. However, I don't know the exact requirement. That discussion should be elsewhere, not here. Paul, if I can help, speak to me directly.


@jgladch

jgladch commented Oct 3, 2024

> Paste this into the InfluxDB 2 DataExplorer in the Script Editor Window. […]

Thanks very much for this @UlrichThiess It unblocked me 🍻


github-actions bot commented Dec 2, 2024

This issue has had no recent activity and will be closed soon.
