Spec fill/Interpolate function #436
Comments
Hi. Thank you in advance. Kindest regards.
I'm struggling with gaps in measurement series and was looking for a solution.
We have started with a simple function for linear interpolation; see https://github.com/influxdata/flux/blob/master/stdlib/interpolate/interpolate.flux. Public docs are incoming.
We need backward-fill interpolation. Is this planned?
Linear interpolation is not enough. The status of the feature should be updated.
This issue has had no recent activity and will be closed soon.
From ifql, created by aanthony1243: influxdata/ifql#255
Interpolate Proposed as a replacement for Fill()
The basic notion of fill() in InfluxQL is to fill in missing values under certain conditions, most often in the case of an empty time window or group. A more general notion is that of interpolate() which should solve the existing use cases, and also provide more advanced features. The end of this text includes some concrete examples of InfluxQL usage for the fill() function.
An initial writeup of the interpolate design, including a discussion of the Window() operation, can be found here: https://docs.google.com/document/d/1C4ELMRJblvy7UsyBF_o3TGVfGrydChKET2LdGOYu53k/edit?usp=sharing
From the above doc:
Types of interpolation:
Interpolation may be defined on an N×M matrix as filling in the null/missing values of that matrix. In InfluxDB, a matrix can be materialized as a collection of related series where the row indices are the series timestamps, and the column indices are the series IDs:
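As a rough illustration of that matrix view, here is a minimal Python sketch (not Flux). The `materialize` helper, the series keys, and the use of `None` for a missing value are all assumptions made for the example, not part of the proposal.

```python
from typing import Dict, List, Optional, Tuple

Series = Dict[int, float]  # timestamp -> value

def materialize(
    series_by_id: Dict[str, Series],
) -> Tuple[List[int], List[str], List[List[Optional[float]]]]:
    """Build a matrix with one row per timestamp and one column per series ID.

    Cells with no point for that (timestamp, series) pair are None.
    """
    timestamps = sorted({t for s in series_by_id.values() for t in s})
    series_ids = list(series_by_id)  # column order is arbitrary (insertion order here)
    matrix = [[series_by_id[sid].get(t) for sid in series_ids] for t in timestamps]
    return timestamps, series_ids, matrix

# Hypothetical data: two series, the second one missing a point at t=10.
timestamps, series_ids, matrix = materialize({
    "cpu,host=a": {0: 1.0, 10: 2.0, 20: 3.0},
    "cpu,host=b": {0: 5.0, 20: 7.0},
})
```

With this data the row for timestamp 10 has a `None` in the second column, which is the kind of cell interpolation is meant to fill.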
We’ll initially define two types of interpolation, one that operates only on rows of the matrix, and one that operates only on the columns of the matrix.
Column-wise interpolation
A column-wise interpolation will focus on a single series, so that other series in the data matrix are not considered. A few types of interpolation on a column to consider are:
Row-wise interpolation
Row-wise interpolation isn’t necessarily different from column-wise interpolation. In principle, a row-wise interpolation can be achieved by applying a column-wise interpolation to the transposed data matrix. However, in InfluxDB, there are some limitations because while the rows of the data matrix in this context are numeric and sorted by time, the columns of the matrix are categorical and may appear in any arbitrary order. With this in mind, we consider the following row-wise interpolations valid:
If it were possible to assign a total ordering on the rows of the matrix, then the remaining interpolations may be well-defined:
Finally, linear interpolation would be difficult to define generally on a row. If some notion of numeric distance between row values may be defined on the tags, then some form of linear interpolation may be applied. Currently, the language has little support for this other than presuming a fixed unit of distance between two adjacent columns in the matrix.
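The transposition argument above can be sketched in Python (not Flux). Fill-previous is used here purely as a placeholder column-wise rule, and every function name is hypothetical. Note that the row-wise result depends on column order, which is exactly the limitation described above.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]

def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]

def fill_prev_column(column: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the most recent non-None value above it, if any."""
    out, last = [], None
    for v in column:
        if v is None:
            v = last
        else:
            last = v
        out.append(v)
    return out

def fill_columns(m: Matrix) -> Matrix:
    """Column-wise interpolation: each column is filled independently."""
    return transpose([fill_prev_column(col) for col in transpose(m)])

def fill_rows(m: Matrix) -> Matrix:
    """Row-wise interpolation: column-wise interpolation on the transposed matrix."""
    return transpose(fill_columns(transpose(m)))

matrix = [[1.0, None, 3.0],
          [None, 5.0, None]]
by_column = fill_columns(matrix)  # fills down each series
by_row = fill_rows(matrix)        # fills across series at each timestamp
```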
Column-Wise Interpolation Operator
The interpolation function on a column can be simplified into two parts:
Identify each missing value in the series.
Determine a replacement for that missing value.
The most convenient scenario for interpolation will be where a timestamped NULL value exists on the series. Later, we will lift this assumption, but given that a series contains timestamped nulls, a possible interpolate function would be:
interpolate(table=<-, nullFn, fillFn)
Where nullFn is a function defined on a series value that returns true/false if a value should be considered a candidate for replacement, and fillFn is a function that generates a value to insert in place of the NULL value.
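A minimal Python sketch of this general form, assuming the series is a list of (timestamp, value) pairs with `None` standing in for NULL. The proposal does not say what arguments fillFn receives; passing the whole table and the index of the missing value is an assumption made here so that the fill function can look at neighbouring points.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, Optional[float]]  # (timestamp, value)

def interpolate(
    table: List[Point],
    null_fn: Callable[[Optional[float]], bool],
    fill_fn: Callable[[List[Point], int], Optional[float]],
) -> List[Point]:
    """Replace every value flagged by null_fn with the result of fill_fn."""
    out = []
    for i, (ts, value) in enumerate(table):
        if null_fn(value):
            value = fill_fn(table, i)  # assumed signature: (table, index)
        out.append((ts, value))
    return out

def fill_previous(table: List[Point], i: int) -> Optional[float]:
    """Hypothetical fill_fn: nearest earlier non-null value, or None if there is none."""
    for _, v in reversed(table[:i]):
        if v is not None:
            return v
    return None

series = [(0, 1.0), (10, None), (20, None), (30, 4.0)]
filled = interpolate(series, lambda v: v is None, fill_previous)
```

Because `fill_previous` rescans the series for every missing value, this version also shows the kind of cost that an unoptimized fill function can incur.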
This approach would be the most general, but possibly inefficient unless the fillFn can be optimized such that it can be computed quickly. In general, interpolation is computed using known values that are near the NULL value. We can cache these values into a moving window as we scan the series, and build in the various functions that may be desirable for interpolation:
interpolate(table=<-, nullFn, stepsPrev, stepsNext, fillType="prev")
Where stepsPrev and stepsNext indicate the boundary of the window to each side of the NULL value, and fillType may be one of:
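A minimal Python sketch of the windowed form follows. Only "prev" appears in the signature above; the "next" and "linear" values, the (timestamp, value) representation, and the choice to leave a value unfilled when the window holds no usable neighbour are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, Optional[float]]  # (timestamp, value)

def interpolate_window(
    table: List[Point],
    null_fn: Callable[[Optional[float]], bool],
    steps_prev: int,
    steps_next: int,
    fill_type: str = "prev",
) -> List[Point]:
    if fill_type not in ("prev", "next", "linear"):
        raise ValueError(f"unknown fill_type: {fill_type}")
    out = []
    for i, (ts, value) in enumerate(table):
        if null_fn(value):
            # Known values within steps_prev rows before and steps_next rows after.
            before = [p for p in table[max(0, i - steps_prev):i] if not null_fn(p[1])]
            after = [p for p in table[i + 1:i + 1 + steps_next] if not null_fn(p[1])]
            prev_pt = before[-1] if before else None
            next_pt = after[0] if after else None
            if fill_type == "prev" and prev_pt is not None:
                value = prev_pt[1]
            elif fill_type == "next" and next_pt is not None:
                value = next_pt[1]
            elif fill_type == "linear" and prev_pt is not None and next_pt is not None:
                (t0, v0), (t1, v1) = prev_pt, next_pt
                value = v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
            # Otherwise the value is left as it was (an assumption of this sketch).
        out.append((ts, value))
    return out

series = [(0, 1.0), (10, None), (20, None), (30, 4.0)]
is_null = lambda v: v is None
narrow = interpolate_window(series, is_null, 1, 1, "prev")
linear = interpolate_window(series, is_null, 2, 2, "linear")
```

In `narrow`, the value at timestamp 20 stays missing because the only row inside its one-step window is itself missing, which illustrates how the window bounds limit what can be filled.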
Appendix: InfluxQL examples, with Flux adaptations:
External requirements
Null values: We need some representation of null/missing values for a row so that we can identify where to apply interpolation.
Many existing InfluxQL queries use grouping/windowing to segment the data, and then a Fill() operation to insert a default value for any empty segments. To get the most out of this feature, we need to make sure that our Window() function outputs empty groups either by default or by argument:
|> window(every: 10m, keepEmpty: true)
NOTE: Depending on how we implement Window(), example 1 may not require interpolate() at all, if we did something like:
|> window(every: 10m, emptyValue: 0)
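To make the difference between the two window variants concrete, here is a minimal Python sketch (not Flux). The `window_mean` helper, the choice of mean as the aggregate, and integer timestamps in minutes are assumptions; `keep_empty` and `empty_value` mirror the proposed keepEmpty and emptyValue arguments.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, Optional[float]]  # (window start, aggregated value)

def window_mean(
    points: List[Tuple[int, float]],
    every: int,
    start: int,
    stop: int,
    keep_empty: bool = False,
    empty_value: Optional[float] = None,
) -> List[Point]:
    """Average the points in each [start + k*every, start + (k+1)*every) window."""
    buckets = {}
    for ts, v in points:
        if start <= ts < stop:
            buckets.setdefault((ts - start) // every, []).append(v)
    out = []
    n_windows = (stop - start + every - 1) // every
    for k in range(n_windows):
        values = buckets.get(k)
        if values:
            out.append((start + k * every, sum(values) / len(values)))
        elif keep_empty:
            # Empty window: emit a placeholder instead of dropping the window.
            out.append((start + k * every, empty_value))
    return out

points = [(0, 2.0), (5, 4.0), (25, 6.0)]  # nothing falls in the 10..20 window
dropped = window_mean(points, every=10, start=0, stop=30)
kept = window_mean(points, every=10, start=0, stop=30, keep_empty=True)
zeroed = window_mean(points, every=10, start=0, stop=30, keep_empty=True, empty_value=0.0)
```

`kept` leaves a null for a later interpolate step to handle, while `zeroed` behaves like the emptyValue variant and needs no separate fill.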
Example 1:
Many Chronograf queries require data for each point on an axis. If the axis is populated with the results of a GROUP BY, empty groups must get a default value. Example:
Example 2:
It's not well defined what happens here, if anything. But there are some queries on the cloud monitor that use Fill(0) independently of a GROUP BY:
Most likely this query started out as a continuous query, where fill() is defined in this manner in some contexts. In the query above, it's not clear what is meant to be filled. This use case may not require any changes unless we support continuous Flux queries.