[pkg/ottl] Ensure public API allows hierarchical optimizations #29016

Open
Tracked by #28892
TylerHelmuth opened this issue Nov 7, 2023 · 5 comments
Labels
never stale, pkg/ottl, priority:p2

Comments

@TylerHelmuth
Member

TylerHelmuth commented Nov 7, 2023

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

Currently OTTL expects the following statement to be run for each log:

resource.attributes["a"] == "b" and body == "something"

For a group of logs that share a resource, if resource.attributes["a"] == "b" is false, we do not need to evaluate the condition for each remaining log: they all share that resource, so all of them would fail the condition.

We need to ensure OTTL's API allows for this kind of optimization without a breaking change.
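
To make the potential saving concrete, here is a minimal sketch in plain Go. The types Resource, LogRecord and ResourceLogs are simplified stand-ins invented for this illustration (they are not the pdata or pkg/ottl types), and the condition is hard-coded rather than parsed from an OTTL statement. It only counts how often the resource-level predicate is evaluated in each approach.

```go
package main

import "fmt"

// Resource, LogRecord and ResourceLogs are hypothetical stand-ins for the
// real pdata types; they exist only for this illustration.
type Resource struct {
	Attributes map[string]string
}

type LogRecord struct {
	Body string
}

type ResourceLogs struct {
	Resource Resource
	Records  []LogRecord
}

// countMatchesFlat evaluates the whole condition once per log record.
// It returns the number of matching records and how many times the
// resource-level predicate was evaluated.
func countMatchesFlat(groups []ResourceLogs) (matches, resourceEvals int) {
	for _, g := range groups {
		for _, rec := range g.Records {
			resourceEvals++
			if g.Resource.Attributes["a"] == "b" && rec.Body == "something" {
				matches++
			}
		}
	}
	return matches, resourceEvals
}

// countMatchesHierarchical evaluates the resource-level predicate once per
// group and skips the group's records entirely when it is false.
func countMatchesHierarchical(groups []ResourceLogs) (matches, resourceEvals int) {
	for _, g := range groups {
		resourceEvals++
		if g.Resource.Attributes["a"] != "b" {
			continue // every record in this group would fail the condition
		}
		for _, rec := range g.Records {
			if rec.Body == "something" {
				matches++
			}
		}
	}
	return matches, resourceEvals
}

// sampleGroups builds one matching and one non-matching resource group.
func sampleGroups() []ResourceLogs {
	return []ResourceLogs{
		{
			Resource: Resource{Attributes: map[string]string{"a": "b"}},
			Records:  []LogRecord{{Body: "something"}, {Body: "other"}, {Body: "something"}},
		},
		{
			Resource: Resource{Attributes: map[string]string{"a": "c"}},
			Records:  []LogRecord{{Body: "something"}, {Body: "something"}},
		},
	}
}

func main() {
	flatMatches, flatEvals := countMatchesFlat(sampleGroups())
	hierMatches, hierEvals := countMatchesHierarchical(sampleGroups())
	fmt.Println("flat:        ", flatMatches, "matches,", flatEvals, "resource evaluations")
	fmt.Println("hierarchical:", hierMatches, "matches,", hierEvals, "resource evaluations")
}
```

Both functions return the same matches; the hierarchical one evaluates the resource predicate once per resource instead of once per record.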

@TylerHelmuth TylerHelmuth added the enhancement, needs triage, priority:p2 and pkg/ottl labels and removed the enhancement and needs triage labels Nov 7, 2023
Contributor

github-actions bot commented Feb 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Apr 15, 2024
Contributor

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Jun 14, 2024
@evan-bradley evan-bradley reopened this Jun 14, 2024
@evan-bradley evan-bradley added the never stale label and removed the Stale and closed as inactive labels Jun 14, 2024
@evan-bradley
Contributor

Reopening. This could be a meaningful speed increase, and if we're going to close it I'd like us to first determine that we don't want this.

@bacherfl
Contributor

bacherfl commented Nov 22, 2024

I was just looking into this and worked on a PoC of how, for example, the transform processor could be improved to solve the issue described above in a non-breaking way.
My idea is to build on the existing concept of global conditions, which are evaluated for each statement in a ContextStatements item:

Conditions []string `mapstructure:"conditions"`

Here, we could add two more kinds of global conditions, one evaluated against the resource and one against the scope, i.e. something like:

type ContextStatements struct {
	Context            ContextID `mapstructure:"context"`
	Conditions         []string  `mapstructure:"conditions"`
	ResourceConditions []string  `mapstructure:"resource_conditions"`
	ScopeConditions    []string  `mapstructure:"scope_conditions"`
	Statements         []string  `mapstructure:"statements"`
}
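
As an illustration of how a user might regroup statements under this proposal, the sketch below builds the same rule twice: once with the resource check inside the where clause, and once with it moved into the proposed ResourceConditions field. ResourceConditions and ScopeConditions exist only in the PoC and are not part of the released transform processor. The local ContextID type, the helper names flatConfig and groupedConfig, the example statement, and the path syntax used inside the resource condition are all assumptions made for this example.

```go
package main

import "fmt"

// ContextID is redeclared here so the example is self-contained.
type ContextID string

// ContextStatements mirrors the struct proposed above. ResourceConditions
// and ScopeConditions are PoC-only fields, not a released API.
type ContextStatements struct {
	Context            ContextID `mapstructure:"context"`
	Conditions         []string  `mapstructure:"conditions"`
	ResourceConditions []string  `mapstructure:"resource_conditions"`
	ScopeConditions    []string  `mapstructure:"scope_conditions"`
	Statements         []string  `mapstructure:"statements"`
}

// flatConfig keeps the resource check in the where clause, so it is
// evaluated for every log record.
func flatConfig() ContextStatements {
	return ContextStatements{
		Context: "log",
		Statements: []string{
			`set(attributes["flagged"], true) where resource.attributes["a"] == "b" and body == "something"`,
		},
	}
}

// groupedConfig moves the resource check into ResourceConditions, so it
// could be evaluated once per resource. The path syntax of the condition
// (resource-context style) is an assumption.
func groupedConfig() ContextStatements {
	return ContextStatements{
		Context:            "log",
		ResourceConditions: []string{`attributes["a"] == "b"`},
		Statements: []string{
			`set(attributes["flagged"], true) where body == "something"`,
		},
	}
}

func main() {
	fmt.Printf("%+v\n", flatConfig())
	fmt.Printf("%+v\n", groupedConfig())
}
```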

This would allow users to group statements that share the same checks (for example, on specific resource attributes). While iterating over a list of resource logs, the transform processor could then continue immediately with the next resource log whenever the resource conditions are not fulfilled, without iterating over all scopeLogs/logRecords of that resource:

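func (l logStatements) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
	for i := 0; i < ld.ResourceLogs().Len(); i++ {
		rlogs := ld.ResourceLogs().At(i)

		// Evaluate the resource-level conditions once per resource.
		// rlogs is passed as the second argument (instead of an empty
		// plog.NewResourceLogs()) so the context sees this resource's schema URL.
		resourceCtx := ottlresource.NewTransformContext(rlogs.Resource(), rlogs)
		rCondition, err := l.ResourceGlobalExpr.Eval(ctx, resourceCtx)
		if err != nil {
			return err
		}
		if !rCondition {
			// No log record under this resource can match; skip them all.
			continue
		}

		for j := 0; j < rlogs.ScopeLogs().Len(); j++ {
			// omitting this part for readability
		}
	}
	return nil
}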

I ran some benchmarks for my PoC, comparing a resource attribute condition in the where clause of the statement (which is evaluated for each log record) against the same condition in the global resource_conditions. Given a resource log with 1000 log records, I got the following results:

Benchmark_ConsumeLogs/non_matching_resource_condition
Benchmark_ConsumeLogs/non_matching_resource_condition-10                  	    9868	    116931 ns/op
Benchmark_ConsumeLogs/non_matching_resource_condition_in_where_clause
Benchmark_ConsumeLogs/non_matching_resource_condition_in_where_clause-10  	    2176	    528010 ns/op

So using resource_conditions processed the test data about 4.5 times faster in that particular scenario (528010 ns/op vs. 116931 ns/op). Of course, mileage may vary depending on the structure of the processed signals, but this could translate into real performance improvements if the context statements are adapted appropriately.
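
For anyone who wants to reproduce the shape of this comparison without the PoC, here is a toy benchmark using only the standard library. It is not the benchmark that produced the numbers above: resourceLogs, whereClause and resourceCondition are hypothetical stand-ins, and the per-record work is far cheaper than real OTTL evaluation, so the absolute numbers and the ratio will differ.

```go
package main

import (
	"fmt"
	"testing"
)

// resourceLogs is a simplified stand-in for one resource and its log bodies.
type resourceLogs struct {
	attrs  map[string]string
	bodies []string
}

// newResourceLogs builds a resource with n records whose resource
// attribute does NOT satisfy the condition a == "b".
func newResourceLogs(n int) resourceLogs {
	rl := resourceLogs{attrs: map[string]string{"a": "not-b"}, bodies: make([]string, n)}
	for i := range rl.bodies {
		rl.bodies[i] = "something"
	}
	return rl
}

// whereClause checks the resource attribute again for every record.
func whereClause(rl resourceLogs) int {
	n := 0
	for _, body := range rl.bodies {
		if rl.attrs["a"] == "b" && body == "something" {
			n++
		}
	}
	return n
}

// resourceCondition checks the resource attribute once and skips all
// records when it does not match.
func resourceCondition(rl resourceLogs) int {
	if rl.attrs["a"] != "b" {
		return 0
	}
	n := 0
	for _, body := range rl.bodies {
		if body == "something" {
			n++
		}
	}
	return n
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink int

func main() {
	rl := newResourceLogs(1000)
	where := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = whereClause(rl)
		}
	})
	grouped := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = resourceCondition(rl)
		}
	})
	fmt.Println("condition in where clause:", where)
	fmt.Println("resource-level condition: ", grouped)
}
```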
