
Fix variable resolution in vectorized aggregation planning #7415

Open · akuzm wants to merge 19 commits into base: main
Conversation

akuzm (Member) commented Nov 5, 2024

We didn't properly resolve INDEX_VARs in the output targetlist of DecompressChunk nodes, which are present when it uses a custom scan targetlist. Fix this by always working with the targetlist where these variables are resolved to uncompressed chunk variables, like we do during execution.

Fixes #7410 (probably, I couldn't reproduce the original issue).

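For context, a minimal self-contained sketch of what "resolving an INDEX_VAR to the uncompressed chunk variable" means. In PostgreSQL, a Var whose varno is the special INDEX_VAR marker is an index into the custom scan targetlist rather than a real range-table reference, so the planner has to look up the entry it points to. The types and the helper below are simplified stand-ins for illustration, not the actual TimescaleDB/PostgreSQL structures:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Simplified stand-in marker; the real INDEX_VAR value differs across
 * PostgreSQL versions.
 */
#define INDEX_VAR (-3)

/* Simplified stand-in for the PostgreSQL Var node. */
typedef struct Var
{
	int varno;    /* range table index, or a special marker like INDEX_VAR */
	int varattno; /* attribute number; for INDEX_VAR, a 1-based position in
	               * the custom scan targetlist */
} Var;

/*
 * Resolve a Var that indexes into the custom scan targetlist to the
 * underlying uncompressed chunk Var it refers to. Returns a fresh copy,
 * mirroring the convention that expression tree mutators return new nodes.
 */
static Var *
resolve_index_var(const Var *var, const Var *custom_scan_tlist, int tlist_len)
{
	Var *copy = malloc(sizeof(Var));
	if (var->varno == INDEX_VAR)
	{
		/* The varattno must point at a valid targetlist entry. */
		assert(var->varattno >= 1 && var->varattno <= tlist_len);
		*copy = custom_scan_tlist[var->varattno - 1];
	}
	else
	{
		*copy = *var;
	}
	/* After resolution, only real range-table references remain. */
	assert(copy->varno > 0);
	return copy;
}
```

The final assertion mirrors the `Assert(var->varno > 0)` visible in the reviewed diff: once resolution is done, no special varnos should survive in the output targetlist.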

github-actions bot commented Nov 5, 2024

@gayyappan, @erimatnor: please review this pull request.



codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 10 lines in your changes missing coverage. Please review.

Project coverage is 82.09%. Comparing base (59f50f2) to head (106a68f).
Report is 601 commits behind head on main.

Files with missing lines Patch % Lines
tsl/src/nodes/vector_agg/plan.c 79.06% 2 Missing and 7 partials ⚠️
tsl/src/nodes/vector_agg/exec.c 50.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7415      +/-   ##
==========================================
+ Coverage   80.06%   82.09%   +2.02%     
==========================================
  Files         190      230      +40     
  Lines       37181    43112    +5931     
  Branches     9450    10838    +1388     
==========================================
+ Hits        29770    35391    +5621     
- Misses       2997     3396     +399     
+ Partials     4414     4325      -89     


@akuzm akuzm enabled auto-merge (squash) November 7, 2024 14:40
tsl/src/nodes/vector_agg/plan.c (outdated, resolved)
Output: compress_hyper_2_4_chunk._ts_meta_count, compress_hyper_2_4_chunk.s, compress_hyper_2_4_chunk._ts_meta_min_1, compress_hyper_2_4_chunk._ts_meta_max_1, compress_hyper_2_4_chunk.a
(20 rows)

select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a + x) from pvagg) xx;
Contributor

Not sure what this is testing. Is it just checking that the query doesn't fail (if it did fail prior to this fix)? Or is it testing that it gives correct output?

How can I know that the output (sum) is correct? Is there a non-compressed (regular) table I can compare with?

Member Author

This tests an aggregate function call whose argument expression references a nested loop parameter.

I generated a reference by running the same query with vectorized aggregation disabled.

Contributor

Shouldn't the reference output be part of the test? As it is, the `set` that turns off vector agg is a commented-out line, so I cannot see the reference or verify that the output is correct.

Member Author

You uncomment this line and it generates the reference in the output file: all queries run with normal Postgres aggregation instead of vectorized aggregation. Then you comment it back, run the test again, and check that nothing else changes. You do this once when changing the test; I have already done it, so the test output contains the correct results generated by the standard Postgres plan. I use this approach for some other tests as well.

Contributor (@erimatnor) commented Nov 18, 2024

The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.

Having this comparison of the outputs is good because it also easily captures future errors and regressions.

Member Author

> The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.

Yeah, I know. I don't like to use it because:

  1. The test runs more than twice as slow.
  2. It is inconvenient to view the entire test reference; you have to take extra steps to open another file.
  3. The PG results are not always the same as ours, e.g. some vectorized functions have better numeric stability.

That's what I remember off the top of my head; there are probably more reasons. Why is it a problem?

Member Author

Oh, you also have to put the actual test queries into a separate file and run it with psql, so editing a test is also more complicated.

Member Author

> Having this comparison of the outputs is good because it also easily captures future errors and regressions.

The test I wrote also compares the outputs; it's just that the PG output is fixed at test-editing time.

When you generate the output each time, you effectively compare the four supported PG versions against each other. I'm not sure what the benefit is; you'll probably just run into some numeric stability change in PG and have to painfully work around it.

Contributor

> > The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.
>
> Yeah, I know, I don't like to use it because:
>
> 1. The test runs more than twice slower
> 2. Inconvenient to view the entire test reference, you have to do extra steps to open another file for this.
> 3. The PG version is not always the same as our version, e.g. some vectorized functions have better numeric stability.
>
> This is the stuff that I remember off the top of my head, probably there are more reasons. Why is it a problem?

I am merely giving feedback on things I think would improve the test and avoid regressions, as well as for my own understanding, so that I don't just approve without understanding what is going on. Only now, after I asked, is it clear that the test output is in fact different from regular PG aggregates, as you admit. Even if it is not strictly wrong, I cannot verify this in the review. It gives me pause, because it was neither documented nor clear from the test; at the very least, this would have been good information to provide in the test. Having different aggregate output also means we can't easily catch regressions: someone has to know that they need to manually enable and inspect the output when something changes, and I think you are currently the only person who can do that easily.

Ideally, our tests should be easy for others to understand and maintain as well; that is the perspective I'm coming from. Is there some way we can improve the test to make the aspects above easier for others to understand?

Member Author

> Only now, after I asked, it is clear that the test output is in fact different from regular PG aggregates, as you admit.

That's not in this test, that's in different ones where I also use this pattern. What regressions do you want to avoid? This is the usual "golden test": it runs some queries and compares their output against the output captured in the reference. Most of our tests are like that. Here we additionally have the possibility to compare the reference against the analogous PG output by uncommenting a single line in the test. What should be improved here?

->expr);
Assert(var->varno > 0);

return (Node *) copyObject(var);
Contributor

Why do we need a copyObject() here but not in the other return cases? Or, to ask it differently, should we do copyObject() also in the other places that return a var?

Member Author

This has no practical consequences here, but it is idiomatic for the expression tree mutators to return a copy. I added copyObject into the second place as well.
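To illustrate the convention being discussed: PostgreSQL expression tree mutators conventionally return newly allocated nodes (via copyObject()) rather than the input nodes, so the mutated tree shares no memory with the original. A minimal sketch of why that matters, using simplified stand-in types and a hypothetical copy helper rather than the real PostgreSQL API:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for an expression tree node. */
typedef struct Node
{
	int varattno;
} Node;

/* Analogous to copyObject(): return a freshly allocated copy. */
static Node *
copy_node(const Node *node)
{
	Node *copy = malloc(sizeof(Node));
	*copy = *node;
	return copy;
}

/*
 * A mutator that leaves the node "unchanged" still returns a copy, so
 * that later modifications to the result cannot corrupt the input tree.
 */
static Node *
mutate_node(const Node *node)
{
	return copy_node(node);
}
```

Returning a shared pointer instead would work until some caller mutated the result in place, at which point the original plan tree would be silently corrupted; always copying makes the mutator safe regardless of what callers do.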

@@ -21,23 +21,38 @@ select count(compress_chunk(x)) from show_chunks('pvagg') x;
(1 row)

analyze pvagg;
explain (costs off)
-- Uncomment to generate reference
--set timescaledb.enable_vectorized_aggregation to off;
Contributor

Should this be here?

Member Author

Yes, this is just for conveniently generating the reference using the standard Postgres plans whenever you want to change this test.

Successfully merging this pull request may close these issues.

[Bug]: "Aggregated Compressed Column Not Found" Error on Aggregation Query in TimescaleDB