Fix variable resolution in vectorized aggregation planning #7415
base: main
Conversation
We didn't properly resolve INDEX_VARs in the output targetlist of DecompressChunk nodes, which are present when it uses a custom scan targetlist. Fix this by always working with the targetlist where these variables are resolved to uncompressed chunk variables, like we do during execution.
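The shape of the fix can be illustrated with a toy model (Python stand-ins, not PostgreSQL's actual node structures; INDEX_VAR is the special varno value from PostgreSQL's primnodes.h that marks a reference into the custom scan targetlist):

```python
INDEX_VAR = 65002  # PostgreSQL's special varno: "look me up in the custom scan tlist"

def resolve_var(var, custom_scan_tlist):
    """Replace a Var that points into the custom scan targetlist with the
    underlying uncompressed-chunk variable it stands for."""
    if var["varno"] == INDEX_VAR:
        # varattno is a 1-based index into the custom scan targetlist
        resolved = custom_scan_tlist[var["varattno"] - 1]
        assert resolved["varno"] > 0  # must now be a real chunk variable
        return dict(resolved)  # mutators idiomatically return a copy
    return dict(var)

# toy usage: entry 1 of the scan tlist stands for chunk attribute 4
tlist = [{"varno": 1, "varattno": 4}]
v = {"varno": INDEX_VAR, "varattno": 1}
assert resolve_var(v, tlist) == {"varno": 1, "varattno": 4}
```

This mirrors the fix only in spirit: vectorized aggregation planning must see chunk variables, not INDEX_VAR placeholders, just as the executor does.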
@gayyappan, @erimatnor: please review this pull request.
Codecov Report

@@            Coverage Diff             @@
##             main    #7415      +/-   ##
==========================================
+ Coverage   80.06%   82.09%   +2.02%
==========================================
  Files         190      230      +40
  Lines       37181    43112    +5931
  Branches     9450    10838    +1388
==========================================
+ Hits        29770    35391    +5621
- Misses       2997     3396     +399
+ Partials     4414     4325      -89
Output: compress_hyper_2_4_chunk._ts_meta_count, compress_hyper_2_4_chunk.s, compress_hyper_2_4_chunk._ts_meta_min_1, compress_hyper_2_4_chunk._ts_meta_max_1, compress_hyper_2_4_chunk.a
(20 rows)

select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a + x) from pvagg) xx;
Not sure what this is testing. Is it just checking that the query doesn't fail (if it did fail prior to this fix)? Or is it testing that it gives correct output?
How can I know that the output (sum) is correct? Is there a non-compressed (regular) table I can compare with?
This is testing an aggregate function reference that has an expression that references a nested loop parameter.
I generated a reference by running the same query with vectorized aggregation disabled.
Shouldn't the reference output be part of the test? Right now the set command for turning off vector agg is commented out, so I cannot see the reference or verify that this is correct.
You uncomment this line, and it generates the reference in the output file: all queries run with normal Postgres aggregation instead of vectorized aggregation. Then you comment it back, run the test again, and check that nothing else changes. You only do this once when changing the test, and I have already done it; the test output has the correct results generated by the standard Postgres plan. I use this approach for some other tests as well.
The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.
Having this comparison of the outputs is good because it also easily captures future errors and regressions.
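The two-output diff pattern described here can be sketched generically (a toy Python stand-in; in the real tests the two files come from running the same SQL through psql with vectorized aggregation on and off, then diffing the results):

```python
from pathlib import Path

def check_no_regression(reference_out: str, vectorized_out: str) -> None:
    """Fail if the vectorized run's output differs from the reference run."""
    ref = Path(reference_out).read_text()
    vec = Path(vectorized_out).read_text()
    assert vec == ref, "vectorized output diverged from the reference output"

# toy usage with stand-in result files (real files would hold query output)
Path("ref.out").write_text("sum | 6\n")
Path("vec.out").write_text("sum | 6\n")
check_no_regression("ref.out", "vec.out")
```

Any divergence between the two runs, including future regressions, shows up as a test failure without anyone having to inspect the output manually.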
The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.
Yeah, I know; I don't like to use it because:
- The test runs more than twice as slow.
- It's inconvenient to view the entire test reference; you have to take extra steps to open another file.
- The PG output is not always the same as ours, e.g. some vectorized functions have better numeric stability.
This is what I remember off the top of my head; there are probably more reasons. Why is it a problem?
Oh, and you also have to put the actual test queries into a separate file and run it with psql, so editing such a test is also more complicated.
Having this comparison of the outputs is good because it also easily captures future errors and regressions.
The test I wrote also compares the outputs; it's just that the PG output is fixed at test-editing time.
When you generate the output on every run, you effectively compare the four supported PG versions against each other. I'm not sure what the benefit is; you'll probably just run into some numeric stability change in PG and have to painfully work around it.
The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests how to do this.
Yeah, I know, I don't like to use it because:
1. The test runs more than twice slower
2. Inconvenient to view the entire test reference, you have to do extra steps to open another file.
3. The PG version is not always the same as our version, e.g. some vectorized functions have better numeric stability.
This is the stuff that I remember off the top of my head, probably there are more reasons. Why is it a problem?
I am merely giving feedback on things I think would improve the test and avoid regressions, as well as for my own understanding, so that I don't just approve without understanding what is going on. Only now, after I asked, is it clear that the test output is in fact different from regular PG aggregates, as you admit. Even if it is not strictly wrong, I cannot verify this in the review. It gives me pause because it was neither documented nor clear from the test; at the very least, this would have been good information to provide in the test. Having different aggregate output also means we can't easily catch regressions: someone has to know to manually enable and inspect the reference output when something changes, and I think you are currently the only person who knows how to do that easily.
Ideally, our tests should be easy for others to understand and maintain as well; that is my perspective. Is there some way we can improve the test so the aspects above are easier for others to understand?
Only now, after I asked, it is clear that the test output is in fact different from regular PG aggregates, as you admit.
That's not in this test, that's in different ones where I also use this pattern. What regressions do you want to avoid? This is the usual "golden test": it runs some queries and compares their output against the one captured in the reference. Most of our tests are like that. Here we also have the possibility to compare the reference against the analogous PG output by uncommenting a single line in the test. What should be improved here?
->expr);
Assert(var->varno > 0);

return (Node *) copyObject(var);
Why do we need a copyObject() here but not in the other return cases? Or, to ask it differently, should we do copyObject() also in the other return of var?
This has no practical consequences here, but it is idiomatic for the expression tree mutators to return a copy. I added copyObject into the second place as well.
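The copy-on-return idiom for expression tree mutators can be sketched generically (a toy Python stand-in, not PostgreSQL's actual expression_tree_mutator API; all names here are hypothetical):

```python
import copy

def mutate_vars(node, fn):
    """Toy expression-tree mutator: returns a new tree, never the input.
    Copying on every return path keeps callers free to reuse the original
    tree, which is why both returns copy rather than just one of them."""
    if isinstance(node, list):
        return [mutate_vars(child, fn) for child in node]
    if isinstance(node, dict) and node.get("type") == "Var":
        return fn(copy.deepcopy(node))   # copy on the mutated path
    return copy.deepcopy(node)           # ...and on the pass-through path too

# toy usage: renumber all Var nodes while leaving the original tree intact
tree = [{"type": "Var", "varno": 1}, {"type": "Const", "val": 3}]
new_tree = mutate_vars(tree, lambda v: {**v, "varno": 2})
assert tree[0]["varno"] == 1       # original untouched
assert new_tree[0]["varno"] == 2   # mutated copy
```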
Co-authored-by: Erik Nordström <[email protected]> Signed-off-by: Alexander Kuzmenkov <[email protected]>
@@ -21,23 +21,38 @@ select count(compress_chunk(x)) from show_chunks('pvagg') x;
(1 row)

analyze pvagg;
explain (costs off)
-- Uncomment to generate reference
--set timescaledb.enable_vectorized_aggregation to off;
Should this be here?
Yes, this is just for conveniently generating the reference using the standard Postgres plans whenever you want to change this test.
Fixes #7410 (probably, I couldn't reproduce the original issue).