Fix variable resolution in vectorized aggregation planning #7415
base: main
Changes from all commits: 4dfa5c7, bc30ab8, 69ed75d, da0e3de, eecd1dd, edaa7cb, eddcddc, a5dfada, d11fec5, 92c6fd3, d5ad761, 1992b8e, c6eb880, 0c59d9a, e71a015, 31bceb9, d4c97f0, fa9c11f, 106a68f
@@ -0,0 +1,2 @@
+Fixes: #7410 "aggregated compressed column not found" error on aggregation query.
+Thanks: @uasiddiqi for reporting the "aggregated compressed column not found" error.
@@ -21,23 +21,42 @@ select count(compress_chunk(x)) from show_chunks('pvagg') x;
 (1 row)
 
 analyze pvagg;
-explain (costs off)
+-- The reference for this test is generated using the standard Postgres
+-- aggregation. When you change this test, recheck the results against the
+-- Postgres aggregation by uncommenting the below GUC.
+-- set timescaledb.enable_vectorized_aggregation to off;
+explain (verbose, costs off)
 select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a) from pvagg where s = x) xx;
-                                 QUERY PLAN
----------------------------------------------------------------------------
+                                                                                                          QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop
-   ->  Function Scan on unnest x
+   Output: x.x, (sum(pvagg.a))
+   ->  Function Scan on pg_catalog.unnest x
+         Output: x.x
+         Function Call: unnest('{0,1,2}'::integer[])
    ->  Finalize Aggregate
-         ->  Custom Scan (ChunkAppend) on pvagg
+         Output: sum(pvagg.a)
+         ->  Custom Scan (ChunkAppend) on public.pvagg
+               Output: (PARTIAL sum(pvagg.a))
+               Startup Exclusion: false
+               Runtime Exclusion: true
                ->  Custom Scan (VectorAgg)
-                     ->  Custom Scan (DecompressChunk) on _hyper_1_1_chunk
-                           ->  Seq Scan on compress_hyper_2_3_chunk
-                                 Filter: (s = x.x)
+                     Output: (PARTIAL sum(_hyper_1_1_chunk.a))
+                     Grouping Policy: all compressed batches
+                     ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_1_1_chunk
+                           Output: _hyper_1_1_chunk.a
+                           ->  Seq Scan on _timescaledb_internal.compress_hyper_2_3_chunk
+                                 Output: compress_hyper_2_3_chunk._ts_meta_count, compress_hyper_2_3_chunk.s, compress_hyper_2_3_chunk._ts_meta_min_1, compress_hyper_2_3_chunk._ts_meta_max_1, compress_hyper_2_3_chunk.a
+                                 Filter: (compress_hyper_2_3_chunk.s = x.x)
                ->  Custom Scan (VectorAgg)
-                     ->  Custom Scan (DecompressChunk) on _hyper_1_2_chunk
-                           ->  Seq Scan on compress_hyper_2_4_chunk
-                                 Filter: (s = x.x)
-(12 rows)
+                     Output: (PARTIAL sum(_hyper_1_2_chunk.a))
+                     Grouping Policy: all compressed batches
+                     ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_1_2_chunk
+                           Output: _hyper_1_2_chunk.a
+                           ->  Seq Scan on _timescaledb_internal.compress_hyper_2_4_chunk
+                                 Output: compress_hyper_2_4_chunk._ts_meta_count, compress_hyper_2_4_chunk.s, compress_hyper_2_4_chunk._ts_meta_min_1, compress_hyper_2_4_chunk._ts_meta_max_1, compress_hyper_2_4_chunk.a
+                                 Filter: (compress_hyper_2_4_chunk.s = x.x)
+(27 rows)
 
 select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a) from pvagg where s = x) xx;
  x |   sum
@@ -47,4 +66,48 @@ select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a) from pvagg
  2 | 1498500
 (3 rows)
 
+explain (verbose, costs off)
+select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a + x) from pvagg) xx;
+                                                                                                          QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Nested Loop
+   Output: x.x, (sum((_hyper_1_1_chunk.a + x.x)))
+   ->  Function Scan on pg_catalog.unnest x
+         Output: x.x
+         Function Call: unnest('{0,1,2}'::integer[])
+   ->  Finalize Aggregate
+         Output: sum((_hyper_1_1_chunk.a + x.x))
+         ->  Append
+               ->  Partial Aggregate
+                     Output: PARTIAL sum((_hyper_1_1_chunk.a + x.x))
+                     ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_1_1_chunk
+                           Output: _hyper_1_1_chunk.a
+                           ->  Seq Scan on _timescaledb_internal.compress_hyper_2_3_chunk
+                                 Output: compress_hyper_2_3_chunk._ts_meta_count, compress_hyper_2_3_chunk.s, compress_hyper_2_3_chunk._ts_meta_min_1, compress_hyper_2_3_chunk._ts_meta_max_1, compress_hyper_2_3_chunk.a
+               ->  Partial Aggregate
+                     Output: PARTIAL sum((_hyper_1_2_chunk.a + x.x))
+                     ->  Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_1_2_chunk
+                           Output: _hyper_1_2_chunk.a
+                           ->  Seq Scan on _timescaledb_internal.compress_hyper_2_4_chunk
+                                 Output: compress_hyper_2_4_chunk._ts_meta_count, compress_hyper_2_4_chunk.s, compress_hyper_2_4_chunk._ts_meta_min_1, compress_hyper_2_4_chunk._ts_meta_max_1, compress_hyper_2_4_chunk.a
+(20 rows)
+
+select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a + x) from pvagg) xx;
[Review thread on this query]

Reviewer: Not sure what this is testing. Is it just checking that the query doesn't fail (if it did fail prior to this fix)? Or is it testing that it gives correct output? How can I know that the output (sum) is correct? Is there a non-compressed (regular) table I can compare with?

Author: This is testing an aggregate function reference that has an expression which references a nested loop parameter. I generated a reference by running the same query with vectorized aggregation disabled.

Reviewer: Shouldn't the reference output be part of the test? Now the […]

Author: You uncomment this line, and it generates the reference in the output file -- all queries run with normal Postgres aggregation and not vectorized aggregation. Then you comment it back, run the test again, and check that nothing else changes. You do this once when changing the test. I have already done this; the test output has the correct results generated by the standard Postgres plan. This is the approach I use for some other tests as well.

Reviewer: The pattern we have for this is to generate two output files and do a diff between them in the test. There are examples in other tests of how to do this. Having this comparison of the outputs is good because it also easily captures future errors and regressions.

Author: Yeah, I know. I don't like to use it because: […] This is the stuff that I remember off the top of my head; probably there are more reasons. Why is it a problem?

Author: Oh, you also have to put the actual test queries into a separate file and run it with psql, so editing a test is also more complicated.

Author: The test I wrote also compares the outputs, only the PG output is fixed at the time the test is edited. When you generate the output each time, you effectively compare the four supported PG versions against each other. I'm not sure what the benefit is; probably you'll just run into some numeric stability change in PG and will have to painfully work around it.

Reviewer: I am merely giving feedback on things I think would improve the test and avoid regressions, as well as for my own understanding, so that I don't just approve without understanding what is going on. Only now, after I asked, is it clear that the test output is in fact different from regular PG aggregates, as you admit. Even if it is not strictly wrong, I cannot verify this in the review. It gives me pause because it was neither documented nor clear from the test. At the very least, this would have been good information to provide in the test. Having different aggregate output also means we can't easily capture regressions, and it requires someone to know that they need to manually enable and inspect the output when something changes -- and I think you are currently the only person who knows how to do that easily. Ideally, our tests should be easy to understand and maintain by others as well; this is the perspective I have. Is there some way we can improve the test to make the aspects above easier for others to understand?

Author: That's not in this test, that's in different ones where I also use this pattern. What regressions do you want to avoid? This is the usual "golden test": it runs some queries and compares their output against the one captured in the reference. Most of our tests are like that. Here we also have the possibility of comparing the reference against the analogous PG output by uncommenting a single line in this test. What should be improved here?
+ x |   sum
+---+---------
+ 0 | 1998000
+ 1 | 1999998
+ 2 | 2001996
+(3 rows)
+
+-- The plan for this query differs after PG16, x is not used as grouping key but
+-- just added into the output targetlist of partial aggregation nodes.
+select * from unnest(array[0, 1, 2]::int[]) x, lateral (select sum(a) from pvagg group by x) xx;
+ x |   sum
+---+---------
+ 0 | 1998000
+ 1 | 1998000
+ 2 | 1998000
+(3 rows)
+
 drop table pvagg;
[Review thread on the code change]

Reviewer: Why do we need a copyObject() here but not in the other return cases? Or, to ask it differently, should we do copyObject() also in the other return of var?

Author: This has no practical consequences here, but it is idiomatic for expression tree mutators to return a copy. I added copyObject into the second place as well.