[Perf]: Planning time increases linearly with number of stripes #188

Open
leoyvens opened this issue Oct 30, 2023 · 4 comments
Labels
enhancement New feature or request

@leoyvens

What's wrong?

I noticed this because I have two tables with the same data, but one was built with the default 150k rows per stripe and the other with 10k rows per stripe. Each table has about 1 billion rows and a BTree index on vid.
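
For reference, a table like the 10k one can be set up with something like this (a sketch, assuming the alter_columnar_table_set UDF shipped with the columnar extension, whose default stripe_row_limit is 150000):

-- Lower the per-table stripe row limit before loading the data.
SELECT alter_columnar_table_set('swap_columnar_vid_stripe_10k',
                                stripe_row_limit => 10000);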

This is an explain analyze for the table with 150k rows per stripe:

explain analyze select s.amount_in from swap_columnar_vid s where vid in (899171129, 999171129);
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_swap_columnar_vid on swap_columnar_vid s  (cost=0.57..1945.61 rows=2 width=10) (actual time=21.079..21.093 rows=1 loops=1)
   Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
 Planning Time: 11.635 ms
 Execution Time: 21.899 ms
(4 rows)

And now for the table with 10k rows per stripe:

explain analyze select s.amount_in from swap_columnar_vid_stripe_10k s where vid in (899171129, 999171129);
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_vid_on_swap_columnar_vid_stripe_10k on swap_columnar_vid_stripe_10k s  (cost=0.57..142.02 rows=2 width=10) (actual time=6.422..6.435 rows=1 loops=1)
   Index Cond: (vid = ANY ('{899171129,999171129}'::bigint[]))
 Planning Time: 142.906 ms
 Execution Time: 6.474 ms
(4 rows)

Notice how planning time increased roughly in proportion to the number of stripes, to an unreasonable ~143 ms, and the chosen plan isn't even a columnar scan. This can be worked around to some extent by using prepared statements to take advantage of plan caching, but it is still a performance issue.
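
For reference, the prepared-statement workaround looks like this (plan_cache_mode is a stock PostgreSQL 12+ setting; swap_by_vid is just an illustrative name):

-- Force the generic plan so stripe metadata is only read when the
-- plan is built, not on every execution.
SET plan_cache_mode = force_generic_plan;
PREPARE swap_by_vid (bigint, bigint) AS
  SELECT s.amount_in FROM swap_columnar_vid_stripe_10k s WHERE vid IN ($1, $2);
EXECUTE swap_by_vid (899171129, 999171129);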

@leoyvens leoyvens added the bug Something isn't working label Oct 30, 2023
@mkaruza
Contributor

mkaruza commented Nov 2, 2023

Hi @leoyvens, thank you for the report.

Lowering the maximum stripe row count increases the number of stripes in the table, and with it the amount of related metadata. Part of planning is reading the table's stripe metadata to calculate the cost of scanning the columnar table, and that is where the difference comes from.
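
You can get a feel for how much metadata is involved with something like this (a sketch against the columnar.stripe metadata table; column names may differ between versions):

SELECT storage_id, count(*) AS stripe_count, sum(row_count) AS total_rows
FROM columnar.stripe
GROUP BY storage_id;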

Although the difference looks significant, 11 ms vs. 142 ms, both times are still in the milliseconds.

We'll discuss whether this is something we should prioritize, so I'm marking this as an enhancement rather than a bug.

@mkaruza mkaruza added enhancement New feature or request and removed bug Something isn't working labels Nov 2, 2023
@leoyvens
Author

leoyvens commented Nov 7, 2023

Instead of reading the metadata table directly, which seems unusually expensive for a planning step, perhaps the stripe count and row count could be kept as statistics, that is, updated only on VACUUM or ANALYZE.
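
That would mirror what PostgreSQL already does for heap tables, where the planner reads row and page counts cached in pg_class (refreshed by VACUUM and ANALYZE) instead of scanning anything at plan time:

SELECT relpages, reltuples FROM pg_class WHERE relname = 'swap_columnar_vid';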

@wuputah
Member

wuputah commented Nov 7, 2023

So we can understand your use case better, what are you trying to accomplish by making the maximum stripe size smaller? Did doing so have the desired effect (aside from the increased planning time)?

@leoyvens
Author

leoyvens commented Nov 7, 2023

I am looking to improve the performance of index scans. As you can also see in the OP, the smaller stripe size resulted in a much faster execution time. In my measurements I saw an average speedup of 5x in index scans when using a 10k stripe size instead of the default 150k.

Taking a step back, if I'm fetching a single column of a single row using a btree index, there is no fundamental reason why columnar should perform worse than a heap table. At default configurations heap performs better because its page size (8 kB) is much smaller than the default columnar stripe size, so an index lookup reads far less data, but the stripe size is a tunable parameter, not a fundamental limitation of columnar.

In fact, according to the benchmarks I ran, if this issue and the column overfetching issue discussed in #187 are addressed, columnar index scans at a 10k stripe size should become competitive with heap index scans.
