Order chunks for compression by range_start #7148
base: main
Conversation
Previously, when compressing chunks, the order was non-deterministic: it was usually ordered by chunk id, but this was not guaranteed. And while chunk id typically corresponds with range_start, that is not always the case: backfilling can produce larger chunk ids for older chunks. Generally, it seems preferable to compress chunks from oldest to newest. When using the experimental compress_chunk_time_interval option, this matters even more, because it allows for the most efficient roll-ups. Otherwise, chunks compressed out of order can "cut off" a rolled-up chunk before it is completely full.
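A toy sketch of the problem, not TimescaleDB's actual code: after a backfill, the chunk with the highest id can cover the oldest time range, so id order and range_start order diverge. Field names here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    id: int           # assigned in creation order
    range_start: int  # start of the chunk's time range

# Chunks 1 and 2 were created in time order; chunk 3 was created by a
# backfill and covers an *older* range, so it has the largest id.
chunks = [Chunk(1, 100), Chunk(2, 200), Chunk(3, 50)]

by_id = sorted(chunks, key=lambda c: c.id)
by_range_start = sorted(chunks, key=lambda c: c.range_start)

print([c.id for c in by_id])           # [1, 2, 3]
print([c.id for c in by_range_start])  # [3, 1, 2] — backfilled chunk first
```

Ordering by id would compress the backfilled (oldest) chunk last; ordering by range_start compresses oldest to newest.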
13b9416
to
cea1cb5
Compare
Did you test this with hypertables with multiple dimensions?
I did not, but I'm realizing I probably want to just order by the primary dimension.
Is this going to cause even more problems for backfilling? Meaning, ordering by range_start will probably make the backfilled chunks be compressed sooner, potentially even while they are still being backfilled. This could happen even in the current setup, but now it's guaranteed to be picked up sooner rather than later. Thoughts on doing this ordering only when compress_created_before is used, to avoid said issues? Other than that, I'm OK with this change.
Looks like a good change, certainly better than no particular order. Wondering if it would also be good to expose other orderings as a configuration option.
You might also want to consider ordering by all dimensions, to have more determinism with multi-dimensional partitioning.
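A minimal sketch of what ordering by all dimensions could look like, assuming a hypertable partitioned on time plus one space dimension; the field names and data are hypothetical, not the actual catalog schema.

```python
# Toy chunks from a hypertable with a time dimension and a space
# dimension (e.g. hash ranges of device_id); names are illustrative.
chunks = [
    {"id": 1, "time_start": 100, "space_start": 20},
    {"id": 2, "time_start": 100, "space_start": 10},
    {"id": 3, "time_start": 50,  "space_start": 10},
]

# Sort primarily by the time (primary) dimension, then by the space
# dimension, so chunks sharing a time range still get a deterministic order.
ordered = sorted(chunks, key=lambda c: (c["time_start"], c["space_start"]))
print([c["id"] for c in ordered])  # [3, 2, 1]
```

Using the full tuple of dimension ranges as the sort key keeps the oldest-first behavior while making ties between same-time chunks deterministic.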