Add support for merging chunks #7433

Draft: wants to merge 1 commit into main

@erimatnor erimatnor commented Nov 11, 2024

A new procedure called `merge_chunks` is introduced that can merge an arbitrary number of chunks if the right conditions apply. Basic checks are done to ensure that the chunks can be merged from a partitioning perspective. Some more advanced cases that are potentially mergeable are not supported at this time (e.g., complicated merges of chunks with multi-dimensional partitioning).
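To illustrate what a basic partitioning check could look like, here is a minimal Python sketch. It is not the code in this PR. It assumes a single range dimension with half-open `[start, end)` ranges and treats a set of chunks as mergeable only if their ranges tile one contiguous range with no gaps or overlaps. `ChunkRange` and `merged_range` are hypothetical names used only for this example.

```python
from typing import NamedTuple, Optional


class ChunkRange(NamedTuple):
    """Hypothetical half-open range [start, end) covered by one chunk."""
    name: str
    start: int
    end: int


def merged_range(chunks: list[ChunkRange]) -> Optional[tuple[int, int]]:
    """Return the combined range if the chunks tile it exactly, else None."""
    if len(chunks) < 2:
        return None  # nothing to merge
    ordered = sorted(chunks, key=lambda c: c.start)
    # Each chunk must end exactly where the next one starts.
    for left, right in zip(ordered, ordered[1:]):
        if left.end != right.start:
            return None  # gap or overlap between neighbours
    return ordered[0].start, ordered[-1].end
```

Under this assumption, two adjacent chunks merge into one range covering both, while chunks separated by a gap are rejected. Multi-dimensional partitioning would need a check per dimension, which this sketch does not attempt.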

Currently, the merge defaults to taking an AccessExclusive lock on the merged chunks to prevent deadlocks and concurrent modifications. Weaker locking is supported via an anonymous settings variable, but this is mostly to prove in tests that these approaches can lead to deadlocks.
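One common way weaker locking can deadlock is a lock upgrade: two sessions each hold a weak lock on the same chunk and then both request a stronger one. The Python sketch below models that with a simplified two-mode lock table and a waits-for graph. The lock modes, session names, and functions are assumptions for illustration. PostgreSQL has more lock modes, and the scenarios covered by the tests in this PR may differ.

```python
# Simplified two-mode lock model (an assumption, not PostgreSQL's lock table).
def conflicts(held: str, wanted: str) -> bool:
    """Shared locks coexist; an exclusive lock conflicts with everything."""
    return held == "exclusive" or wanted == "exclusive"


def waits_for(held: dict[str, dict[str, str]],
              wanted: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """Map each waiting session to the sessions holding a conflicting lock."""
    graph: dict[str, set[str]] = {}
    for session, (resource, mode) in wanted.items():
        blockers = {
            other
            for other, locks in held.items()
            if other != session
            and resource in locks
            and conflicts(locks[resource], mode)
        }
        if blockers:
            graph[session] = blockers
    return graph


def has_deadlock(graph: dict[str, set[str]]) -> bool:
    """Detect a cycle in the waits-for graph with depth-first search."""
    done: set[str] = set()

    def visit(node: str, path: set[str]) -> bool:
        if node in path:
            return True  # came back to a session already on the current path
        if node in done:
            return False
        path.add(node)
        found = any(visit(nxt, path) for nxt in graph.get(node, ()))
        path.discard(node)
        done.add(node)
        return found

    return any(visit(node, set()) for node in graph)
```

In this model, two sessions that both hold a shared lock on a chunk and both ask for an exclusive one wait on each other, which forms a cycle. If the merge takes the exclusive lock up front, the other session simply queues behind it and no cycle forms.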

The actual merging is done by rewriting all the data from multiple chunks into a (temporary) merged heap using the same approach as that implemented to support VACUUM FULL and CLUSTER. Then this new heap is swapped into one of the original relations while the rest are dropped. This approach is MVCC compliant and implements correct visibility under higher isolation levels, while also doing vacuum and leaving no garbage tuples.
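The rewrite-and-swap flow described above can be sketched in a few lines of Python. This is a toy model, not the implementation. `Row`, `Chunk`, and `merge_heaps_sketch` are hypothetical names, and a single `dead` flag stands in for real MVCC visibility, which a real rewrite has to evaluate per tuple.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    value: int
    dead: bool = False  # stand-in for a tuple no transaction can see anymore


@dataclass
class Chunk:
    name: str
    heap: list[Row] = field(default_factory=list)


def merge_heaps_sketch(catalog: dict[str, Chunk], names: list[str]) -> Chunk:
    """Rewrite live rows into a new heap, swap it into the first chunk, drop the rest."""
    # 1. Build the temporary merged heap, skipping dead rows (the "vacuum" effect).
    merged_heap = [row for name in names for row in catalog[name].heap if not row.dead]
    # 2. Swap the new heap into one of the original relations.
    target = catalog[names[0]]
    target.heap = merged_heap
    # 3. Drop the remaining chunks.
    for name in names[1:]:
        del catalog[name]
    return target
```

Because the surviving chunk keeps its identity and only its heap is replaced, anything that refers to that chunk by name keeps working in this model, while the merged-away chunks disappear from the catalog.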

@erimatnor erimatnor force-pushed the merge-chunks branch 5 times, most recently from ce949f8 to 581388c Compare November 12, 2024 09:42
@@ -44,6 +44,10 @@ CREATE OR REPLACE FUNCTION @extschema@.decompress_chunk(
if_compressed BOOLEAN = true
) RETURNS REGCLASS AS '@MODULE_PATHNAME@', 'ts_decompress_chunk' LANGUAGE C STRICT VOLATILE;

CREATE OR REPLACE PROCEDURE @extschema@.merge_chunks(
@erimatnor erimatnor Nov 12, 2024

I put the function in the extension schema instead of _timescaledb_functions since it is a function in line with compress_chunk, etc. But we could move it to _timescaledb_functions if we want this to be less of a public feature. 🤷

Contributor

I think this is on the same level as compress_chunk, drop_chunk, and friends so we should keep it at the same place as these.
