-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: skip output level sstables with no data overlap with input level #389
Comments
Is the first step to add some stats where we do this seeking and record how many sstables and bytes could have been eliminated? Is it reasonable to add these stats for every compaction, or do you expect this to be slow enough that one should sample? |
Yes, I think that is the first step. I think the stats could be added for every compaction. The seeks should be fast enough that they will be lost in the noise of the compaction itself. |
There will probably be dragons in the handling of range tombstones that overlap the skipped sstable. |
We have marked this issue as stale because it has been inactive for |
We have marked this issue as stale because it has been inactive for |
See facebook/rocksdb#6021.
The idea is that if we're doing a compaction from Ln->Ln+1, we can skip sstables in Ln+1 which do contain any keys in Ln. Right now this determination is done solely based on the sstable boundaries, but it is only a little more work to create a
levelIter
on the input sstables in Ln and seek to the boundary keys in Ln+1 to see if any actual key is contained within the sstables in Ln+1.Here's an example of where this could be beneficial:
L0 has one sstable with the bounds
[a,z]
containing the keysa
,g
,t
, andz
. A compaction from L0 to L1 would normally involve every sstable in L1 because the bounds[a,z]
covers every sstable in L1. But notice that the L1 sstable[j,q]
doesn't overlap with any keys in L0.After selecting the initial set of input sstables in L0 and L1, we could open a
levelIter
on the sstables in L0 and seek that iterator to the boundary keys of the tables in L1, trimming the L1 inputs of any sstables that do not contain an overlapping key in L0. The compaction logic would have to be changed to ensure that the tables output to L1 are cut so as not to overlap with the excluded sstables in L1.Cc @sumeerbhola as this overlaps with other thinking we've been doing in this area. I'm not sure how often this situation occurs in practice. We'd want to do some experimentation to determine if it is common or rare.
Jira issue: PEBBLE-180
Epic CRDB-40361
The text was updated successfully, but these errors were encountered: