Revert defn of CRAM container size to be sum of block sizes (PR#731) #731

Merged
jkbonfield merged 1 commit into samtools:master on Aug 6, 2024

Conversation

jkbonfield
Contributor

This was added as clarification in #398 after discussion in #396, but this was in error. In our attempts to clarify and nail down these corner cases, we failed to recall that the SAM header is permitted to be padded out by non-block allocated space.

History on this decision dates back to 2013 and is shown in Samtools issue samtools/samtools#1852.

There are good reasons for moving away from padding via a second block: changing a block's size can also change the size of the block structure itself (ITF8 is a variable-length integer, which matters if we're using a generic shared piece of code), and this in turn makes it cumbersome to handle every possible change in SAM header size. It is far simpler to leave unallocated space after the block and before the end of the container. This is how htslib has worked since CRAM 3.0 and, I believe, how CRAMtools.jar works.
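To illustrate the ITF8 point, here is a minimal Python sketch. It assumes the CRAM 3.0 block layout as I understand it (method byte, content type byte, ITF8 content ID, ITF8 compressed size, ITF8 raw size, data, 4-byte CRC32); `itf8_encode` and `raw_block_total` are hypothetical helper names for this example, not htslib functions.

```python
def itf8_encode(value: int) -> bytes:
    """Encode an unsigned 32-bit value as ITF8 (1 to 5 bytes)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("ITF8 holds 32-bit values only")
    if value < 0x80:            # 0xxxxxxx
        return bytes([value])
    if value < 0x4000:          # 10xxxxxx + 1 byte
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:        # 110xxxxx + 2 bytes
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:      # 1110xxxx + 3 bytes
        return bytes([0xE0 | (value >> 24), (value >> 16) & 0xFF,
                      (value >> 8) & 0xFF, value & 0xFF])
    # 1111xxxx + 3 bytes + low 4 bits in the final byte
    return bytes([0xF0 | (value >> 28), (value >> 20) & 0xFF,
                  (value >> 12) & 0xFF, (value >> 4) & 0xFF, value & 0x0F])


def raw_block_total(data_len: int, content_id: int = 0) -> int:
    """Total on-disk bytes of an uncompressed block holding data_len bytes.

    Assumed layout: method (1) + content type (1) + ITF8 content ID
    + ITF8 compressed size + ITF8 raw size + data + CRC32 (4).
    """
    fixed = 1 + 1 + 4
    size_fields = 2 * len(itf8_encode(data_len))  # compressed == raw here
    return fixed + len(itf8_encode(content_id)) + size_fields + data_len


if __name__ == "__main__":
    for n in (126, 127, 128):
        print(n, raw_block_total(n))
```

Under these assumptions the total jumps from 136 bytes at 127 data bytes to 139 bytes at 128 data bytes, so a padding block cannot land on a total of 137 or 138 bytes. Gaps like this are what make exact padding via a second block awkward.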

Fixes samtools/samtools#1852.

@github-actions

Changed PDFs as of 9963718: CRAMv3 (diff).

@jkbonfield
Contributor Author

TODO: Does this apply only to header container, or all containers?

@jkbonfield
Contributor Author

Minimal update made to explicitly state that the additional padding bytes are for the CRAM header container only. If we later find a compelling reason to, we can always relax this limitation while keeping backwards compatibility.
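As a rough sketch of what that restriction means for a reader or validator: the function below computes the unallocated bytes left after the blocks and, as one possible strict policy, rejects them outside the header container. `trailing_padding` and its parameters are hypothetical names for illustration; this is not how htslib or CRAMtools is written.

```python
from typing import Sequence


def trailing_padding(container_length: int,
                     block_sizes: Sequence[int],
                     is_header_container: bool) -> int:
    """Return the unallocated bytes between the last block and container end.

    container_length is the length declared in the container header;
    block_sizes are the total on-disk sizes of the blocks it holds.
    """
    slack = container_length - sum(block_sizes)
    if slack < 0:
        raise ValueError("blocks overrun the declared container length")
    if slack > 0 and not is_header_container:
        # Assumed strict policy: padding is only allowed in the header container.
        raise ValueError("padding found outside the CRAM header container")
    return slack
```

A reader that takes this approach would skip the returned number of bytes before looking for the next container, which is what lets an in-place reheader shrink the SAM header text without touching the container length.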

@github-actions

Changed PDFs as of 8e750cb: CRAMv3 (diff).

@jkbonfield jkbonfield merged commit ebebbc8 into samtools:master Aug 6, 2024
1 check passed
Successfully merging this pull request may close these issues.

in-place reheadering doesn't follow CRAM standard?