Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellaneous VCF formatting and typo fixes #391

Merged
merged 8 commits into from
Mar 8, 2019

Conversation

jmarshall
Copy link
Member

Fix the BCF encoding errors noted in #382, and various formatting improvements.

No changes to the format or descriptions, so this should be straightforward to review and merge.

The non-TeX quotes etc in §1.4.7 Contig field format are left untouched so as not to conflict with lines changed in PR #379.

Fixes samtools#382. Fix formatting of ID and REF+ALT as separate table rows.
In particular, use a string that is not subject to change!
Hat tip Thomas Colthurst. Closes samtools#363.

Also fix a|b formatting and note that the inline/overflow boundary
is >= 15 rather than > 15.
Also avoid using \textbackslash within \texttt{}.
Move the example \begin{enumerate} up to the otherwise blank half page
before the landscape example SV VCF.

Use "ht" in \begin{figure}[ht] where necessary, and keep some of the
associated paragraphs together.

Use \makecell to avoid an underful hbox.
Also make "dbsnp" and "138" verbatim, as the double quotes are part
of what is being shown.
Compress the list of columns and encourage a pagebreak in "4 REF" to get
more vertical space for Table 1 on page 8. Use \longtable so Table 1 can
be broken across pages; don't use \begin{table}, so that the tables are
positioned within their related ("8 INFO" / "Genotype fields") text.
Keep initial "INFO keys used for structural variants" paragraphs together
so they don't creep back to page 11.

(Previously it was very easy to misread the INFO keys table as pertaining
to the genotype text surrounding it.)
@jmarshall jmarshall added the vcf label Mar 7, 2019
@jmarshall jmarshall requested a review from cyenyxe March 7, 2019 17:12
The bar is only for genotype fields. Fixes samtools#390.

Also for the genotype fields, use plain | rather than \mid, which adds
excess space around the delimiter.
Copy link
Member

@cyenyxe cyenyxe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to header line and all the tables look great, thanks! Just one comment on my side.

Multiple bases are permitted.
The value in the POS field refers to the position of the first base in the String.
For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; this padding base is not required (although it is permitted) for e.g. complex substitutions or other events where all alleles have at least one base represented in their Strings.
For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; \pagebreak[1] this padding base is not required (although it is permitted) for e.g.\ complex substitutions or other events where all alleles have at least one base represented in their Strings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a pretty plain paragraph, I'm not sure the \pagebreak[1] command is very useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the commit message: encourage a pagebreak in "4 REF" to get more vertical space for Table 1 on page 8. Without this, there is wasted space on the previous page and Table 1 is split more awkwardly.

(Also just pushed one more fix.)

@cyenyxe cyenyxe merged commit a86f704 into samtools:master Mar 8, 2019
@jmarshall
Copy link
Member Author

Thanks. I would have merged that with a merge commit, as it's a sequence of distinct commits that tell a story. Squashing them into a single commit makes it a bit harder for future us to look back and understand the history, but YMMV.

@jmarshall jmarshall deleted the vcf-typos branch March 8, 2019 13:59
@cyenyxe
Copy link
Member

cyenyxe commented Mar 11, 2019

I actually prefer merge commits, but I thought the standard policy was always squashing...

@jmarshall
Copy link
Member Author

I don't think we have a policy as such.

What often happens in this repository is that a single-commit PR gets improved and workshopped and acquires a bunch of “Fixed typos” and “Respond to review comments“ commits. Clearly the best practice in that case is to squash, as the fixup commits are not of later historical interest.

Multi-commit PRs containing distinct fixes are a different case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants