New section describing familial PEDIGREE headers #413
base: master
VCFv4.3.tex
Outdated
@@ -1380,6 +1380,19 @@ \subsection{Representing unspecified alleles and REF-only blocks (gVCF)}
\end{flushleft}
\normalsize
\subsection{Describing family relationships}
I would prefer to make this part of the PEDIGREE meta entry description. I can also address the TODO.
Pedigree lines are putatively used in two separate scenarios: clonal relationships and trios/families. There is already a separate section covering the clonal scenario, so it makes some sense to cover the other one in its own separate section too.
Similarly, the minimal §1.4.8 META and SAMPLE meta entry description is expanded upon in §5.4.10.
The only issue I see is that the clonal relationships are described in the context of breakends: prior knowledge of that syntax is needed, and as a result they are subsections of that section. This one, on the other hand, would be completely separate, so it would fit well within the PEDIGREE section.
@@ -253,7 +253,7 @@ \subsubsection{Pedigree field format}
##pedigreeDB=URL
This is not the line I'm interested in, but I can't comment on the other one. PED does not support more than 2 ancestors; do we want to support that in VCF? I have never seen this used, so I don't think many people will miss it if we drop it. Dropping it would make it trivial to add some examples for trios.
@pd3 @lbergelson what should we do about this? Dropping support for more than 2 ancestors would render some files incorrect, but as I said in my previous comment I have never seen that syntax being used.
I don't think there would be a lot of pushback against dropping multiple ancestor lines because they seem pretty ambiguously defined at the moment and I haven't ever seen one used... That said, we should maybe not be making breaking changes to an existing spec?
Are they intended only for asexual ancestry where each is the parent of the next? Or are they just an unsorted bag of ancestors that could represent any tree of parentage?
i.e. does `<ID=SampleID,Name_1=Ancestor_1,...,Name_N=Ancestor_N>` imply `SampleID -> Ancestor_1 -> Ancestor_2`, or could it also mean `Ancestor_1 <- SampleID -> Ancestor_2`?
I lean towards removing or deprecating it if we don't know exactly what it means and no one seems to be using it.
I'm also not clear on which of these are controlled vocabulary. Is there a specific ontology of relationships that are allowed? Are we allowed to specify something like `Sibling` when the parents are not in the VCF, or is that handled with dummy trios that point to unique parent IDs that are not present?
Are you allowed to include only 1 parent or are trios required?
I assume the example would address some of these questions.
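To make the ambiguity concrete, here is a minimal Python sketch that parses a `##PEDIGREE=<...>` line into its ordered key/value pairs and groups it into the scenarios discussed in this thread. It assumes the clonal form uses an `Original=` key and the trio form uses `Father=`/`Mother=` keys, as in the VCFv4.3 examples; `parse_pedigree` and `classify` are hypothetical helpers, and the parser assumes values never contain commas or quotes. For the generic `Name_1=...,Name_N=...` form the pairs come back in file order, but nothing in the syntax says whether that order is a chain of ancestry or an unordered set.

```python
import re

# Matches a whole header line of the form ##PEDIGREE=<key=value,...>.
PEDIGREE_RE = re.compile(r"^##PEDIGREE=<(.*)>$")


def parse_pedigree(line: str) -> list[tuple[str, str]]:
    """Return the key/value pairs of a ##PEDIGREE line, in file order.

    Assumption: values never contain ',' (quoted strings are not handled).
    """
    m = PEDIGREE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a ##PEDIGREE line: {line!r}")
    pairs = []
    for field in m.group(1).split(","):
        key, sep, value = field.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed field: {field!r}")
        pairs.append((key, value))
    return pairs


def classify(pairs: list[tuple[str, str]]) -> str:
    """Hypothetical grouping of a parsed line into the scenarios discussed."""
    keys = [key for key, _ in pairs]
    if "ID" not in keys:
        raise ValueError("missing ID key")
    relations = set(keys) - {"ID"}
    if relations == {"Original"}:
        return "clonal"
    if relations == {"Father", "Mother"}:
        return "trio"
    if relations in ({"Father"}, {"Mother"}):
        return "single-parent"  # whether this is allowed is an open question
    # e.g. Name_1=Ancestor_1,...,Name_N=Ancestor_N: the pairs keep file order,
    # but the syntax does not say whether that order is a chain or a set.
    return "other"
```

Only the `other` bucket needs a decision from the spec; the clonal and trio forms are unambiguous once their keys are fixed.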
If people are happy with the slight changes to the document structure, I will add an example showing how PEDIGREE entries compare with PED files.
I added some commits with a minimal PED-to-VCF example. If people think a complete example is preferable to a minimal one, I can add some corner cases. Also, note that I referenced other sections instead of including a complete VCF with the phenotypes and genotypes. I think it's better not to repeat details in different sections, since the copies risk becoming outdated, but in this case I can be convinced otherwise. Finally, the third commit adds syntax to express sibling/twin relationships. The only thing I'm not sure about is the generic ancestor line.
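A minimal Python sketch of the PED-to-VCF direction, assuming the standard six-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, with `0` for an unknown parent), that PED individual IDs match the VCF sample column names, and the trio form `##PEDIGREE=<ID=ChildID,Father=FatherID,Mother=MotherID>`. The function name is hypothetical. Founders and single-parent rows are skipped, since whether a single parent may be listed is still open in this thread.

```python
from typing import Iterable, Iterator

MISSING = "0"  # PED convention for an unknown parent


def ped_to_pedigree_lines(ped_lines: Iterable[str]) -> Iterator[str]:
    """Yield a ##PEDIGREE trio line for each PED row with both parents known.

    Only the individual and parent IDs are used; family ID, sex and
    phenotype are not carried over by this sketch.
    """
    for raw in ped_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split()  # PED columns are whitespace-separated
        if len(cols) < 6:
            raise ValueError(f"expected at least 6 PED columns: {raw!r}")
        _family, child, father, mother = cols[:4]
        if father == MISSING or mother == MISSING:
            # Founders and single-parent rows: skipped, because partial
            # pedigrees are not settled in this discussion.
            continue
        yield f"##PEDIGREE=<ID={child},Father={father},Mother={mother}>"
```

For a trio where only the child has parents recorded, the two founder rows produce nothing and the child's row produces a single header line, which is the correspondence the new section describes.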
Rebased now that the first commit (typographical errors in examples) has been split off as PR #583 and merged to master.
Two separate commits, the first fixing obvious errors and the second an outline of a suggested new “Describing family relationships” section:

1. Add missing `##` and other words; replicate the fixes from 2016's PR #176 (Add missing genome IDs) in the other copies of the spec. [Split off as a separate PR, #583 (Fix `##PEDIGREE` example typographical errors).]
2. Outline of a new “Describing family relationships” section that would address #381 (clarifications on vcf pedigree header) by explaining how the `##PEDIGREE` trio metadata line shown in the VCF spec is supposed to work and how it corresponds to an external PED file.

Previous discussion has suggested that this functionality is thought to be stillborn. If so, the examples should be removed from the specs. Otherwise, if the functionality is used out there, it would be good for the spec to describe the (fairly obvious) way in which it would be intended to be used.