From 0e7b06aef0321afb5f903e8a70ed59910a31586e Mon Sep 17 00:00:00 2001 From: John Marshall Date: Wed, 10 Apr 2019 14:26:03 +0100 Subject: [PATCH 1/5] [DRAFT] Outline of new familial ##PEDIGREE section --- VCFv4.3.tex | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/VCFv4.3.tex b/VCFv4.3.tex index 58b64533c..d8c8c1bf1 100644 --- a/VCFv4.3.tex +++ b/VCFv4.3.tex @@ -262,7 +262,7 @@ \subsubsection{Pedigree field format} ##pedigreeDB=URL \end{verbatim} -\noindent See \ref{PedigreeInDetail} for details. +\noindent See \ref{ClonalPedigree} (clonal relationships) and \ref{FamilialPedigree} (trios and families) for details. \subsection{Header line syntax} @@ -1261,7 +1261,7 @@ \subsubsection{Sample mixtures} \normalsize \subsubsection{Clonal derivation relationships} -\label{PedigreeInDetail} +\label{ClonalPedigree} In cancer, each VCF file represents several genomes from a patient, but one genome is special in that it represents the germline genome of the patient. This genome is contrasted to a second genome, the cancer tumor genome. In the simplest case the VCF file for a single patient contains only these two genomes. @@ -1281,7 +1281,7 @@ \subsubsection{Clonal derivation relationships} \end{verbatim} This line asserts that the DNA in genome DerivedID is asexually or clonally derived with mutations from the DNA in genome OriginalID. 
-This is the asexual analog of the VCF format that has been proposed for family relationships between genomes, i.e., there is one entry per trio of the form: +This is the asexual analog of the VCF format that has been proposed for family relationships between genomes (see~\ref{FamilialPedigree}), in which there is one entry per trio of the form: \begin{verbatim} ##PEDIGREE= @@ -1404,6 +1404,19 @@ \subsection{Representing unspecified alleles and REF-only blocks (gVCF)} \end{flushleft} \normalsize +\subsection{Describing family relationships} +\label{FamilialPedigree} + +This is the VCF format that has been proposed for family relationships between genomes, in which there is one entry per trio of the form: + +\begin{verbatim} +##PEDIGREE= +\end{verbatim} + +TODO: Text describing meaning of Mother and Father well-known ancestor tags +and explaining how to represent a PED file as in-line \#\#PEDIGREE trios. +Noting alternatively to use \#\#pedigreeDB=URL to point to an external PED file. + \pagebreak \section{BCF specification} From 2fcd817f6a8f588f1aec96a24b2536710c33fe08 Mon Sep 17 00:00:00 2001 From: Cristina Yenyxe Gonzalez Garcia Date: Wed, 24 Jul 2019 14:44:41 +0100 Subject: [PATCH 2/5] Restructured where is PEDIGREE meta explained --- VCFv4.3.tex | 33 +++++++++++++-------------------- 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/VCFv4.3.tex b/VCFv4.3.tex index d8c8c1bf1..7970529ea 100644 --- a/VCFv4.3.tex +++ b/VCFv4.3.tex @@ -257,12 +257,23 @@ \subsubsection{Pedigree field format} ##PEDIGREE= ##PEDIGREE= \end{verbatim} -\noindent or a link to a database: + +The first two lines assert that the DNA in genomes TumourSample and SomaticNonTumour is asexually or clonally derived with mutations from the DNA in genome OriginalID. +The third line describes a family relationship between genomes. +A VCF will therefore contain one entry per trio. +The fourth line is an example of the most general form of a pedigree line. 
+It means that the genome SampleID is derived from the N $\ge$ 1 genomes Ancestor1, ..., AncestorN. + +TODO: Text describing meaning of Mother and Father well-known ancestor tags +and explaining how to represent a PED file as in-line \#\#PEDIGREE trios. +Noting alternatively to use \#\#pedigreeDB=URL to point to an external PED file. + +\noindent If samples and the relationships between them are described in an external resource such as a database or PED file, is it also possible to provide a link: \begin{verbatim} ##pedigreeDB=URL \end{verbatim} -\noindent See \ref{ClonalPedigree} (clonal relationships) and \ref{FamilialPedigree} (trios and families) for details. +\noindent See \ref{ClonalPedigree} for more details about clonal relationships. \subsection{Header line syntax} @@ -1281,11 +1292,6 @@ \subsubsection{Clonal derivation relationships} \end{verbatim} This line asserts that the DNA in genome DerivedID is asexually or clonally derived with mutations from the DNA in genome OriginalID. -This is the asexual analog of the VCF format that has been proposed for family relationships between genomes (see~\ref{FamilialPedigree}), in which there is one entry per trio of the form: - -\begin{verbatim} -##PEDIGREE= -\end{verbatim} Let's consider a cancer patient VCF file with 4 genomes: germline, primary\_tumor, secondary\_tumor1, and secondary\_tumor2 as illustrated in Figure 10. The primary\_tumor is derived from the germline and the secondary tumors are each derived independently from the primary tumor, in all cases by clonal derivation with mutations. 
@@ -1404,19 +1410,6 @@ \subsection{Representing unspecified alleles and REF-only blocks (gVCF)} \end{flushleft} \normalsize -\subsection{Describing family relationships} -\label{FamilialPedigree} - -This is the VCF format that has been proposed for family relationships between genomes, in which there is one entry per trio of the form: - -\begin{verbatim} -##PEDIGREE= -\end{verbatim} - -TODO: Text describing meaning of Mother and Father well-known ancestor tags -and explaining how to represent a PED file as in-line \#\#PEDIGREE trios. -Noting alternatively to use \#\#pedigreeDB=URL to point to an external PED file. - \pagebreak \section{BCF specification} From a35d422414c2ada636bc994e9d92589b3225a85e Mon Sep 17 00:00:00 2001 From: jose miguel mut Date: Thu, 14 Nov 2019 15:26:17 +0000 Subject: [PATCH 3/5] Add example of PED to VCF conversion --- VCFv4.3.tex | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/VCFv4.3.tex b/VCFv4.3.tex index 7970529ea..97d91ab4a 100644 --- a/VCFv4.3.tex +++ b/VCFv4.3.tex @@ -264,11 +264,18 @@ \subsubsection{Pedigree field format} The fourth line is an example of the most general form of a pedigree line. It means that the genome SampleID is derived from the N $\ge$ 1 genomes Ancestor1, ..., AncestorN. -TODO: Text describing meaning of Mother and Father well-known ancestor tags -and explaining how to represent a PED file as in-line \#\#PEDIGREE trios. -Noting alternatively to use \#\#pedigreeDB=URL to point to an external PED file. +Mother and Father have the same meaning as in PED files. 
Consider the following example PED line (the columns are Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype, Genotypes): +\begin{verbatim} +FAM001 9 7 8 1 2 A A +\end{verbatim} + +The family described in that line can be expressed in VCF as: + +\begin{verbatim} +##PEDIGREE= +\end{verbatim} -\noindent If samples and the relationships between them are described in an external resource such as a database or PED file, is it also possible to provide a link: +If samples and the relationships between them are described in an external resource such as a database or PED file, is it also possible to provide a link: \begin{verbatim} ##pedigreeDB=URL \end{verbatim} From 36b3918669e9a158b6e4dc957cea5906a82ea9e7 Mon Sep 17 00:00:00 2001 From: jose miguel mut Date: Sun, 1 Dec 2019 23:49:40 +0000 Subject: [PATCH 4/5] VCF: add references in pedigree example --- VCFv4.3.tex | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/VCFv4.3.tex b/VCFv4.3.tex index 97d91ab4a..d2f7fd0f7 100644 --- a/VCFv4.3.tex +++ b/VCFv4.3.tex @@ -238,6 +238,7 @@ \subsubsection{Contig field format} \subsubsection{Sample field format} +\label{meta-sample} It is possible to define sample to genome mappings as shown below: {\scriptsize \begin{verbatim} @@ -275,12 +276,14 @@ \subsubsection{Pedigree field format} ##PEDIGREE= \end{verbatim} +Phenotypes can be expressed as explained in Section \ref{meta-sample}, and genotypes as in Section \ref{genotype-fields}. + If samples and the relationships between them are described in an external resource such as a database or PED file, is it also possible to provide a link: \begin{verbatim} ##pedigreeDB=URL \end{verbatim} -\noindent See \ref{ClonalPedigree} for more details about clonal relationships. +\noindent See Section \ref{ClonalPedigree} for more details about clonal relationships. 
\subsection{Header line syntax} @@ -408,6 +411,7 @@ \subsubsection{Fixed fields} \end{itemize} \subsubsection{Genotype fields} +\label{genotype-fields} If genotype information is present, then the same types of data must be present for all samples. First a FORMAT field is given specifying the data types and order (colon-separated FORMAT keys matching the regular expression \texttt{\^{}[A-Za-z\_][0-9A-Za-z\_.]*\$}, duplicate keys are not allowed). This is followed by one data block per sample, with the colon-separated data corresponding to the types specified in the format. From f1bc9828c63eabb38bcb9d16549ee200303ba184 Mon Sep 17 00:00:00 2001 From: jose miguel mut Date: Mon, 2 Dec 2019 00:44:03 +0000 Subject: [PATCH 5/5] VCF: add siblings to pedigree section --- VCFv4.3.tex | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/VCFv4.3.tex b/VCFv4.3.tex index d2f7fd0f7..2a00f8cd1 100644 --- a/VCFv4.3.tex +++ b/VCFv4.3.tex @@ -257,6 +257,9 @@ \subsubsection{Pedigree field format} ##PEDIGREE= ##PEDIGREE= ##PEDIGREE= +##PEDIGREE= +##PEDIGREE= +##PEDIGREE= \end{verbatim} The first two lines assert that the DNA in genomes TumourSample and SomaticNonTumour is asexually or clonally derived with mutations from the DNA in genome OriginalID. @@ -264,8 +267,10 @@ \subsubsection{Pedigree field format} A VCF will therefore contain one entry per trio. The fourth line is an example of the most general form of a pedigree line. It means that the genome SampleID is derived from the N $\ge$ 1 genomes Ancestor1, ..., AncestorN. +The fifth and sixth lines describe relationships between twins. +Regular siblings can be inferred implicitly from trios like the third line, but if the parents are unknown, the seventh line describes a sibling relationship explicitly. -Mother and Father have the same meaning as in PED files. 
Consider the following example PED line (the columns are Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype, Genotypes): +Mother and Father are optional (e.g.\ if unknown) and have the same meaning as in PED files. Consider the following example PED line (the columns are Family ID, Individual ID, Paternal ID, Maternal ID, Sex, Phenotype, Genotypes): \begin{verbatim} FAM001 9 7 8 1 2 A A \end{verbatim}
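The PED-to-VCF conversion added in patches 3 and 5 can be sketched in a few lines of Python. This is only an illustration, not part of the proposed specification text. The key names `ID`, `Father` and `Mother` are assumed from the published VCFv4.3 syntax, because the bodies of the `##PEDIGREE=` lines are not visible in this series. The function name `ped_line_to_pedigree` is hypothetical, and treating `0` as an unknown parent follows the usual PED convention rather than anything stated in the patch.

```python
from typing import Optional


def ped_line_to_pedigree(ped_line: str) -> Optional[str]:
    """Build a ##PEDIGREE header line from one whitespace-separated PED line.

    Assumed column order (as listed in the patch): Family ID, Individual ID,
    Paternal ID, Maternal ID, Sex, Phenotype, Genotypes.
    Returns None when neither parent is known, since there is then no
    relationship to record.
    """
    fields = ped_line.split()
    if len(fields) < 4:
        raise ValueError("a PED line needs at least four columns")
    _family_id, individual_id, paternal_id, maternal_id = fields[:4]

    # Assumption: "0" marks an unknown parent, as is conventional in PED files.
    parents = []
    if paternal_id != "0":
        parents.append(f"Father={paternal_id}")
    if maternal_id != "0":
        parents.append(f"Mother={maternal_id}")
    if not parents:
        return None

    # Assumption: ID/Father/Mother key names as in the published VCFv4.3 text.
    return "##PEDIGREE=<" + ",".join([f"ID={individual_id}"] + parents) + ">"


print(ped_line_to_pedigree("FAM001 9 7 8 1 2 A A"))
```

The sketch deliberately ignores the Sex, Phenotype and Genotypes columns: as patch 4 notes, phenotypes belong in the sample meta-information lines and genotypes in the genotype fields, not in the pedigree line.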