You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have a VCF that was processed with HiPhase. The file contains lines with "|" as well as "/" for the GT value. Example:
chrX 1189233 . T A 7.4 PASS . GT:GQ:DP:AD:VAF:PL:PS 0|1:7:4:1,3:0.75:6,0,19:1189207
chrX 1189258 . G A 30.7 PASS . GT:GQ:DP:AD:VAF:PL 1/1:13:4:0,4:1:30,12,0
nft-vcf incorrectly assumes that the file is not phased. Looking into the code I realize that a single slash in any line is sufficient to set phased to false (it's true by default and gets updated when reading the file). I rather think it should be the other way around: A single pipe ("|") should be sufficient to assume the VCF is phased (and set phased to false by default) - assuming the code stays roughly as is.
Further thoughts: The VCF 4.2 spec says that a PS field should be added for phased variants. Maybe one could check for this field instead of looping over the variants. The current approach assumes that a single unphased genotype means the whole VCF is unphased. That seems too strict (or even wrong). If the phasing information is something that is expected to change on a per-variant base, you might rather want to have a phased boolean array with a value for each variant.
The text was updated successfully, but these errors were encountered:
Thank you for bringing this up! In our primary use case (genotype imputation from microarray data), a single pipe isn’t sufficient to ensure the file is correctly phased. However, I agree that supporting a single-pipe configuration could be useful for other workflows.
If you’re interested, we’d love to see a Pull Request (PR) from you to implement this feature. My suggestion would be to add a new parameter to the load function to handle an alternative strategy.
Hi,
I have a VCF that was processed with HiPhase. The file contains lines with "|" as well as "/" for the
GT
value. Example:nft-vcf incorrectly assumes that the file is not phased. Looking into the code I realize that a single slash in any line is sufficient to set
phased
tofalse
(it'strue
by default and gets updated when reading the file). I rather think it should be the other way around: A single pipe ("|") should be sufficient to assume the VCF is phased (and setphased
tofalse
by default) - assuming the code stays roughly as is.Further thoughts: The VCF 4.2 spec says that a
PS
field should be added for phased variants. Maybe one could check for this field instead of looping over the variants. The current approach assumes that a single unphased genotype means the whole VCF is unphased. That seems too strict (or even wrong). If the phasing information is something that is expected to change on a per-variant base, you might rather want to have aphased
boolean array with a value for each variant.The text was updated successfully, but these errors were encountered: