hts_itr_query fails on empty BCFs with gapped tid blocks #1534
Comments
The difference is due to the index on the file with no data records claiming that there is only one reference, while the one with data records says there are two.
In the no-data case, trying to look up the index entry causes [...]
The initial count of the number of references is made in [...]
I'm not sure if this counts as an indexing error, or if [...]
As a side note: the [...]
Instead return one that instantly finishes.

Fixes an edge case (issue samtools#1534) where an index did not include an entry for a chromosome that was mentioned in the file header but had no data records. Normally these would be present but empty, but it was possible to use the IDX= key in a VCF file to make an index where the chromosome simply did not appear. In this case, rather than an error, we want to return the equivalent of HTS_IDX_NONE so the iterator produces no data.

Another scenario where this is useful is if you build an index and then try to use it immediately, without first saving it and reading it back in again. Such an index will have NULL entries in bidx[] for any chromosomes with no data. Again we want to return an HTS_IDX_NONE iterator if one of those chromosomes is queried. (This issue didn't usually occur because most programs load an existing index, and idx_read_core() makes bidx[] entries for everything even if there's nothing in the index for the chromosome.)

Note that this changes vcf_loop() in test_view.c so that it now treats bcf_itr_querys() failures as an error. The new behaviour matches sam_loop() and is needed to detect the problem being fixed here. All the other tests still work after this change, so nothing was relying on the old behaviour of ignoring the errors.
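The behaviour described in the commit message can be pictured with a small self-contained model. This is only a sketch and does not use htslib: the `toy_*` types and functions are hypothetical stand-ins, with `finished` playing the role that an HTS_IDX_NONE iterator plays in the description above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; these are NOT htslib's real types. */
typedef struct { int n_records; } toy_bin_index;

typedef struct {
    int n;                 /* number of slots in bidx[] */
    toy_bin_index **bidx;  /* a NULL slot means "no data for this chromosome" */
} toy_index;

typedef struct {
    int tid;
    int finished;          /* 1: the iterator yields nothing */
} toy_iter;

/* Returns NULL only for a negative tid or an allocation failure.
 * A tid with no index entry (beyond idx->n, or a NULL slot) yields an
 * already-finished iterator instead of an error. */
toy_iter *toy_itr_query(const toy_index *idx, int tid)
{
    assert(idx != NULL);
    if (tid < 0) return NULL;

    toy_iter *it = calloc(1, sizeof(*it));
    if (!it) return NULL;

    it->tid = tid;
    if (tid >= idx->n || idx->bidx[tid] == NULL)
        it->finished = 1;  /* nothing indexed: produce no records */
    return it;
}

/* Number of records the iterator would yield in this toy model. */
int toy_itr_count(const toy_index *idx, const toy_iter *it)
{
    return it->finished ? 0 : idx->bidx[it->tid]->n_records;
}
```

In this model, querying a chromosome that is named in the header but absent from the index behaves like querying an empty region: the caller gets a valid iterator and simply reads zero records.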
This is related to #1533 and concerns BCFs with edited headers and missing data records. Consider this example
Note the header line contains the field IDX=1, which makes it behave as if the BCF had been edited and the first chromosome, with IDX=0, had been removed.
Prior to commit d64e710 this command fails with
with the fix applied, it works
However, if there are no data records, hts_itr_query returns an error and the program fails.
This problem does not appear when the chromosome tid block has no gaps, i.e. starts with IDX=0.
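To make "gapped tid block" concrete, here is a minimal sketch of how a header whose only contig carries IDX=1 can give two different reference counts. It assumes, as an illustration only, that one count is the number of contig lines and the other is the highest tid plus one; the `toy_contig` type, the helper names and the contig names used with them are hypothetical and not part of htslib.

```c
#include <stddef.h>

/* Hypothetical model of a header contig line with an explicit IDX value. */
typedef struct {
    const char *name;
    int idx;               /* value of the IDX= key; becomes the tid */
} toy_contig;

/* Number of tid slots needed to address every contig: highest IDX + 1. */
int toy_tid_span(const toy_contig *contigs, size_t n)
{
    int span = 0;
    for (size_t i = 0; i < n; i++)
        if (contigs[i].idx + 1 > span)
            span = contigs[i].idx + 1;
    return span;
}

/* A block is "gapped" when the tid span exceeds the number of contig
 * lines (assumes the IDX values are distinct). */
int toy_has_gap(const toy_contig *contigs, size_t n)
{
    return toy_tid_span(contigs, n) != (int)n;
}
```

With a single contig at IDX=1 the span is 2 while the line count is 1, so a lookup for tid 1 falls outside an index that was sized from the line count. With contigs at IDX=0 and IDX=1 the two counts agree and no gap exists.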