Work around colons in reference names (GRCh38 HLA). #708

Merged on May 31, 2019 (9 commits)

Commits on May 30, 2019

  1. Work around colons in reference names (GRCh38 HLA).

    This extends the callback function used by the hts iterator code to
    the ref:start-end range parser, permitting the parser to validate
    whether the whole thing matches a reference name.
    
    This works around parsing conflicts when given range queries like
    "HLA-DRB1*12:17", which is either the whole of "HLA-DRB1*12:17" (this
    exists in GRCh38) or base 17 onwards of "HLA-DRB1*12" (which does
    not).
    
    Note there are still some undecided questions here.  Do we want to
    handle the special names used in the iterator at this level too?  E.g.
    "*" meaning unmapped data and "." meaning whole file?
    jkbonfield authored and daviesrob committed May 30, 2019
    5f6bbdf
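
The idea in this commit, letting the range parser ask a callback whether a string is a known reference name, can be sketched as follows. This is an illustrative Python sketch and not the htslib code: `make_lookup`, `parse_range` and `resolve` are hypothetical names, and preferring the whole-string match, as well as returning `(tid, 1, None)` for a whole reference, are assumptions made for the example.

```python
def make_lookup(names):
    """Build a name -> tid callback (hypothetical stand-in for the getid callback)."""
    index = {name: tid for tid, name in enumerate(names)}
    return lambda name: index.get(name, -1)  # -1 means "unknown name"


def parse_range(text):
    """Parse "start" or "start-end"; return (start, end) or None if malformed."""
    start, _, end = text.partition("-")
    if not start.isdecimal() or (end and not end.isdecimal()):
        return None
    return int(start), (int(end) if end else None)


def resolve(region, name2id):
    """Return (tid, start, end), or None if the region cannot be resolved."""
    # First ask the callback whether the whole string is a reference name.
    tid = name2id(region)
    if tid >= 0:
        return tid, 1, None  # assumed convention for "whole reference"
    # Otherwise split at the last colon and validate the prefix as a name.
    name, colon, rng = region.rpartition(":")
    if not colon:
        return None
    tid = name2id(name)
    coords = parse_range(rng)
    if tid < 0 or coords is None:
        return None
    return tid, coords[0], coords[1]


lookup = make_lookup(["chr1", "HLA-DRB1*12:17"])
print(resolve("HLA-DRB1*12:17", lookup))  # whole contig, not base 17 of "HLA-DRB1*12"
print(resolve("chr1:100-200", lookup))
```

Without the callback the parser has only the string to go on, so it cannot tell that "HLA-DRB1*12" is not a reference name in this file.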

Commits on May 31, 2019

  1. Now region parsing fails on ambiguous cases.

    Also improved return values so NULL is the primary way of detecting
    failure rather than tid.  This is more in line with the old
    hts_parse_reg code.
    
    See samtools/hts-specs#124 (comment)
    for heuristic suggestions.
    jkbonfield authored and daviesrob committed May 31, 2019
    1bb085a
  2. Added support for brace-quoting of reference names.

    E.g. with contigs named "chr1" and "chr1:100-200" we can specify
    "{chr1}:100-200" and "{chr1:100-200}" to disambiguate.
    jkbonfield authored and daviesrob committed May 31, 2019
    f1a6a3f
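
The two behaviours described in the commits above, failing on ambiguous input and using braces to disambiguate, can be illustrated with a small standalone sketch. It is written in Python for brevity and is not the htslib implementation: the function names, the use of `None` as the only failure signal, the `(name, 1, None)` result for a whole reference, and the assumption that names contain no `}` are all choices made for this example.

```python
def split_braced(region):
    """Split "{name}rest" into (name, rest); return (None, region) if unquoted."""
    if not region.startswith("{"):
        return None, region
    close = region.find("}")
    if close < 0:
        raise ValueError("unterminated '{' in region")
    return region[1:close], region[close + 1:]


def parse_range(text):
    """Parse "start" or "start-end"; return (start, end) or None if malformed."""
    start, _, end = text.partition("-")
    if not start.isdecimal() or (end and not end.isdecimal()):
        return None
    return int(start), (int(end) if end else None)


def parse_region(region, names):
    """Return (name, start, end), or None on unknown, malformed or ambiguous input."""
    name, rest = split_braced(region)
    if name is not None:
        # Quoted: the name is exactly the text inside the braces.
        if name not in names:
            return None
        if rest == "":
            return name, 1, None
        coords = parse_range(rest[1:]) if rest.startswith(":") else None
        return (name, *coords) if coords else None

    # Unquoted: evaluate both readings and refuse to guess if both are valid.
    prefix, colon, rng = region.rpartition(":")
    coords = parse_range(rng) if colon and prefix in names else None
    if region in names:
        return None if coords else (region, 1, None)
    return (prefix, *coords) if coords else None


names = {"chr1", "chr1:100-200"}
print(parse_region("chr1:100-200", names))    # None: both readings are valid
print(parse_region("{chr1}:100-200", names))  # range 100-200 on "chr1"
print(parse_region("{chr1:100-200}", names))  # the whole of "chr1:100-200"
```

Returning a single failure value keeps the caller's check simple; a real parser would probably also want to report why the region was rejected.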
  3. Major rewrite of hts_parse_region.

    The old function (hts_parse_reg) has been put back, meaning the new
    API can now have the getid func pointer as a mandatory requirement.
    
    Added tests.
    
    Tweaked sam_parse_region to have the flags parameter.  This is still
    required as bcftools and samtools use different parameters (parsing
    style is tool based rather than file format based).
    jkbonfield authored and daviesrob committed May 31, 2019
    d224a13
  4. Changes faidx to use the standard region parser.

    Previously it had its own custom parser which nearly, but not quite,
    matched the old hts_parse_reg code in functionality.
    
    Note this changes the fasta coordinate type from long to int, which is
    a backwards step.  However it is expected this will subsequently
    change to hts_pos_t in another PR.
    jkbonfield authored and daviesrob committed May 31, 2019
    878cec3
  5. 0ce6702
  6. 0eac47d
  7. Expand hts_parse_region documentation in the header file.

    Mainly copied from the useful (but not so visible) comment
    before the function definition in hts.c.
    daviesrob committed May 31, 2019
    73eee10
  8. Update hts_reglist_create() to use hts_parse_region()

    Hash table can now use tid (cast to khash32_t, which is unsigned)
    as key instead of the region name, avoiding some string copying.
    daviesrob committed May 31, 2019
    d26300e
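
As a rough illustration of the last commit, grouping already-parsed regions by numeric tid rather than by name string might look like the sketch below. It is Python rather than the C used in htslib, `group_by_tid` is a hypothetical helper, and the `& 0xFFFFFFFF` mask only mimics casting a signed 32-bit tid to an unsigned 32-bit key. What a negative tid means to the real code is not stated in the commit message, so the example makes no claim about it.

```python
from collections import defaultdict


def group_by_tid(regions):
    """Group (tid, start, end) tuples into {unsigned_tid: sorted [(start, end), ...]}."""
    groups = defaultdict(list)
    for tid, start, end in regions:
        key = tid & 0xFFFFFFFF  # mimic a signed -> unsigned 32-bit cast
        groups[key].append((start, end))
    return {key: sorted(intervals) for key, intervals in groups.items()}


parsed = [(0, 100, 200), (1, 5, 10), (0, 1, 50)]
print(group_by_tid(parsed))
```

Because the key is an integer, no reference name has to be copied or hashed as a string when a region is added to its group.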